JobNob

Site Reliability Engineer


Lemma Technologies


Location

Pune | India


Job description

In this role, you will play a crucial part in designing, implementing, and maintaining our infrastructure, ensuring the reliability, scalability, and performance of our systems.

You will collaborate closely with development teams, using your expertise to enhance our CI/CD processes, automate deployments, and optimize the overall software development lifecycle.

Skills Required:

- Proven experience in DevOps and Site Reliability Engineering roles, with a deep understanding of infrastructure automation, continuous integration, and continuous deployment methodologies.
- Proficiency in containerization and orchestration technologies (Docker, Kubernetes).
- Solid programming/scripting skills (Python, Bash, Ruby, etc.) for automation and tooling development.
- Familiarity with configuration management tools (Ansible, Puppet, Chef) and version control systems (Git).
- Experience with monitoring and logging tools (Prometheus, Grafana, ELK stack, etc.).
- Strong problem-solving skills and the ability to troubleshoot complex system issues efficiently.
- Excellent communication and collaboration skills, with the ability to work effectively in cross-functional teams.

Roles & Responsibilities:

- Design, implement, and manage the company's DevOps and SRE strategies to ensure the high availability, performance, and scalability of our applications.
- Collaborate with development teams to establish and improve CI/CD pipelines, automate deployment processes, and enable rapid and reliable software releases.
- Monitor and manage cloud and on-premises infrastructure, proactively identifying and resolving performance bottlenecks, security vulnerabilities, and system outages.
- Implement efficient monitoring, logging, and alerting systems to provide real-time insights into the health and performance of our applications.
- Drive continuous improvement through automation, scripting, and process optimization, aiming to reduce manual interventions and enhance overall system efficiency.
- Participate in incident response and root cause analysis, developing strategies to prevent future incidents and ensure system reliability.
- Stay up-to-date with industry best practices, emerging technologies, and trends in DevOps and Site Reliability Engineering.

