JobNob

Your Career. Our Passion.

Principal Site Reliability Engineering Manager


Wipro


Location

Delhi | India


Job description

Principal Site Reliability Engineer

We are seeking a highly skilled and experienced Principal Site Reliability Engineer (SRE) to join the Lab45 team at Wipro. As a Principal SRE, you will play a critical role in ensuring the reliability, availability, and performance of our systems and applications. Your expertise and leadership will be essential in driving the adoption of best practices, designing scalable architectures, and improving the overall reliability of our infrastructure.

Responsibilities:
Design, implement, and maintain highly available and scalable systems, services, and architectures to support our organization's applications and infrastructure.
Lead efforts to improve system reliability, monitoring, and performance, utilizing automation and best practices for continuous integration and deployment.
Collaborate with cross-functional teams to identify and resolve performance bottlenecks, scalability issues, and architectural challenges.
Develop and implement incident response procedures, conduct post-incident analysis, and drive root cause analysis to prevent future incidents.
Define and enforce service-level objectives (SLOs) and service-level agreements (SLAs) to ensure the reliability and availability of our systems and applications.
Automate deployment and configuration processes, utilizing infrastructure-as-code and configuration management tools.
Stay up to date with the latest industry trends and technologies related to site reliability engineering, and proactively recommend and implement improvements.
Mentor and provide technical leadership to SRE and engineering teams, promoting a culture of reliability, performance, and scalability.

Requirements:
Bachelor's or Master's degree in Computer Science, Engineering, or a related field.
Extensive experience (10+ years) in site reliability engineering or a similar role, with a strong focus on designing and maintaining scalable, reliable, and high-performance systems.
Deep understanding of cloud technologies and platforms (e.g., AWS, Azure, Google Cloud) and experience with cloud-based infrastructure management.
Proficiency in scripting and programming languages (e.g., Python, Go, Java) for automation and infrastructure-as-code.
Strong knowledge of containerization and orchestration technologies (e.g., Docker, Kubernetes) and experience with microservices architectures.
Experience with monitoring and observability tools (e.g., Prometheus, Grafana, ELK stack) to ensure system health and performance.
Familiarity with incident management and response processes, including on-call rotations and post-incident analysis.
Excellent troubleshooting and problem-solving skills, with the ability to quickly diagnose and resolve complex system issues.
Strong communication and collaboration skills, with the ability to interact effectively with technical and non-technical stakeholders.
Relevant certifications (e.g., AWS Certified DevOps Engineer, Google Cloud Professional DevOps Engineer) are highly desirable.

Join our team as a Principal Site Reliability Engineer and contribute to the design, implementation, and maintenance of our organization's reliable and scalable systems. Apply your expertise to drive site reliability initiatives, mentor engineering teams, and ensure the highest level of system performance, availability, and resilience.


