Principal Engineer, Platform Reliability Engineering

Arcesium

Location

Hyderabad | India

Job description

We are looking for an experienced Principal Engineer to implement a new monitoring tool for the firm. The ideal candidate will have a strong background in SRE principles and practices, along with strong knowledge of and experience in maintaining monitoring frameworks for large-scale organizations. The Engineer will be responsible for evaluating monitoring tools, understanding the scale of Arcesium, proposing a cost-effective and reliable monitoring framework, and managing the system end to end. The SRE team is responsible for monitoring the stability and availability of mission-critical production systems, managing incidents for quicker resolution, and establishing BAU. The team also builds tools and infrastructure used by all development teams to assist in monitoring and troubleshooting. This position is for HYD/BLR.

What You'll Do

Design, develop, and implement scalable and reliable monitoring solutions for distributed systems at scale.
Define and implement monitoring requirements in collaboration with cross-functional teams.
Lead the development of monitoring architectures and strategies.
Integrate monitoring tools into existing infrastructure.
Maintain and support monitoring systems.
Demonstrate strong technical breadth/depth, driving innovation, evaluating new technologies, and deciphering the technical vision for engineering teams.
Own key contributions to technical design and architecture decisions, considering trade-offs of choices, managing risk, making decisions independently where appropriate, and presenting reasoned options for decision making by others.
Lead the way by writing exemplary code, documentation, and RFCs.
Identify, propose, develop, deploy, and own R&D projects in accordance with the technical vision and needs of the team, turning problem statements into solutions, and operating independently as needed.

What You'll Need

10+ years of experience in SRE or a related field.
Proven experience in designing, developing, and implementing monitoring solutions.
Deep understanding of monitoring technologies and tools, including Prometheus, Grafana, Loki, and Tempo.
Experience with cloud-based monitoring systems, such as New Relic, Datadog, and Grafana Cloud.
Experience with log analysis tools, such as Splunk, Logstash, Fluent, and Sumo Logic.
Experience implementing distributed tracing using OpenTelemetry and Jaeger.
Strong understanding of SRE principles and practices.
Experience with incident response and management.
Reliability: exposure to Chaos Engineering and other reliability practices, including disaster recovery, is good to have.
Experience with cloud computing platforms such as AWS.
Experience with Kubernetes.
Experience in Agile practices (Scrum).
Excellent analytical, problem-solving, and troubleshooting skills.
Excellent communication and presentation skills.
Experience managing and mentoring engineers.
Ability to work independently and as part of a team.

The Company offers excellent benefits, an informal and collegial working environment, and an attractive compensation package.

Members of the Arcesium Company Group do not discriminate in employment matters on the basis of sex, race, color, caste, creed, religion, pregnancy, national origin, age, military service eligibility, veteran status, sexual orientation, marital status, disability, or any other protected class.

GENERAL

Home

About

Contact

Blog

MORE PAGES

Popular searches

Urban popular searches

Cities

Companies

LEGAL

Privacy policy

Terms of service

eAccessibility commitment

JobNob HQ Address

1 E Broad St
Ste 130 - 1252
Bethlehem, PA 18018-5934
United States