JobNob

Your Career. Our Passion.

Associate Site Reliability Engineer (SRE)


Turbolab


Location

Ernakulam (district) | India


Job description

As an Associate Site Reliability Engineer (SRE), you will play a critical role in ensuring the high availability, reliability, and performance of our servers and network infrastructure. You will lead efforts in infrastructure automation, performance optimization, incident management, monitoring and alerting, documentation and knowledge sharing, disaster recovery, server scaling, ticket and on-call support, asset management, policy creation, and account setup. Your focus will be on implementing best practices, reducing manual intervention, increasing deployment efficiency, minimizing service disruptions, and fostering a culture of shared learning.

Key Responsibilities :

Tickets & On-Call Support :

Provide timely resolution of tickets and on-call incidents, meeting SLAs for response and resolution times.

Asset Management :

Maintain accurate records of infrastructure assets and configurations, ensuring compliance with asset management policies.

Server Reliability & Network Maintenance :

Maintain high availability and reliability of servers and network infrastructure to meet or exceed uptime SLAs.

Infrastructure Automation :

Implement and enhance infrastructure automation tools and processes to reduce manual intervention and increase deployment efficiency.

Performance Optimization :

Improve system performance and scalability to ensure an optimal user experience, targeting lower response times and higher throughput.

Incident Management :

Lead and execute incident response efforts to minimize service disruptions.

Monitoring and Alerting :

Enhance monitoring and alerting systems to detect issues proactively, aiming for fewer false positives and faster response to critical alerts.

Documentation and Knowledge Sharing :

Develop and maintain comprehensive documentation and knowledge-sharing platforms to foster a culture of shared learning, with a goal of increasing documentation coverage annually.

Disaster Recovery :

Establish and validate disaster recovery plans and procedures, ensuring that the defined recovery time objective (RTO) is met.

Server Scaling :

Dynamically adjust server capacity based on demand, aiming to reduce over-provisioning and under-provisioning incidents.

Policy Creation :

Develop and implement policies and procedures to govern system reliability, security, and compliance, ensuring alignment with industry best practices and regulatory requirements.

Account Setup :

Streamline account provisioning processes to accelerate the onboarding of new users.

Qualifications :

