NTT Data
Location
Pune | India
Job description
Req ID: 263510
NTT DATA Services strives to hire exceptional, innovative and passionate individuals who want to grow with us. If you want to be part of an inclusive, adaptable, and forward-thinking organization, apply now.
We are currently seeking an Industry Consulting Director to join our team in Pune, Maharashtra (IN-MH), India (IN).
You Will
Explore new tools and monitoring approaches to implement for proactive availability management.
Analyse the trade-offs of proposed designs and make recommendations based on them.
Mentor new engineers to help them achieve more than they thought possible, and enjoy making other teams successful.
Work on reliability projects, including high availability (HA), business continuity planning, disaster recovery, and chaos engineering.
Own application uptime and performance.
Define SLIs and SLOs, and craft monitoring dashboards.
Deploy and operate large-scale distributed data stores and streaming services.
Establish design patterns for monitoring and benchmarking.
Establish and document production runbooks and guidelines for developers.
Build tooling and automation to manage production environments.
Manage incidents and improve MTTD (mean time to detect) for services.
Qualifications
Must-Have
Bachelor's or Master's degree in Computer Science, an MCA, or a related technical field, or equivalent practical experience.
12+ years of total work experience, including 5+ years of SRE experience architecting large-scale observability platforms.
Hands-on experience designing, implementing, maintaining, and deploying observability systems covering application and infrastructure telemetry and logs, metric visualization, and alert management.
Experience with infrastructure automation and scripting using Python and/or Bash.
Clear understanding of observability metrics, log analytics, and traces.
Well versed in ITSM processes, with ITIL certification.
Strong hands-on experience with monitoring tools such as Azure Monitor and Insights, CloudWatch, Prometheus, Jaeger, and Grafana to build observability for large-scale microservice deployments.
Clear understanding of OpenTelemetry and API standards.
Good understanding of, and working experience with, at least one API gateway; experience with Kong is an added advantage.
Expertise in ServiceNow (SNOW) integrations, including experience integrating other systems with SNOW (for example, HANA with SNOW) for ticket creation.
Excellent problem-solving, triaging, and debugging skills in large-scale distributed systems.
Good knowledge of event management and clustering algorithms.
Experience working with security and compliance teams to ensure data security and adherence to regulatory requirements.
Preferred
Experience with the ServiceNow Event Management module is a big advantage.
AWS Solutions Architect or Azure certification.
Experience with CI/CD frameworks and pipeline-as-code tools such as Jenkins, GitLab, and Artifactory.
Working knowledge of AWS EKS or a general Kubernetes platform in a production environment.
Proven skills to effectively work across teams and functions to influence the design, operations, and deployment of highly available software.
Knowledge of Azure AD/IAM is preferred.