Senior Systems And Infrastructure Engineer

Walmart

Location

Bangalore, India

Job description

Demonstrates up-to-date expertise and applies it to the development, execution, and improvement of action plans by providing expert advice and guidance to others in the application of information and best practices; supporting and aligning efforts to meet customer and business needs; and building commitment for perspectives and rationales.
Provides and supports the implementation of business solutions by building relationships and partnerships with key stakeholders; identifying business needs; determining and carrying out necessary processes and practices; monitoring progress and results; recognizing and capitalizing on improvement opportunities; and adapting to competing demands, organizational changes, and new responsibilities.
Models compliance with company policies and procedures and supports company mission, values, and standards of ethics and integrity by incorporating these into the development and implementation of business plans; using the Open Door Policy; and demonstrating and assisting others with how to apply these in executing business processes and practices.

What you'll do:

As a Senior Site Reliability Operations Engineer within the Global Technology Platforms (GTP) CCC team, you will work with other CCC, TDO, SRE, DevOps and Engineering practitioners to proactively maintain the mission-critical infrastructure, cloud platforms, micro-services, tools, and processes that ensure the highest levels of availability and reliability across our Global Technology Platforms.
You're right for the job if you are comfortable leading our major incident response as part of a technical team of engineers laser-focused on restoring service across complex distributed systems.
You'll excel if you have enthusiasm for digging deep and a flair for sharp technical communication, prioritization and organization.
You will work directly with our SRE, Engineering and DevOps teams to support our next-generation, always-up, cloud-based e-commerce platforms.
The CCC Senior Site Reliability Operations Engineer is responsible for proactively monitoring, detecting and resolving site issues before they affect customers or availability.
Technically, you will understand the full end-to-end stack and use this knowledge to detect errors and failures and take corrective action to mitigate them.
During a major incident, you will draw on your technical skills and knowledge to triage and troubleshoot, differentiating between symptom and cause, to help resolve impacting issues and restore service.
Your ability to continuously challenge yourself and develop a strong network within your peer group will help you excel in this role.
Our goal is to protect the customer experience and deliver outstanding levels of availability.

To do so, you will need strong skills in the following areas:

xMatters workflow integration with a focus on scalability, resiliency and performance.
Expert level understanding of incident management processes and procedures.
Calm under pressure when participating in major incident response.
Deep technical understanding of core infrastructure, cloud services, platforms and micro-services.
Ability to understand and capture key data from logs at an expert level.
Ability to understand traffic flows and key dependencies between services.
Ability to triage effectively: detect issues and determine symptom vs. cause.
Detect and quantify impact.
Expert level troubleshooting skills using a diverse set of tools and methods.
Analyze trends to proactively prevent incidents.
Focus on immediate restoration rather than root cause.
Research and recommend alternative actions for incident resolution, and develop procedures and documentation to support them.
Create and maintain procedural documentation.
Identify and drive continuous improvement efforts to reduce waste (eliminate, automate or streamline).
Absorb knowledge of complex distributed systems, and share and impart this knowledge with your peer group and beyond.
Build tools to improve visibility, proactively detect issues and restore system availability.
Develop automation and self-healing with DevOps, Engineering and SRE partners.
Strong focus on collecting and inferring metrics.
Clear communication skills.
Ability to contribute to multiple incidents at any given time.
Analyze systems and make recommendations to prevent possible problems. Take the lead on issue resolution activities using knowledge of complex and company-wide systems.
Scripting and software development to automate and help enhance existing solutions.
Experience owning, developing and evangelizing a product.
Ability to gather requirements and build solutions into a product.
Evangelize operational excellence.

Additional responsibilities may include:

Actively provide data for and participate in root cause analysis.
Define CCC onboarding processes and ensure they are adhered to when accepting new systems into service.
Share knowledge globally between CCC teams.
Analyze systems and make recommendations to prevent possible incidents.
Strive for continuous improvement and make recommendations based on CCC process.
Act as a technical focal point for the CCC team.
Other duties and responsibilities as assigned.

Qualifications:

7+ years of experience in enterprise application development and API integrations with Java and React/JavaScript.
Experience building and scaling distributed, highly available systems.
Experience developing applications for a cloud environment such as Google Cloud Platform or Microsoft Azure.
Experience with frameworks and tools such as Git, xMatters workflow integration, ServiceNow integration, etc.
Comfortable building metrics, monitoring and alerting for micro-services.
4+ years of experience in an infrastructure, systems, engineering or development environment delivering operational excellence to highly complex distributed systems.

JobNob HQ Address

1 E Broad St
Ste 130 - 1252
Bethlehem, PA 18018-5934
United States