Production Engineer - Site Reliability Engineer (CLOUD AWS)
Location
Mumbai | India
Job description
Our goal at Pivotree is to help accelerate the future of frictionless commerce. We will help lead this change over the next decade because we believe that a future in which technology is embedded intimately into every aspect of our everyday lives can benefit everyone and will shape our interactions with the brands we love. We will help shape that future by working with some of the best brands in the world and some of the best people in the industry, leveraging converging technologies to advance frictionless commerce faster than ever.
This is a journey of technology acceleration combined with consumer readiness and adoption. We are looking for people capable of adapting relentlessly to the rapidly evolving world around us.
Position Summary
We are currently seeking a Site Reliability Engineer to join our team. In this role you will contribute to the reliability and enhancement of the technology engine that powers multiple Pivotree solutions. Your primary responsibility will be the availability of platform solutions, with a focus on several key areas: availability, performance, change management, monitoring, and emergency response. You will work with members of the platform, solutions, operations, and application teams to understand changing and evolving requirements and ultimately address them by extending and exposing capabilities in a simple and consistent fashion. You will be a member of a team that maintains expertise in Utility Computing services and advises management and the organization as a whole on this mode of computing.
You will
- Contribute to ensuring that pooled and independent utility services are highly available
- Actively take part in and initiate continuous improvement: measure and reduce manual tasks and overhead
- Be a subject matter expert on Utility Computing providers and their services, both existing and emerging, with a particular focus on AWS
- Complete systems development, administration, and engineering tasks including integration, documentation and testing
- Develop and maintain tools, processes, and workflows for the automated deployment of infrastructure resources and applications, and for configuration management and maintenance
- Own the responsibility for platform management, supporting services, and all related tooling and automation
- Investigate and troubleshoot platform issues and incidents (high availability, performance, security, etc.)
- Participate in recurring stand-ups with team members in different locations and time zones
- Participate in on-call rotation, escalations, and shift work (generally Monday to Friday or Wednesday to Sunday)
- Work with other team members to improve processes and advance relevant and related competencies
You are
- Super comfortable with Linux (RHEL-based / Debian-based)
- Experienced with supporting software development teams and workflows
- A team player who recognizes the power of Agile and team-based delivery
- Well versed in infrastructure & application monitoring, logging, and tracing
- Able to effectively decompose problems into workable chunks
- Experienced at working on large projects with deadlines
- Committed to high quality and attention to detail
- Focused and committed to delivering high-quality services
- A strategic thinker who is able to link business and technical objectives
- Someone who can go both wide and deep: who works with several disparate systems and services, ultimately acquires expert knowledge of them, and navigates them accordingly
You have (MUST HAVE)
- At least one Associate-level Amazon AWS certification, or the ability to achieve one within 3 months
- A mature understanding of and lots of experience with infrastructure-as-code concepts and practices
- 1+ years - working with tools to support version control, build automation and automated testing (e.g. the usual suspects... Git, Jenkins, TravisCI, Selenium, etc.)
- 1+ years - production experience operating container and container orchestration technologies (ideally Docker and Kubernetes / managed Kubernetes service)
- 2+ years - infrastructure lifecycle management with tooling such as AWS CloudFormation, HashiCorp Terraform, or similar
- 2+ years - monitoring system performance
- 2+ years - implementing and maintaining security and compliance for all aspects of systems and components, where possible
- 2+ years - implementation and operating experience in API-driven production environments of respectable scale on AWS
- 3+ years - system administration experience (OS, network, storage, virtualization management, etc.) in challenging production environments, with the war stories to match
- 3+ years - Debian-based and RHEL-based Linux
- 3+ years - web service, application, middleware, and database support
- Exceptional communication skills and the ability to convey decisions and ideas clearly and concisely
- The ability to work independently as well as collaboratively
- The ability to learn and adapt to new and overlapping technologies quickly and independently, and to formulate and implement standards, procedures and best practices
- The ability to think in systems
- Experience with Python, Bash, or similar languages to extend capabilities and increase efficiency
NICE TO HAVE
- Experience with and/or exposure to the Serverless Framework
- Experience with APM tools such as AppDynamics, New Relic, Dynatrace, or AWS X-Ray
- Experience with the following Amazon AWS services in a production environment (API Gateway, Cognito, DynamoDB, ECS, EMR, Lambda)
- AWS Certified Developer
- AWS Certified SysOps Administrator
Pivotree is an equal opportunity employer. We celebrate diversity and are committed to creating an inclusive and accessible workplace.