Be responsible for the readiness and stability of our systems
Create and maintain cloud-based infrastructure in AWS
Support and troubleshoot product and infrastructure issues in production environments
Develop and provide documentation on procedures
Perform system upgrades following new releases
Identify automation opportunities to improve operations
Provide on-call support and monitor the production environment on a rotational basis
Keep up with the latest technology and tools
Responsibilities
Perform root cause analysis for production errors
Build tools to reduce occurrences of errors and improve customer experience
Investigate and resolve technical issues
Design procedures for system troubleshooting and maintenance
Implement adjustments or new infrastructure requirements
Guide users on processes and tools
Requirements
Minimum 3 years of experience as a DevOps engineer, SRE, or in a similar role
Production experience with Kubernetes, AWS ECS, and containerized applications
Experience designing infrastructure with IaC tools (Terraform, CloudFormation)
Experience applying CI/CD concepts and creating workflows with GitHub Actions
Experience with cloud-based services and infrastructure, including knowledge of AWS networking
Experience in Python and Node.js
Experience in data engineering or setting up Apache Airflow is a plus
Ability to work odd hours to maintain the health of our systems
Strong communication skills for collaborating effectively with engineers and other cross-functional stakeholders with varying technical backgrounds and priorities
Work experience as a DevOps Engineer or Site Reliability Engineer