Site Reliability Engineer
Location
Bloomsbury, Greater London | United Kingdom
Job description
- determine the reliability of our digital products, technology services, and the infrastructure that underpins them
- minimize the risk and impact of failures by engineering operational improvements, such as predictive monitoring, auto-scaling, or self-healing
- respond to production incidents to gain first-hand experience of operational hotspots and to identify the root causes of problems
- collect and analyze operational data, and define and monitor key metrics, to identify and communicate areas for improvement
- apply a broad range of engineering practices with a focus on reliability, from instrumentation, performance analysis, and log analytics to automated testing, deployment, and operations
- ensure the quality, security, reliability, and compliance of our solutions by applying our digital principles and implementing both functional and non-functional requirements
Your expertise
- ideally 5+ years of experience in an Application Support role within the financial services industry
- excellent verbal and written communication skills along with strong collaboration skills
- experience with one or more scripting languages
- experience with Application Performance Monitoring (APM) tools
- knowledge of Linux OS
- knowledge of IP networking
- knowledge of troubleshooting Java applications
- knowledge of application and web servers (NGINX, Apache)
- knowledge of containerization (Docker, Kubernetes)
- knowledge of provisioning cloud infrastructure using Terraform
- knowledge of cloud computing and managing cloud environments (Azure preferred)
- knowledge of CI/CD fundamentals
- ability to drive automation to eliminate toil
Salary
£400 - £450 per day