Site Reliability Engineer

N Consulting Ltd

Location

Bloomsbury, Greater London | United Kingdom

Job description

determine the reliability of our digital products, technology services, and the infrastructure that underpins them
minimize the risk and impact of failures by engineering operational improvements, such as predictive monitoring, auto-scaling, or self-healing
respond to production incidents to gain first-hand experience of operational hotspots and to identify the root causes of problems
collect and analyze operational data, and define and monitor key metrics, to identify and communicate areas for improvement
apply a broad range of engineering practices with a focus on reliability, from instrumentation, performance analysis, and log analytics to automated testing, deployment, and operations
ensure the quality, security, reliability, and compliance of our solutions by applying our digital principles and implementing both functional and non-functional requirements

Your expertise

ideally 5+ years of experience in an application support role within the financial services industry
excellent verbal and written communication skills along with strong collaboration skills
experience with one or more scripting languages
experience with Application Performance Monitoring (APM) tools
knowledge of Linux OS
knowledge of IP networking
knowledge of troubleshooting Java applications
knowledge of application and web servers (NGINX, Apache)
knowledge of container virtualization (Docker, Kubernetes)
knowledge of provisioning cloud infrastructure using Terraform
knowledge of cloud computing and managing cloud environments (Azure preferred)
knowledge of CI/CD fundamentals

drive automation to eliminate toil

Salary

£400 - £450 per day