Location
Hyderabad | India
Job description
Responsibilities
- Build & support CI/CD tools to port & manage applications on AWS/GCP & Kubernetes
- Evaluate and port language models to optimized infrastructure to reduce cost and improve performance
- Identify suitable LLM evaluation frameworks and integrate them with the platform
- Build automation to enable self-healing systems
- Build tools to monitor & alert on high-performance, low-latency applications on AWS/GCP
- Troubleshoot application-specific, core network, system & performance issues.
- Build a multi-tenant system that enforces data protection between different use cases.
The candidate is expected to be self-motivated, proactive, and solution-oriented.
Requirements
Qualifications
- 4+ years of experience in SRE/DevOps
- Strong work experience in AI/ML is mandatory.
- Extensive experience managing applications on AWS/GCP & Kubernetes
- Strong experience with infrastructure templating tools such as CloudFormation and Terraform
- Experience building CI/CD pipelines for large-scale applications on AWS/GCP & Kubernetes
- Experience with GitOps-based deployment tools such as Spinnaker, Flux, or ArgoCD
- Strong proficiency with Helm and Kustomize for managing Kubernetes applications and configurations.
- Experience with observability & tracing for large language models (LLMs)
- Deep understanding of object-oriented programming in languages such as Java.
- Experience performance-tuning JVMs and operating systems such as Linux
- Strong programming skills in Unix shell scripting & Python
- Excellent analytical & problem-solving skills