Site Reliability Engineer, AI Infrastructure & Observability


Tesla Motors


Location

Palo Alto, CA | United States


Job description

The Role
Tesla's NOC supports global Infrastructure, Manufacturing, and Applications to identify and resolve problems with high-speed cross-team collaboration. As part of this function, we work closely with the high-performance computing and AI infrastructure teams within IT Infrastructure. With the rapidly growing need for more data and optimized compute resources, our observability and service delivery need to scale in parallel. We are looking for a Site Reliability Engineer to join our team with a focus on AI Infrastructure and Observability. This hybrid role will work closely with Incident Management as well as observability, traffic, and other software & infrastructure leads to monitor and optimize Tesla's AI Infrastructure.


As a Site Reliability Engineer, you will be responsible for problem detection and escalation for our AI Infrastructure, ensuring engineering teams across Autopilot/AI and Dojo have the necessary tools and resources to be productive. This is a hands-on technical role, and a successful candidate should combine strong technical, analytical, and service delivery backgrounds.

Responsibilities

  • Collaborate with a cross-functional team of SREs, architects, and other stakeholders to understand complex application architectures, enabling the implementation of an effective top-down monitoring strategy for holistic service visibility
  • Build, maintain, and monitor dashboards for critical infrastructure
  • Create and tune alerts for network and hardware so that potential problems are identified, routed, and remediated early
  • Facilitate knowledge sharing by creating and maintaining detailed and comprehensive documentation, diagrams, and runbooks
  • Respond to and resolve support requests in a timely fashion while managing project timelines and other responsibilities
  • Serve as a frontline support resource to AI Software teams to triage problems and engage relevant engineering support
  • Participate in 24x7 on-call rotation

Requirements

  • Sound judgement, outstanding communication, & ability to work with internal customers in a fast-paced, high-visibility role
  • Proficiency in a high-level programming or scripting language (Python, Golang, Bash)
  • Experience with troubleshooting distributed systems
  • Strong knowledge of multiple observability tools: Splunk, Prometheus/Alertmanager, Synthetic Monitoring, Grafana
  • Prior experience with Catchpoint and/or Kentik is a plus
  • Strong understanding of Linux fundamentals (Ubuntu/RHEL OS)
  • Excellent understanding of Network and Traffic fundamentals
  • Experience collaborating with network and data center teams for large-scale infrastructure support
  • 3+ years of additional equivalent experience or evidence of exceptional ability related to the position

Compensation and Benefits
Benefits

Along with competitive pay, as a full-time Tesla employee, you are eligible for the following benefits from day 1 of hire:

  • Aetna PPO and HSA plans > 2 medical plan options with $0 payroll deduction
  • Family-building, fertility, adoption and surrogacy benefits
  • Dental (including orthodontic coverage) and vision plans, both have options with a $0 paycheck contribution
  • Company-paid Health Savings Account (HSA) contribution when enrolled in the High Deductible Aetna medical plan with HSA
  • Healthcare and Dependent Care Flexible Spending Accounts (FSA)
  • LGBTQ+ care concierge services
  • 401(k) with employer match, Employee Stock Purchase Plans, and other financial benefits
  • Company-paid Basic Life, AD&D, short-term and long-term disability insurance
  • Employee Assistance Program
  • Sick and Vacation time (Flex time for salary positions), and Paid Holidays
  • Back-up childcare and parenting support resources
  • Voluntary benefits to include: critical illness, hospital indemnity, accident insurance, theft & legal services, and pet insurance
  • Weight Loss and Tobacco Cessation Programs
  • Tesla Babies program
  • Commuter benefits
  • Employee discounts and perks program
Expected Compensation

$104,000 - $348,000/annual salary + cash and stock awards + benefits

Pay offered may vary depending on multiple individualized factors, including market location, job-related knowledge, skills, and experience. The total compensation package for this position may also include other elements dependent on the position offered. Details of participation in these benefit plans will be provided if an employee receives an offer of employment.


Job tags

Holiday work, Full time, Temporary work, Flexible hours, Night shift
