

Director - SaaS Operations Monitoring and Alerting


Saviynt


Location

Bangalore | India


Job description

Saviynt's Enterprise Identity Cloud helps modern enterprises scale cloud initiatives and solve the toughest security and compliance challenges in record time. The company brings together identity governance (IGA), granular application access, cloud security, and privileged access (PAM) to secure the entire business ecosystem and provide a frictionless user experience. The world's largest brands trust Saviynt to accelerate digital transformation, empower distributed workforces, and meet continuous compliance.

Our Monitoring and Alerting team within the SaaS Operations team combines Operations Excellence with the Development Experience to deliver services at high scale and high availability, with resilience, by using automation and Infrastructure as Code. We build reliability into our ecosystem by applying best practices in Resiliency Engineering, Automation, Observability, and Chaos Testing. The primary focus is on implementing monitoring solutions, establishing alerting processes, and ensuring timely responses to incidents.

The Director of SaaS Operations Monitoring and Alerting for our Cloud Services is a technical leader who oversees the monitoring, alerting, and incident response functions tailored to cloud-based environments. The Director will enhance and build the Monitoring and Alerting program by implementing best-in-class monitoring and alerting practices and tools, and will foster a collaborative and innovative culture within the team.

The team comes from diverse technical backgrounds, and the responsibilities offer a variety of challenges. Ideal candidates will have a background in either software engineering or systems engineering with a desire to learn the other, or previous experience building and managing Monitoring and Alerting systems. We are looking for a systems-thinking Principal Engineer who has helped teams scale through production insights, operational automation, building an observability program, developer guidance, real-time metrics, and automation, automation, automation!

The candidate must possess a deep understanding of cloud technologies and DevOps practices, and be adept at implementing monitoring solutions that cater to the unique challenges of the cloud.

Objectives For The Role

• Lead and build a world-class Monitoring and Alerting program for our Cloud Services to guarantee high availability and performance, with a dedicated focus on SLA and availability metrics for a large-scale cloud environment.
• Develop and implement a comprehensive monitoring and alerting strategy aligned with the organization's goals and objectives.
• Define key performance indicators (KPIs) and metrics to measure the health and performance of systems, networks, and applications.
• Identify, evaluate, and implement monitoring tools and technologies that suit the organization's needs.
• Evaluate, select, and implement cloud-specific monitoring tools and technologies.
• Collaborate with engineering and operations teams to identify critical components and systems requiring enhanced availability measures.
• Continuously evaluate and recommend improvements to platform infrastructure and processes, enhancing efficiency and reliability.
• Run the production environment by monitoring availability and taking a holistic view of system health.
• Develop and manage the budget for monitoring and alerting initiatives.
• Optimize costs while ensuring the effectiveness of monitoring solutions.
• Build software and systems to monitor platform infrastructure and applications at scale.
• Measure and optimize system performance, with an eye toward pushing our capabilities forward, getting ahead of customer needs, and innovating for continual improvement.
• Automate. Automate. Automate!!

The Expertise You Have

• Bachelor's degree or higher in a technology-related field (e.g., Engineering, Computer Science) required; master's degree a plus.
• 12+ years of professional experience in Monitoring and Alerting roles on major cloud platforms (AWS, Azure), including program leadership roles.
• 6+ years of experience in cloud (AWS, Azure) and observability; experience building and operating highly resilient platforms in AWS cloud environments.
• 4+ years of experience in software development with Python, Node.js, or Java, with a focus on SDLC and automation.
• A deep understanding of cloud technologies (AWS, Kubernetes, Azure) and DevOps practices, and adeptness at implementing monitoring solutions that cater to the unique challenges of the cloud.
• Experience running observability, monitoring, and alerting programs on large-scale distributed systems.
• Ability to quickly hire and retain the best industry talent.
• Expertise in logging and monitoring tools (preferred: Prometheus, Grafana, Datadog, AWS CloudWatch; related: Azure Monitor, Log Analytics, Fluentd).

The Skills You Bring

• Proven experience in building a Monitoring and Alerting program.
• Experience implementing advanced observability practices and techniques at scale.
• Ability to lead, strategize, plan, and execute programs.
• Experience with instrumentation, and the systems skills to build and operate monitoring, logging, and alerting services for distributed systems at scale.
• Demonstrated ability running observability programs that utilize modern monitoring tools (Datadog, Prometheus, etc.).
• Solid understanding of cloud computing, networking, and DevOps concepts.
• Experience with microservices and databases.
• Proven experience in maintaining the scalability and resiliency of complex environments.
• Knowledge of network security (e.g., AWS Policy, Azure Policy, VPN, Active Directory/RBAC, ACLs, NSG rules, private endpoints).
• Ability to lead the team to triage issues, execute root cause analysis, and be decisive under pressure.
• Proficient communication skills, with an ability to reach both technical and non-technical audiences.
• Ability to learn methods and practices and bring them to our engineers.
• Ability to work with a variety of individuals and groups, both in person and virtually, in a constructive and collaborative manner, and to build and maintain effective relationships.

The Value You Deliver

• Enhance and build a world-class Monitoring and Alerting program for our cloud offerings.
• Define and execute a comprehensive reliability and observability strategy that works at scale, ensuring that Saviynt systems are always available when our customers need them.
• Hire and retain the best industry talent.
• Execute plans for technical standardization and process refinement within the engineering organization, especially for Site Reliability Engineers.

Saviynt is an amazing place to work. We are a high-growth cloud software company with phenomenal people, building the most innovative identity platform in the world. Your time at Saviynt will be worthwhile. You will experience tremendous growth and learning while being part of something you are helping to define and build from the ground up. Through challenging yet rewarding work, you will be able to directly impact our clients, all within a welcoming and positive work environment. If you're resilient and enjoy working in a dynamic, high-growth environment, you belong with us!

