Site Reliability Engineer
Location
Halifax, NS | Canada
Job description
Position Description
As a Site Reliability Engineer (SRE), you will play a critical role in ensuring the reliability, performance, and availability of our systems. Your expertise in managing infrastructure, automating processes, and implementing best practices will contribute to seamless operations. You’ll collaborate with cross-functional teams, aligning development and operations around shared goals.
Your future duties and responsibilities
Infrastructure Management:
• Run and maintain infrastructure using tools like Terraform, Kubernetes, and Helm.
• Deploy and manage services on Google Cloud Platform (GCP).
• Implement best SRE practices, document improvements, and drive infrastructure enhancements.
Monitoring and Alerting:
• Improve monitoring and alerting systems to detect incidents promptly and reduce false positives.
• Monitor service health using SRE principles, including defining Service Level Indicators (SLIs), Service Level Objectives (SLOs), and tracking error budgets.
Tech Stack:
• Programming Languages: Proficiency in scripting languages such as Python.
• Big Data Technologies: Familiarity with Apache Hadoop, Spark, and Kafka.
• Database Management: Experience with both relational databases (e.g., PostgreSQL, MySQL, Oracle) and NoSQL databases (e.g., MongoDB, Cassandra).
• Cloud Platforms:
• Google Cloud Platform (GCP):
- Deploy scalable, fault-tolerant systems using GCP services (e.g., Google App Engine, Kubernetes Engine, Compute Engine).
- Set up and manage virtual machines, containers, and serverless functions.
- Utilize managed databases (Cloud SQL, Firestore) for data storage.
- Implement authentication and authorization using GCP Identity and Access Management (IAM).
- Monitor application performance with Google Cloud Monitoring (formerly Stackdriver).
Collaboration:
• Work closely with development teams to align on shared goals.
• Troubleshoot incidents using built-in integrations with tools like Cloud Build.
• Provide real-time observability across logs, metrics, and events.
Required qualifications to be successful in this role
Education: Bachelor’s or Master’s degree in Computer Science, Information Technology, or related fields.
Experience:
• 3 to 5 years of experience in SRE or related roles.
• Strong IT knowledge and skills.
• Familiarity with data analysis, CI/CD pipelines, and database management.
• Exposure to cloud services (GCP) is highly desirable.
Desired Skills:
• Automation: Ability to automate end-to-end processes.
• Security: Understanding of data security best practices.
• Troubleshooting: Proficiency in incident resolution and troubleshooting.
• Communication: Effective communication and collaboration skills.
Insights you can act on
While technology is at the heart of our clients’ digital transformation, we understand that people are at the heart of business success.
When you join CGI, you become a trusted advisor, collaborating with colleagues and clients to bring forward actionable insights that deliver meaningful and sustainable outcomes. We call our employees "members" because they are CGI shareholders and owners who enjoy working and growing together to build a company we are proud of. This has been our Dream since 1976, and it has brought us to where we are today: one of the world’s largest independent providers of IT and business consulting services.
At CGI, we recognize the richness that diversity brings. We strive to create a work culture where all belong and collaborate with clients in building more inclusive communities. As an equal-opportunity employer, we want to empower all our members to succeed and grow. If you require an accommodation at any point during the recruitment process, please let us know. We will be happy to assist.
Ready to become part of our success story? Join CGI — where your ideas and actions make a difference.