SRE (Site Reliability Engineer)
Location
Hartford, CT | United States
Job description
Role :: SRE (Site Reliability Engineer)
Location :: Hartford, CT Onsite
Duration :: Contract (long term)
Role Description:
1. Migration Planning and Execution:
- Collaborate with architecture and development teams to plan the migration of applications and services to AWS.
- Ensure that all necessary dependencies and resources are identified and provisioned in AWS, in line with the operational readiness checklist.
- Execute migration plans, ensuring minimal downtime and disruption to services.
2. Infrastructure as Code (IaC):
- Develop and maintain IaC scripts using tools like AWS CloudFormation or Terraform to automate the provisioning and management of AWS resources.
- Ensure that infrastructure provisioning follows best practices for scalability, reliability, and security.
3. Observability and Monitoring:
- Implement comprehensive monitoring solutions using Dynatrace, Splunk, and AWS CloudWatch to monitor the health and performance of migrated services.
- Set up logging and tracing to capture and analyze system behavior and application performance.
4. Alerting and Incident Management:
- Configure alerting thresholds and notifications to quickly identify and respond to issues.
- Develop incident response plans and participate in on-call rotations to ensure rapid issue resolution.
5. Performance Tuning and Optimization:
- Continuously monitor system and application performance, identifying bottlenecks and areas for improvement.
- Optimize AWS resource usage (e.g., EC2 instances, ECS, RDS databases) for cost-effectiveness and performance.
6. Security and Compliance:
- Implement AWS security best practices, including network configurations, IAM policies, and encryption, to protect data and resources.
- Ensure that the cloud environment complies with relevant industry standards and regulations.
7. Disaster Recovery and Business Continuity:
- Design and implement disaster recovery strategies, including data backup and replication, to ensure data integrity and availability.
- Test and validate disaster recovery plans to ensure business continuity.
8. Cost Management and Optimization:
- Monitor and optimize AWS spending, using tools like AWS Cost Explorer and Cloudability.
- Implement cost-saving measures such as reserved instances, spot instances, and auto-scaling.
9. Continuous Improvement and Automation:
- Identify repetitive tasks and processes that can be automated to improve efficiency and reduce human error.
- Continuously evaluate and adopt new AWS services and features that can enhance system reliability and performance.
Essential Skills:
- Application support experience in Java or .NET
- Hands-on experience, in any flavour, in application support, performance tuning, etc.
- Working knowledge as a developer or DBA on Oracle, SQL, or Postgres
Desirable Skills: DevOps Skills
- AWS skills (specifically ECS, EC2, RDS, SNS, SQS, S3, Lambda)
- Skills in Splunk or other monitoring tools
- App migration experience
- Cost monitoring / cloud operations
- SRE / automation mindset