Lead – DevOps Support Engineering
Company | Magna |
---|---|
Location | Lowell, MA, USA |
Salary | Not Provided |
Type | Full-Time |
Degrees | |
Experience Level | Senior, Expert or higher |
Requirements
- 5+ years of experience in DevOps, SRE, or L2 technical support roles.
- Experience creating and tracking tasks for L2 DevOps engineers to drive operational efficiency.
- Strong expertise in automating support processes and troubleshooting complex systems.
- Proficiency in scripting (Bash, Python, or similar) for automation and monitoring.
- Hands-on experience with monitoring and logging tools (Prometheus, Grafana, ELK, Datadog, etc.).
- Solid understanding of CI/CD pipelines, infrastructure components, and cloud services (AWS, GCP, or Azure).
- Experience with containerized environments (Docker, Kubernetes) and troubleshooting containerized applications.
- Strong analytical skills for root cause analysis, incident resolution, and risk assessment.
Responsibilities
- Automate L2 support processes, incident resolution, and infrastructure management.
- Develop and maintain scripts and automation tools to enhance efficiency and reduce manual work.
- Ensure seamless integration between infrastructure, CI/CD pipelines, and monitoring solutions.
- Optimize deployment processes and automate recurring operational tasks.
- Lead DevOps L2 incident response, diagnosing and resolving infrastructure and application issues.
- Perform root cause analysis and implement proactive fixes to prevent recurring incidents.
- Work closely with L1 and L3 teams to streamline support escalations and improve response times.
- Troubleshoot Kubernetes, cloud infrastructure, networking, and deployment failures.
- Design, configure, and optimize monitoring and logging dashboards (Prometheus, Grafana, ELK, etc.).
- Improve alerting mechanisms to enhance observability and reduce noise.
- Ensure system performance metrics are effectively tracked and visualized for proactive incident management.
- Define and optimize support workflows for efficient issue resolution.
- Establish escalation routes to ensure timely handling of critical incidents.
- Evaluate risks associated with deployments and infrastructure changes, implementing mitigation strategies.
- Assist in QA validation of infrastructure changes and automation scripts.
Preferred Qualifications
- Experience with Infrastructure as Code (Terraform, Ansible, CloudFormation).
- Familiarity with the AWS Load Balancer Controller (formerly the ALB Ingress Controller), ExternalDNS, and DNS management.
- Exposure to service mesh (Istio, Linkerd) and Kubernetes operators.
- Certifications such as Certified Kubernetes Administrator (CKA), AWS Certified DevOps Engineer, or similar.