Cloud Site Reliability Engineer II
Company | Zafin |
---|---|
Location | Ottawa, ON, Canada |
Salary | Not provided |
Type | Full-Time |
Degrees | Bachelor’s |
Experience Level | Expert or higher |
Requirements
- Bachelor’s degree in Computer Science, Engineering, or a related field (Master’s degree preferred)
- 12+ years of experience in cloud support, operations, or a related role
- Advanced expertise in Microsoft Azure (preferred) or equivalent cloud platforms
- Demonstrated experience in designing and scaling container orchestration systems like AKS or OpenShift
- Proven leadership in managing automated deployment pipelines, including those built on Azure DevOps
- Mastery of enterprise monitoring platforms (e.g., Azure Insights, Grafana) and predictive analytics tools
- Advanced scripting skills with PowerShell, Python, or similar languages
- Extensive experience in incident management and defining SLAs for global production environments
- In-depth knowledge of database management, particularly Postgres
Responsibilities
- Lead and manage the resolution of complex technical issues involving Zafin’s products and Azure cloud environment
- Design and implement strategic operational enhancements to improve resiliency and system reliability
- Conduct in-depth Root Cause Analysis (RCA) for high-severity incidents and drive initiatives to reduce error recurrence
- Represent the organization in external client escalation calls, providing expert guidance and solutions
- Architect and optimize cloud infrastructure for high performance, scalability, and cost-effectiveness
- Provide thought leadership in managing and scaling container orchestration platforms such as AKS and OpenShift
- Oversee the implementation of advanced monitoring solutions and integrate predictive analytics for proactive issue resolution
- Develop and execute automation strategies to streamline operational workflows and incident responses
- Create and maintain comprehensive documentation of cloud architectures, processes, and incident management strategies
- Mentor and coach junior engineers, fostering a culture of continuous learning and innovation
- Drive strategic initiatives, collaborating with cross-functional teams to achieve organizational objectives
Preferred Qualifications
- Advanced certifications in cloud platforms (e.g., Azure Solutions Architect Expert)
- Experience with ITSM tools and processes (e.g., ServiceNow)
- Comprehensive understanding of security and compliance in cloud environments