Cloud Site Reliability Engineer II

Company: Zafin
Location: Ottawa, ON, Canada
Salary: Not provided
Type: Full-Time
Degrees: Bachelor’s
Experience Level: Expert or higher

Requirements

  • Bachelor’s degree in Computer Science, Engineering, or a related field (Master’s degree preferred)
  • 12+ years of experience in cloud support, operations, or a related role
  • Advanced expertise in Microsoft Azure (preferred) or equivalent cloud platforms
  • Demonstrated experience in designing and scaling container orchestration systems like AKS or OpenShift
  • Proven leadership in managing automated deployment pipelines, including those built in Azure DevOps
  • Mastery of enterprise monitoring platforms (e.g., Azure Insights, Grafana) and predictive analytics tools
  • Advanced scripting skills with PowerShell, Python, or similar languages
  • Extensive experience in incident management and defining SLAs for global production environments
  • In-depth knowledge of database management, particularly Postgres

Responsibilities

  • Lead and manage the resolution of complex technical issues involving Zafin’s products and its Azure cloud environment
  • Design and implement strategic operational enhancements to improve resiliency and system reliability
  • Conduct in-depth Root Cause Analysis (RCA) for high-severity incidents and drive initiatives to reduce error recurrence
  • Represent the organization in external client escalation calls, providing expert guidance and solutions
  • Architect and optimize cloud infrastructure for high performance, scalability, and cost-effectiveness
  • Provide thought leadership in managing and scaling container orchestration platforms such as AKS and OpenShift
  • Oversee the implementation of advanced monitoring solutions and integrate predictive analytics for proactive issue resolution
  • Develop and execute automation strategies to streamline operational workflows and incident responses
  • Create and maintain comprehensive documentation of cloud architectures, processes, and incident management strategies
  • Mentor and coach junior engineers, fostering a culture of continuous learning and innovation
  • Drive strategic initiatives, collaborating with cross-functional teams to achieve organizational objectives

Preferred Qualifications

  • Advanced certifications in cloud platforms (e.g., Azure Solutions Architect Expert)
  • Experience with ITSM tools and processes (e.g., ServiceNow)
  • Comprehensive understanding of security and compliance in cloud environments