Site Reliability Engineering Specialist

Company: Telesat
Location: Ottawa, ON, Canada
Salary: Not provided
Type: Full-Time
Degrees: Bachelor’s
Experience Level: Expert or higher

Requirements

  • Bachelor’s Degree in Computer Science or a related field
  • Minimum of nine years of experience in IT operations with a focus on reliability, uptime, availability, and performance
  • At least five years of hands-on, demonstrable experience with Microsoft Azure, including deployment, management, and monitoring
  • Expertise in automation and configuration management, with demonstrable experience using tools such as Terraform and Ansible to automate infrastructure and application deployment
  • Strong understanding of monitoring and observability, with proven experience using tools such as Prometheus, Grafana, Nagios, or Splunk, and the ability to implement and maintain observability solutions

Responsibilities

  • Work closely with Telesat’s cloud engineers to deploy and maintain our Kubernetes-based infrastructure
  • Help maintain high availability, uptime and resiliency of our infrastructure
  • Perform day-to-day operational tasks such as upgrades and patching of the Kubernetes platform
  • Automate operational tasks
  • Monitor the health of the platform and applications using Telesat’s observability platform
  • Improve observability, and define and measure service level objectives (SLOs)
  • Collaborate with development teams to resolve application issues
  • Participate in the on-call rotation, respond to automated alerts, and execute playbooks
  • Identify gaps in processes, and build or improve tools to support incident management
  • Facilitate incident response and conduct root cause analysis

Preferred Qualifications

  • CNCF Certified Kubernetes Administrator (CKA) certification would be considered an asset for this role