Sr. Kubernetes Platform Site Reliability Engineer – Starlink

Company: SpaceX
Location: Redmond, WA, USA
Salary: $160,000 – $220,000
Type: Full-Time
Degrees: Bachelor’s
Experience Level: Senior

Requirements

  • Bachelor’s degree in computer science, information systems/IT, or an engineering discipline and 5+ years of professional experience in Site Reliability Engineering or DevOps; OR 7+ years of professional experience in Site Reliability Engineering or DevOps in lieu of a degree
  • 2+ years of professional experience with Linux operating systems
  • Experience with Terraform, Ansible, or other infrastructure tools
  • Experience with containerization technologies (e.g., OCI containers, Kubernetes)
  • Experience scripting in Bash, Python, or other similar languages
  • Development experience in Python, C++, or Go

Responsibilities

  • Develop automation to deploy and manage on-premises Kubernetes clusters
  • Deploy and manage core infrastructure such as databases, monitoring and distributed storage
  • Closely collaborate with software engineers to create highly scalable, operable, and maintainable products
  • Engage in and improve the whole lifecycle of services — from inception and design, through deployment, operation and refinement
  • Build monitoring and alerting so that supporting systems maintain high availability
  • Hands-on integration and troubleshooting across the entire Starlink stack
  • Identify areas for improvement and create innovative solutions that enable high system availability

Preferred Qualifications

  • 1+ years of experience with Python and Python-based development frameworks
  • Experience managing Kubernetes clusters, not just using them
  • Knowledge of Linux boot process and systems configuration
  • Deep understanding of testing, continuous integration, build, deployment & continuous monitoring
  • Understanding of relevant build technologies, such as Bazel and Makefiles
  • Focus on performance bottlenecks and performance improvement techniques
  • Understanding of distributed databases and data modeling
  • Experience with automated management of dozens, hundreds, or thousands of servers (e.g., with Terraform or Ansible)
  • Strong networking knowledge of TCP/IP
  • Excellent communication skills, with the ability to communicate with customers, peers, and management in both formal and informal situations