Senior Site Reliability Engineer

Company: Striveworks
Location: Austin, TX, USA
Salary: $150,000 – $190,000
Type: Full-Time
Experience Level: Senior

Requirements

  • 6+ years of direct, hands-on experience in:
      ◦ Microservice deployment in Kubernetes
      ◦ Diagnosing and resolving issues within containerized environments
      ◦ Helm chart and Kustomization development and deployment
      ◦ Python and Bash programming
      ◦ Automation and IaC (e.g., Terraform, Ansible)
      ◦ Cloud infrastructure (e.g., AWS, Azure, GCP, or OpenStack)
      ◦ Managing and troubleshooting Linux systems (e.g., RHEL, Ubuntu, CentOS)
  • The ability to work cross-functionally to define requirements and build solutions for customer use cases of the platform
  • The ability to respond professionally and competently to incident reports and triage critical system faults
  • Active Top Secret security clearance, or eligibility and willingness to obtain and maintain one

Responsibilities

  • Automating IaC to manage virtual machines and deploy containers, services, and other infrastructure; leaning on expertise to deploy custom Kubernetes clusters in AWS, Azure, GCP, on-premises, or hybrid cloud environments
  • Working with platform developers, DevOps, and customer-facing teams to define requirements and build solutions for customer use cases of the platform
  • Deploying software to commercial and, later, unclassified, CUI, Secret, and Top Secret Department of Defense (DoD) networks
  • Responding to incidents and performing initial triage of critical system faults
  • Monitoring, automating, and improving software reliability, performance, and availability for various projects
  • Providing guidance and leadership to junior SRE team members

Preferred Qualifications

  • Active Top Secret security clearance and intimate familiarity with DoD networking, tools, infrastructure, security requirements, and policies
  • Experience with software deployments to on-premises and cloud-based unclassified, CUI, Secret, or Top Secret networks within the DoD
  • Deep knowledge of DevOps principles and practices for deploying and managing a service mesh in cloud environments
  • Experience with DevSecOps/DevOps and CI/CD for the administration and deployment of GPU-enabled servers
  • Experience designing, managing, and optimizing workloads across multiple cloud providers
  • Experience deploying, maintaining, or contributing to Cloud Native Computing Foundation (CNCF) projects
  • Proficiency with US federal information system security policies, including Security Technical Implementation Guides (STIGs), NIST 800-171, NIST 800-53, CMMC, and ICD 503
  • Experience with network-attached storage (NAS) and storage area network (SAN) technologies
  • Experience with Kubernetes and cloud-native applications and services in denied, disrupted, intermittent, and limited-bandwidth (DDIL) environments
  • Experience with both blue-green and canary deployment strategies
  • DoD 8570 IAT II certification (Security+ CE); proficiency with security automation and familiarity with API security, container security, and cloud security