Senior Site Reliability Engineer
| Company | Striveworks |
|---|---|
| Location | Austin, TX, USA |
| Salary | $150,000 – $190,000 |
| Type | Full-Time |
| Degrees | |
| Experience Level | Senior |
Requirements
- 6+ years of direct, hands-on experience in:
  - Microservice deployment in Kubernetes
  - Diagnosing and resolving issues within containerized environments
  - Helm chart and Kustomization development and deployment
  - Python and Bash programming
  - Automation and IaC (e.g., Terraform, Ansible)
  - Cloud infrastructure (e.g., AWS, Azure, GCP, or OpenStack)
  - Managing and troubleshooting Linux systems (e.g., RHEL, Ubuntu, CentOS)
- The ability to work cross-functionally to define requirements and build solutions for customer use cases of the platform
- The ability to respond professionally and competently to incident reports and triage critical system faults
- Active Top Secret security clearance, or eligibility and willingness to obtain and maintain one
Responsibilities
- Automating IaC to manage virtual machines and deploy containers, services, and other infrastructure, and applying that expertise to deploy custom Kubernetes clusters in AWS, Azure, GCP, on-premises, or hybrid cloud environments
- Working with platform developers, DevOps, and customer-facing teams to define requirements and build solutions for customer use cases of the platform
- Deploying software to commercial and, later, unclassified, CUI, Secret, and Top Secret Department of Defense (DoD) networks
- Responding to incidents and performing initial triage of critical system faults
- Monitoring, automating, and improving software reliability, performance, and availability for various projects
- Providing guidance and leadership to junior SRE team members
Preferred Qualifications
- Active Top Secret security clearance and in-depth familiarity with DoD networking, tools, infrastructure, security requirements, and policies
- Experience with software deployments to on-premises and cloud-based unclassified, CUI, Secret, or Top Secret networks within the DoD
- Deep knowledge of DevOps principles and practices for deploying and managing service mesh in cloud environments
- Experience with DevSecOps/DevOps and CI/CD for the administration and deployment of GPU-enabled servers
- Experience designing, managing, and optimizing workloads across multiple cloud providers
- Experience deploying, maintaining, or contributing to Cloud Native Computing Foundation (CNCF) projects
- Proficiency with US federal information system security policies, including Security Technical Implementation Guides (STIGs), NIST 800-171, NIST 800-53, CMMC, and ICD 503
- Experience with network-attached storage (NAS) and storage area network (SAN) technologies
- Experience with Kubernetes and cloud-native applications and services in denied, disrupted, intermittent, and limited (DDIL) environments
- Experience with both blue-green and canary deployment strategies
- DoD 8570 IAT II certification (Security+ CE); proficiency with security automation and familiarity with API security, container security, and cloud security