Staff DevOps Engineer

10+ years of experience in DevOps, SRE, or infrastructure engineering, including recent senior/staff-level roles with high-impact ownership
Proven experience with Terraform at scale (modules, orchestration, testing), ideally across multiple cloud environments (AWS preferred)
Familiarity with container orchestration and serverless technologies such as Amazon ECS, Kubernetes (K8s), and serverless frameworks, including experience deploying, scaling, and managing containerized applications in production environments
Strong Python programming skills with the ability to build tools, automate systems, and contribute to application codebases, coupled with a solid understanding of application internals and deployment considerations
Deep experience with GitHub Actions for CI/CD, including custom workflows, reusable actions, and multi-environment pipelines
Solid understanding of network architecture, security principles, and cloud-native infrastructure patterns
Hands-on experience with monitoring and observability, especially with Datadog, and the ability to interpret metrics/logs to guide system improvements
Excellent communication and collaboration skills; you’re comfortable navigating technical conversations across engineering, security, and leadership

Take technical ownership of our Terraform-based IaC platform: assess the current state, define next steps, and drive delivery of improvements across environments
Work closely with engineering teams to design and implement cloud-native architectures, optimize deployment pipelines, and contribute directly to Python-based codebases where needed
Strengthen and scale our CI/CD processes using GitHub Actions, including artifact packaging, environment promotion, and automated testing/release workflows
Shape and execute our global infrastructure strategy, including multi-region deployment, scalability, and resiliency planning
Implement and improve disaster recovery planning, incident response, and high availability for critical systems
Design and enforce cloud networking and security best practices, including IAM, VPC architecture, and secrets management
Drive improvements in observability and performance monitoring using Datadog APM, metrics, logs, and alerting to proactively identify and resolve issues
Collaborate across teams to implement and evangelize SRE principles, including SLIs, SLOs, and error budgets
Serve as a technical mentor and thought partner, helping level up others while delivering meaningful improvements yourself

Background working in platform engineering, developer experience, or infrastructure leadership roles
Experience navigating the challenges of a startup transitioning into a scale-up, including evolving systems, processes, and team structures
Experience mentoring or guiding teams to turn projects around
Knowledge of compliance and risk management in regulated environments (HIPAA, SOC 2, ISO 27001, etc.)