Site Reliability Engineer
Company | Baseten |
---|---|
Location | San Francisco, CA, USA; New York, NY, USA |
Salary | Not Provided |
Type | Full-Time |
Degrees | Bachelor’s, Master’s, PhD |
Experience Level | Mid Level |
Requirements
- Bachelor’s, Master’s, or Ph.D. degree in Computer Science, Engineering, Mathematics, or related field.
- 3+ years of professional work experience in a fast-paced, high-growth environment.
- Extensive experience with Kubernetes.
- Experience in building and maintaining scalable infrastructure.
- Experience with infrastructure-as-code tools (e.g., Terraform, CloudFormation, Pulumi) and CI/CD tooling (e.g., GitHub Actions, GitLab CI, CircleCI, Jenkins).
- Ability to own projects end-to-end, from project specification to execution.
- No prior machine learning experience is required, but you should be open to learning about it.
Responsibilities
- Build and maintain scalable infrastructure to support the deployment and operation of machine learning models.
- Establish standards and best practices for reliability and performance across the infrastructure.
- Automate processes when relevant, particularly for managing CI/CD pipelines.
- Own products and projects end-to-end, functioning as both an engineer and a project manager, with a focus on user empathy, project specification, and end-to-end execution.
- Collaborate with cross-functional teams to understand project requirements and translate them into technical solutions.
- Mentor junior team members and contribute to knowledge sharing within the organization.
- Navigate ambiguity and exercise good judgment on tradeoffs and tools needed to solve problems, avoiding unnecessary complexity.
- Demonstrate pride, ownership, and accountability for your work, expecting the same from your teammates.
Preferred Qualifications
- Relevant OSS observability experience (Prometheus, ELK stack, Grafana stack, OpenTelemetry) is a plus.