Staff Site Reliability Engineer
Company | Velocity Global
---|---
Location | Palo Alto, CA, USA
Salary | $176,000 – $229,000
Type | Full-Time
Degrees |
Experience Level | Senior

Requirements
- Outstanding analytical skills, with the ability to solve complex systems challenges and resolve performance bottlenecks
- Proficiency in public cloud infrastructure, networking, architecture, and Linux, as well as orchestration, monitoring, automation, and configuration management solutions
- Practical knowledge of distributed service design and performance, including messaging protocols, caching, data residency, and observability
- Passion for designing and evolving complex systems while also being able to support day-to-day infrastructure operations
- A dedication to learning new techniques and technologies and sharing ideas with your fellow engineers, with mastery in breaking down, discussing, and communicating technical concepts
Responsibilities
- Automate observability and alerting across an ever-changing landscape of microservices
- Develop automated Service Reliability Scorecards and Production Readiness Standards
- Run Chaos Engineering experiments and Game Day simulations to discover weak spots, and test fixes for them, that would otherwise not be identified until a real production incident occurred
- Carry out software engineering project work, proposed and driven by individual SRE team members, to remove operational bottlenecks and increase velocity in ways we’ve never considered before
- Expand and improve our observability and monitoring footprint
- Collaborate with the Engineering and DevOps teams to create architectural plans, define project requirements, and establish technical standards
- Address common operational challenges by building tools and automation scripts
- Serve on the Incident Response Team to help debug and drive resolution of production reliability issues, contribute to the postmortem, and work to prevent recurrence
- Participate in design and production reviews for new features, products, or infrastructure
- Audit and tune the configuration of systems owned by other engineering teams
- Plan for the growth of Velocity Global’s infrastructure and for its reliability and resiliency
- Design and implement the High Availability architecture underlying Velocity Global’s platform
- Create Disaster Recovery solutions, including backups, redundant systems, and emergency response processes
- Collaborate with Architects and Engineering leaders on hiring, training, and mentoring engineering talent
Preferred Qualifications
- 5–8 years of software engineering experience (depending on the open role), preferably in Infrastructure Engineering
- 5–8 years of experience with highly scalable cloud architectures, including service-oriented architectures (AWS and/or GCP experience preferred)
- Ability to collaborate well and develop maintainable, reliable solutions, with experience building scalable, high-performing systems
- Strong analytical and problem-solving skills
- Ability to provide both architectural guidance and detailed technical direction
- Excellent communication, collaboration, and leadership skills