Staff Site Reliability Engineer

Company	Velocity Global
Location	Palo Alto, CA, USA
Salary	$176,000 – $229,000
Type	Full-Time
Experience Level	Senior

Requirements

Outstanding analytical skills with the ability to solve complex systems challenges and performance bottlenecks
Proficiency in public cloud infrastructure, networking, architecture, and Linux, as well as orchestration, monitoring, automation, and configuration management solutions
Practical knowledge of distributed service design and performance, including messaging protocols, caching, data residency, and observability
Passion for designing and evolving complex systems while also being able to support day-to-day infrastructure operations
A dedication to learning new techniques and technologies and sharing them with your fellow engineers, with mastery of breaking down, discussing, and communicating technical concepts

Responsibilities

Automate observability and alerting across an ever-changing landscape of microservices
Build automated Service Reliability Scorecards and Production Readiness Standards
Run Chaos Engineering and Game Day simulations to discover, and test fixes for, weak spots that would otherwise go unidentified until a real production incident occurred
Propose and drive software engineering projects, led by individual SRE team members, to remove operational bottlenecks and increase velocity in ways we’ve never considered before
Expand and improve our observability and monitoring footprint
Collaborate with the Engineering and DevOps teams to create architectural plans, define project requirements, and establish technical standards
Address common operational challenges by building tools and automation scripts
Serve on the Incident Response Team to help debug and drive resolution of production reliability issues, contribute to postmortems, and work to prevent recurrence
Participate in design and production reviews for new features, products, or infrastructure
Audit and tune the configuration of systems owned by other engineering teams
Plan for the growth of Velocity Global’s infrastructure and its reliability and resiliency
Design and implement the High Availability architecture underlying Velocity Global’s platform
Create Disaster Recovery solutions, including backups, redundant systems, and emergency response processes
Collaborate with Architects and Engineering leaders on hiring, training, and mentoring talent

Preferred Qualifications

5-8 years of software engineering experience (depending on the open role), preferably in infrastructure engineering
5-8 years of experience with highly scalable cloud architectures, including service-oriented architectures (AWS and/or GCP experience preferred)
Ability to collaborate well and come up with maintainable, reliable solutions
Experience building scalable, high-performing systems
Strong analytical and problem-solving skills
Ability to provide both architectural guidance and detailed technical direction
Excellent communication, collaboration, and leadership skills