Staff Site Reliability Engineer

Company: Velocity Global
Location: Palo Alto, CA, USA
Salary: $176,000 – $229,000
Type: Full-Time
Experience Level: Senior

Requirements

  • Outstanding analytical skills with the ability to solve complex systems challenges and performance bottlenecks
  • Proficient knowledge of public cloud infrastructure, networking, architecture, and Linux as well as orchestration, monitoring, automation, and configuration management solutions
  • Practical knowledge of distributed service design and performance, including messaging protocols, caching, data residency, and observability
  • Passion for designing and evolving complex systems while also being able to support day-to-day infrastructure operations
  • A dedication to learning new techniques and technologies and sharing them with your fellow engineers, with a strong ability to break down, discuss, and communicate technical concepts

Responsibilities

  • Automating observability and alerting across an ever-changing landscape of microservices
  • Automated Service Reliability Scorecards and Production Readiness Standards
  • Chaos Engineering and Game Day Simulations to discover weak spots, and test fixes for them, before they surface in a real-life production incident
  • Software engineering project work, proposed and driven by individual SRE team members, to remove operational bottlenecks and increase velocity in ways we’ve never considered before
  • Expand and improve our observability and monitoring footprint
  • Collaborate with the Engineering and DevOps teams to create architectural plans, define project requirements, and establish technical standards
  • Address common operational challenges by building tools and writing automation scripts
  • Serve on the Incident Response Team to help debug and drive resolution of production reliability issues, contribute to the postmortem, and work to prevent recurrence
  • Participate in design and production reviews for new features, products, or infrastructure
  • Audit and tune the configuration of systems owned by other engineering teams
  • Plan for the growth of Velocity Global’s infrastructure and infrastructure reliability/resiliency
  • Design and implement the High Availability architecture underlying Velocity Global’s platform
  • Create Disaster Recovery solutions, including backups, redundant systems, and emergency response processes
  • Collaborate with Architects and Engineering leaders on hiring, training, and mentoring talent

Preferred Qualifications

  • 5-8 years of software engineering experience (depending on the open role), preferably in infrastructure engineering.
  • 5-8 years of experience with highly scalable cloud architectures, including service-oriented architectures (AWS and/or GCP experience preferred).
  • Ability to collaborate well and come up with maintainable, reliable solutions. Experience building scalable, high-performing systems.
  • Strong analytical and problem-solving skills.
  • Ability to provide both architectural guidance and detailed technical directions.
  • Excellent communication, collaboration, and leadership skills.