Senior Manager Observability and Reliability Platform Engineering

Company: Geico
Location: San Francisco, CA, USA; Bethesda, MD, USA
Salary: $150,000 – $300,000
Type: Full-Time
Degrees
Experience Level: Senior

Requirements

  • Strong expertise in Python, Golang, or Java and RESTful services, with a focus on building high-throughput, high-volume distributed systems
  • Strong expertise in Unix, container orchestration (e.g., Kubernetes), container runtimes, and optimization
  • Experience with open-source observability tools such as Prometheus and the LGTM stack is a big plus
  • Strong understanding of columnar data stores
  • Strong understanding of Site Reliability Engineering and DevOps principles
  • Strong technical acumen in cloud architecture, performance benchmarking, and capacity planning
  • Solid foundation in algorithms, data structures, and core computer science concepts
  • Experience managing and growing engineers and teams
  • In-depth knowledge of CS data structures and algorithms
  • Basic UI/UX and prototype design knowledge and experience
  • Proven ability to concentrate, learn technical concepts, and adapt quickly to new technologies
  • Strong knowledge of cloud platforms (AWS, GCP, Azure, etc.)
  • Proficiency in project management and work item management tools such as Azure DevOps and Portfolio

Responsibilities

  • Bring strong technical expertise and leadership, lead from the trenches, and demonstrate proven knowledge of observability
  • Drive the build-out of multi-cloud infrastructure, lead by example, and be a role model for the team of developers and infrastructure engineers
  • Work with your Director to address project dependencies, negotiate and estimate incremental delivery dates for milestones with the stakeholder community, and deliver projects on time
  • Understand how requirements and design choices may impact systems across multiple areas
  • Report on your team’s progress on projects and other key metrics, and present detailed, implementable ideas for improving or influencing product or project delivery
  • Initiate and support performance evaluation of team members
  • Cultivate a culture that motivates all levels of performers to higher levels of achievement
  • Build and maintain relationships with your team members to support an environment of trust
  • Identify where technical or analytical skill gaps put future team deliverables at risk and craft a plan to remediate them; consistently challenge team members to share knowledge and learn new technologies
  • Contribute significantly to the team planning process, including surfacing associate-level proposals
  • Collaborate with the product teams to understand their pain points around performance and resiliency, and formulate strategies to address recurring issues in a sustainable way
  • Develop and motivate teams to solve complex problems and be a strong advocate for open-source technologies and solutions
  • Be responsible for building and mentoring a new team of site reliability engineers
  • Drive the team toward solutions that serve long-term goals while ensuring that high-priority tech debt is resolved efficiently
  • Be a strong thought leader in observability and site reliability engineering principles
  • Consistently share best practices and improve processes within and across teams
