Senior Manager, Observability and Reliability Platform Engineering
Company | Geico |
---|---|
Location | San Francisco, CA, USA; Bethesda, MD, USA |
Salary | $150,000 – $300,000 |
Type | Full-Time |
Degrees | |
Experience Level | Senior |
Requirements
- Strong expertise with Python, Golang, or Java and RESTful services, with a focus on building high-throughput, high-volume distributed systems
- Strong expertise in Unix, container orchestration (e.g., Kubernetes), container runtimes, and optimization
- Experience with open-source observability tools such as Prometheus and the LGTM stack is a big plus
- Strong understanding of columnar data stores
- Strong understanding of Site Reliability Engineering and DevOps principles
- Strong technical acumen in cloud architecture, performance benchmarking, and capacity planning
- Solid foundation in algorithms, data structures, and core computer science concepts
- Experience managing and growing engineers and teams
- In-depth knowledge of CS data structures and algorithms
- Basic UI/UX and prototype design knowledge and experience
- Proven ability to concentrate and demonstrate a capacity for learning technical concepts and adapting to new technologies quickly
- Strong cloud platform knowledge (AWS, GCP, Azure, etc.)
- Proficiency in project management and work item management tools such as Azure DevOps and Portfolio
Responsibilities
- Have strong technical expertise and leadership skills, be able to lead from the trenches, and have proven knowledge of observability
- Drive the build-out of multi-cloud infrastructure, lead by example, and be a role model to the team of developers and infrastructure engineers
- Work with your Director to address project dependencies, negotiate and estimate incremental delivery dates for milestones with the stakeholder community, and deliver projects on time
- Understand how requirements and design choices may impact systems across multiple areas
- Report on your team’s progress against project and other key metrics, and present detailed, implementable ideas for improving or influencing product or project delivery
- Initiate and support performance evaluation of team members
- Cultivate a culture that motivates all levels of performers to higher levels of achievement
- Build and maintain relationships with your team members to support an environment of trust
- Identify where technical or analytical skill gaps put future team deliverables at risk and craft a plan to remediate them; consistently challenge team members to share knowledge and learn new technologies
- Contribute significantly to the team planning process, including surfacing associate-level proposals
- Collaborate with the product teams to understand their pain points around performance and resiliency, and formulate strategies to address recurring issues in a sustainable way
- Develop and motivate teams to solve complex problems and be a strong advocate for open-source technologies and solutions
- Be responsible for building and mentoring a new team of site reliability engineers
- Drive the team to build solutions that serve long-term goals while ensuring that high-priority tech debt is resolved efficiently
- Be a strong thought leader in observability and site reliability engineering principles
- Consistently share best practices and improve processes within and across teams
Preferred Qualifications
No preferred qualifications provided.