Site Reliability Engineer

Company	Charles Schwab
Location	Lone Tree, CO, USA; Austin, TX, USA; Westlake, TX, USA; Ann Arbor, MI, USA; Omaha, NE, USA
Salary	Not Provided
Type	Full-Time
Experience Level	Mid Level, Senior

4+ years of experience with large-scale enterprise system administration, application support, or incident handling
4+ years of experience with RHEL Linux administration or Windows Server administration
4+ years of experience, with a proven track record, supporting enterprise production environments while adhering to DevOps and SRE frameworks
4+ years of experience building application dashboards for proactive monitoring and setting up alerts
4+ years of experience with logging and application monitoring tools (AppDynamics, Splunk, Dynatrace, ThousandEyes)
2+ years of experience supporting applications on cloud platforms such as GCP and Pivotal Cloud Foundry (PCF)
3+ years of experience using Atlassian tools (Jira, Confluence, Bamboo)

Practice a Site Reliability Engineering mindset, solving problems through automation, instrumentation, and simplicity
Partner with architects, development leads, business partners, and other SREs on the team to ensure implementations are architected and designed for resiliency
Identify application reliability and availability improvements, and establish and build solutions that continue to improve the experience
Perform production support and application deployments, and provide rapid response for critical trading applications
Proactively monitor systems and review SLO/SLI metrics and runbooks
Implement and collaborate on solutions that increase the monitoring and observability of systems at scale
Work with development teams to provide recommendations about system health upgrades and toil reduction
Advocate for Schwab’s Reliability Engineering principles, guidelines, and standards
Foster a culture of learning through education and knowledge sharing around reliability practices, processes, and tools
Participate in on-call escalations during market hours and off-hours

Experience researching and building dashboards for Grafana and Prometheus
Experience with Google Cloud Anthos and Kubernetes
Strong understanding of, and experience with, Platform as a Service (PaaS) and Infrastructure as a Service (IaaS) offerings such as Pivotal Cloud Foundry (PCF)
Experience with Continuous Integration/Continuous Delivery (CI/CD) pipelines
Understanding of high-availability enterprise systems and of using tools to automate proactive, and eventually predictive, availability solutions
Receptive, approachable teammate with the ability to interact positively with business partners, technology teams, offshore teams, and professional services
Strong advocate with excellent written and verbal communication skills