Service Reliability Analyst II – ITIL
Company | Riot Games |
---|---|
Location | Los Angeles, CA, USA |
Salary | Not Provided |
Type | Full-Time |
Degrees | Bachelor’s |
Experience Level | Junior, Mid Level |
Requirements
- 2-4 years of hands-on experience in IT service management, data analysis, or technical operations, with a focus on maintaining and optimizing IT infrastructure.
- Strong proficiency in incident, problem, change, and release management, with the ability to design and implement process flows using industry-standard methodologies.
- Solid understanding of software development life cycles (SDLC) and how various components interact within larger ecosystems, ensuring seamless operation and scalability.
- Clear awareness of system and service ownership within a multi-team environment, including the effective use of APIs/SDKs and adherence to SLAs.
- Deep enthusiasm for operations and technology, with a proactive approach to continuous improvement in system reliability and performance.
- ITIL-based ticketing systems: in-depth experience with ServiceNow, Jira, or similar platforms for tracking and managing IT service processes.
- Experience with the following tools and technologies:
  - Advanced skills in Tableau, Datawrapper, and Excel for turning complex datasets into actionable insights.
  - Proficiency in JQL, SQL, and XQuery for querying and manipulating data across various platforms.
  - Expertise in setting up and managing monitoring frameworks with tools like Datadog and New Relic to ensure system health and performance.
  - Skill in event correlation to improve incident response, using tools such as Datadog, BigPanda, or PagerDuty.
Responsibilities
- Lead and facilitate weekly technical discussions on service reliability with key product teams, ensuring alignment on operational goals and performance metrics.
- Conduct thorough audits of incident data in collaboration with service owners to validate accuracy and ensure comprehensive reporting and analysis.
- Collect, synthesize, and report on system health metrics for Riot’s diverse infrastructure, utilizing advanced data collection methods and monitoring tools.
- Perform in-depth analysis of operational data trends to identify and address systemic issues and optimize service performance.
- Participate in on-call rotations to provide critical support and ensure rapid response to incidents, minimizing downtime and service disruptions.
- Assist in tracking and coordinating corrective actions for root cause analysis, ensuring thorough resolution of underlying issues and continuous improvement of operational processes.
- Develop and maintain dashboards and reports that provide insights into key operational performance metrics, helping leaders make data-driven decisions.
Preferred Qualifications
- 2+ years of specialized experience in Service Reliability Engineering (SRE) or equivalent roles such as Technical Release Manager, Process Owner, Live Operations Engineer, or Network Administrator.
- Bachelor’s degree in Computer Science, IT Systems, Information Technology, or a closely related field, or equivalent professional experience.
- Advanced data analysis and data insights proficiency, with the ability to derive actionable intelligence from large datasets.
- Relevant certifications such as AWS Certified Solutions Architect, CompTIA Linux+, or CompTIA Network+, or equivalent credentials, are highly valued.
- Demonstrated expertise in deploying and managing monitoring solutions such as Datadog and New Relic to ensure system health and performance within complex environments.