Senior Site Reliability Engineer
Company | Royal Bank of Canada |
---|---|
Location | Toronto, ON, Canada |
Salary | Not provided |
Type | Full-Time |
Degrees | |
Experience Level | Senior |
Requirements
- Proven experience in site reliability engineering, software engineering, technical support, systems operations, and system administration.
- Familiarity with distributed platforms, mainframe systems, and CRM applications.
- Strong analytical skills and the ability to collaborate effectively across teams.
- Passion for ensuring the reliability and performance of critical systems.
- Hands-on experience with a variety of tools, such as DevOps CI/CD, Dynatrace, Splunk, PagerDuty, and ServiceNow.
- Software engineering experience with production-class delivery, a strong analytical mindset, communication skills, and a sense of ownership and drive.
- Intermediate experience in a variety of environments and platforms, such as cloud, distributed systems, business workflows and services/APIs, mainframe (JCL, COBOL, DB2), and Linux/UNIX.
- Experience with emerging technologies, such as Apache Hadoop and its ecosystem, Hive, Spark, Teradata, and microservices.
- Shell scripting: the ability to read, understand, modify, and write non-trivial UNIX shell scripts is required.
Responsibilities
- Develop, test, deploy, and maintain reliable and scalable systems that meet or exceed the service level objectives for applications within our Digital Operation Technology ECA/CRM Application portfolio.
- Provide ongoing support to ensure system stability and performance.
- Work closely with application development teams and business partners in Agile labs to understand priorities and impacts, and provide accurate, timely reports to management on the status of critical systems and processes.
- Proactively identify and address emerging issues, collaborating with development teams to resolve them in both the short and long term.
- Demonstrate rapid response and synthesis skills to handle multiple technical factors quickly and arrive at a workaround or solution within SLA.
- Make recommendations on process improvements and enhance system and support documentation.
- Actively participate in Agile development projects to represent the interests of maintaining production application stability and adhering to production readiness standards.
- Ensure that projects follow RBC standards for security, audit, and change management; support system integration testing; and support automated code promotions through the various stages to production with excellent performance and reliability.
- Work on IT Risk deliverables: participate in DR exercises, assist in DR planning and design, interact with various auditors, and ensure that applications in our portfolio function according to published RBC IT Risk Standards.
- Maintain technology currency and address vulnerabilities with a keen eye on process improvement and automation capabilities.
- Automate and improve the processes and tools that support the systems, such as monitoring, alerting, logging, and testing.
- Manage and oversee deployments, system upgrades, and configuration changes for the applications in the portfolio.
- Troubleshoot and resolve complex issues related to mainframe job failures, ETL jobs, and data quality.
Preferred Qualifications
- Knowledge of software built on a cloud-native stack, such as Spring Boot, Spring Cloud, Cloud Foundry, and OpenShift.
- Working experience in one or more of: algorithm design and optimization, large-scale systems and/or parallel or distributed systems, Web APIs, MongoDB, Kafka, RDBMS and/or modern scale-out databases.
- Experience with Agile (Scrum) methodology.