Senior Systems Engineer – Site Reliability
| Field | Details |
|---|---|
| Company | Rocket Companies |
| Location | Detroit, MI, USA |
| Salary | Not provided |
| Type | Full-Time |
| Degrees | Not specified |
| Experience Level | Senior |
Requirements
- Deep knowledge of Linux-based operating systems (Red Hat Enterprise Linux, CentOS, Rocky, Alma, SUSE, Ubuntu, Debian) including configuration and management, security, performance tuning/monitoring, and troubleshooting.
- Proficiency in scripting and automation using tools and languages such as Ansible, Terraform, Bash, Python, or PowerShell.
- Thorough understanding of computer networking (DHCP, DNS, HTTP, TCP/UDP, IPv4/IPv6, OSI model) and load balancing.
- Understanding of client-server architecture as it relates to web/app servers, databases, load balancers, and other infrastructure platforms.
Responsibilities
- Monitor system performance metrics and logs to preemptively identify and resolve issues.
- Engage in real-time troubleshooting of connections between databases and web servers, ensuring high availability and minimal downtime.
- Develop, maintain, and evaluate automation scripts, playbooks, and Terraform configurations for common infrastructure patterns and standards to ensure consistent, repeatable environments.
- Implement and maintain monitoring solutions that provide insights into system health and performance.
- Leverage tools like Dynatrace and Splunk to create actionable alerts and dashboards.
- Regularly assess and tune the performance of Linux and other operating systems, ensure secure configurations, and manage vulnerabilities to protect against threats.
- Work closely with application development teams and other stakeholders to support and deploy new technologies that align with business goals.
- Champion best practices and contribute to technology strategy discussions.
- Produce high-quality documentation detailing the configuration, operation, and troubleshooting of supported systems and software.
- Share knowledge of best practices with team members.
- Participate in an on-call rotation to ensure that our critical systems function reliably around the clock.
Preferred Qualifications
- Experience working in Amazon Web Services (AWS), Azure, or Google Cloud Platform (GCP).
- Prior experience working in a customer-facing role and/or helpdesk.
- Working knowledge of any of the following: VMware, Redis and/or other caching technologies, Splunk, Azure DevOps, Active Directory, Power BI, Dynatrace, ServiceNow, or PagerDuty.
- Knowledge of agile development and delivery processes.
- Experience working with a continuous integration/continuous deployment (CI/CD) platform like CircleCI, Jenkins, Octopus Deploy, or GitHub Actions. Alternatively, experience with a job scheduling/orchestration platform like Rundeck or Tidal.
- Working knowledge of other operating systems and platforms like Windows Server.
- Background in deploying and managing containers and container orchestration platforms such as Docker, Kubernetes, or Amazon ECS.