Senior Systems Engineer – Site Reliability

Company: Rocket Companies
Location: Detroit, MI, USA
Salary: Not provided
Type: Full-Time
Experience Level: Senior

Requirements

  • Deep knowledge of Linux-based operating systems (Red Hat Enterprise Linux, CentOS, Rocky, Alma, SUSE, Ubuntu, Debian) including configuration and management, security, performance tuning/monitoring, and troubleshooting.
  • Proficiency in scripting and automation with Ansible, Bash, or other tools and languages such as Python, Terraform, or PowerShell.
  • Thorough understanding of computer networking (DHCP, DNS, HTTP, TCP/UDP, IPv4/IPv6, OSI model) and load balancing.
  • Understanding of client-server architecture as it relates to web/app servers, databases, load balancers, and other infrastructure platforms.

Responsibilities

  • Monitor system performance metrics and logs to preemptively identify and resolve issues.
  • Engage in real-time troubleshooting of connections between databases and web servers, ensuring high availability and minimal downtime.
  • Develop, maintain, and evaluate automation scripts and playbooks for common infrastructure patterns and standards, using Terraform and custom scripts, to ensure consistent and repeatable environments.
  • Implement and maintain monitoring solutions that provide insights into system health and performance.
  • Leverage tools like Dynatrace and Splunk to create actionable alerts and dashboards.
  • Regularly assess and tune the performance of Linux and other operating systems, ensure secure configurations, and manage vulnerabilities to protect against threats.
  • Work closely with application development teams and other stakeholders to support and deploy new technologies that align with business goals.
  • Champion best practices and contribute to technology strategy discussions.
  • Produce high-quality documentation detailing the configuration, operation, and troubleshooting of supported systems and software.
  • Share knowledge with team members on best practices.
  • Participate in an on-call rotation to ensure that our critical systems function reliably around the clock.

Preferred Qualifications

  • Experience working in Amazon Web Services (AWS), Azure, or Google Cloud Platform (GCP).
  • Prior experience working in a customer-facing role and/or helpdesk.
  • Working knowledge of any of the following: VMware, Redis and/or other caching technologies, Splunk, Azure DevOps, Active Directory, PowerBI, Dynatrace, ServiceNow, or PagerDuty.
  • Knowledge of agile development and delivery processes.
  • Experience working with a continuous integration/continuous deployment (CI/CD) platform like CircleCI, Jenkins, Octopus Deploy, or GitHub Actions. Alternatively, experience with a job scheduling/orchestration platform like Rundeck or Tidal.
  • Working knowledge of other operating systems and platforms like Windows Server.
  • Background in deploying and managing container and orchestration platforms such as Kubernetes, Docker, or ECS.