Senior Performance Engineer II

Company: DigitalOcean
Location: San Francisco, CA, USA
Salary: $165,000 – $210,000
Type: Full-Time
Degrees: Bachelor’s, Master’s
Experience Level: Senior

Requirements

  • Bachelor’s or Master’s degree in Computer Science, Mathematics, Statistics, or Computer/Electrical Engineering, or equivalent work experience
  • Extensive knowledge of the Linux kernel, hypervisors, and open-source operating systems
  • 5+ years of experience with performance measurement tools such as profilers, eBPF, XDP, fio, TPC-C, MLPerf, and NCCL
  • 5+ years developing strategies for managing, monitoring, and analyzing infrastructure, applications, and services
  • Strong proficiency in Go, Python, and/or Ruby
  • Deep understanding of kernel performance aspects, including scheduling, context switching, and hardware acceleration
  • Expertise in distributed systems performance, including tracing and debugging methodologies
  • Demonstrated ability to solve complex problems at scale
  • Excellent cross-team collaboration and communication skills
  • Leadership experience in skills development and mentorship
  • Professional-level written and spoken English with strong presentation abilities

Responsibilities

  • Develop and implement comprehensive performance metrics, analysis tools, and reporting systems
  • Lead initiatives to enhance shared infrastructure, balancing performance optimization with rigorous security standards
  • Conduct in-depth performance analysis of the Linux kernel, virtualization layer, storage, and network stack to devise optimization strategies
  • Identify system bottlenecks proactively and drive optimizations across the hypervisor software stack
  • Work cross-functionally to harness new performance capabilities from evolving hardware architectures
  • Enhance test frameworks and pipelines to ensure robust performance validation
  • Investigate and resolve virtual machine downtime and performance issues in our production environment
  • Participate in on-call rotations as needed to support system reliability

Preferred Qualifications

  • Experience with observability platforms such as Splunk, Prometheus, Grafana, Elastic, or Dynatrace
  • Experience with Chef, AWX, and/or Kubernetes
  • Familiarity with x86_64 and/or ARM architectures
  • Successful history of upstreaming Linux kernel patches
  • In-depth knowledge of at least one Linux subsystem (CPU scheduling, memory management, file system, I/O, etc.)
  • Experience in developing and deploying ML-based solutions for anomaly detection and dynamic load balancing