Senior Performance Engineer II

Bachelor’s or Master’s degree in Computer Science, Mathematics, Statistics, or Computer/Electrical Engineering, or equivalent work experience
Extensive knowledge of Linux kernel, hypervisors, and open-source operating systems
5+ years of experience with performance measurement tools and benchmarks such as profilers, eBPF, XDP, fio, TPC-C, MLPerf, and NCCL
5+ years developing strategies for managing, monitoring, and analyzing infrastructure, applications, and services
Strong proficiency in Go, Python, and/or Ruby
Deep understanding of kernel performance, including scheduling, context switching, and hardware acceleration
Expertise in distributed systems performance, including tracing and debugging methodologies
Demonstrated ability to solve complex problems at scale
Excellent cross-team collaboration and communication skills
Leadership experience in skills development and mentorship
Professional-level written and spoken English with strong presentation abilities

Develop and implement comprehensive performance metrics, analysis tools, and reporting systems
Lead initiatives to enhance shared infrastructure, balancing performance optimization with rigorous security standards
Conduct in-depth performance analysis of the Linux kernel, virtualization layer, storage, and network stack to devise optimization strategies
Identify system bottlenecks proactively and drive optimizations across the hypervisor software stack
Work cross-functionally to harness new performance capabilities from evolving hardware architectures
Enhance test frameworks and pipelines to ensure robust performance validation
Investigate and resolve virtual machine downtime and performance issues in our production environment
Participate in on-call rotations as needed to support system reliability

Experience with observability platforms such as Splunk, Prometheus, Grafana, Elastic, or Dynatrace
Experience with Chef, AWX, and/or Kubernetes
Familiarity with x86_64 and/or ARM architectures
Track record of upstreaming Linux kernel patches
In-depth knowledge of at least one Linux subsystem (CPU scheduling, memory management, file system, I/O, etc.)
Experience in developing and deploying ML-based solutions for anomaly detection and dynamic load balancing