Senior Staff Software Engineer – High Performance Inference System

Company: Groq
Location: Palo Alto, CA, USA; Toronto, ON, Canada; Remote in USA; Remote in Canada
Salary: Not Provided
Type: Full-Time
Degrees:
Experience Level: Senior

Requirements

  • Deep expertise in computer architecture, operating systems, algorithms, hardware-software interfaces, and parallel/distributed computing
  • Mastery of system-level programming (C++, Rust, or similar) with emphasis on low-level optimizations and hardware-aware design
  • Excellence in profiling and optimizing systems for latency, throughput, and efficiency
  • Commitment to automated testing and CI/CD pipelines
  • Curiosity about system internals and ability to solve problems across abstraction layers
  • Ability to make pragmatic technical judgments, balancing short-term velocity with long-term system health
  • Ability to write empathetic, maintainable code with strong version control and modular design, prioritizing readability and usability for future teammates

Responsibilities

  • Build and operate real-time, distributed compute frameworks and runtimes to deliver planet-scale inference for LLMs and advanced AI applications at ultra-low latency
  • Develop deterministic, low-overhead hardware abstractions for thousands of synchronously coordinated GroqChips across a software-scheduled interconnection network
  • Prioritize fault tolerance, real-time diagnostics, ultra-low-latency execution, and mission-critical reliability
  • Future-proof Groq’s software stack for next-gen silicon, innovative multi-chip topologies, emerging form factors, and heterogeneous co-processors
  • Foster collaboration across cloud, compiler, infrastructure, data-center, and hardware teams to align engineering efforts, enable seamless integrations, and drive progress toward shared goals
  • Reduce operational overhead and improve SLOs

Preferred Qualifications

  • Experience shipping complex projects in fast-paced environments while maintaining team alignment and stakeholder support
  • Expertise operating large-scale distributed systems for high-traffic internet services
  • Experience deploying and optimizing machine learning (ML) or high-performance computing (HPC) workloads in production
  • Hands-on optimization of performance-critical applications using GPUs, FPGAs, or ASICs (e.g., memory management, kernel optimization)
  • Familiarity with ML frameworks (e.g., PyTorch) and compiler tooling (e.g., MLIR) for AI/ML workflow integration