Senior Staff Software Engineer – High-Performance Inference Systems
| Company | Groq |
| --- | --- |
| Location | Palo Alto, CA, USA; Toronto, ON, Canada; Remote in USA; Remote in Canada |
| Salary | Not provided |
| Type | Full-Time |
| Degrees | |
| Experience Level | Senior |
Requirements
- Deep expertise in computer architecture, operating systems, algorithms, hardware-software interfaces, and parallel/distributed computing
- Mastery of system-level programming (C++, Rust, or similar) with emphasis on low-level optimizations and hardware-aware design
- Excellence in profiling and optimizing systems for latency, throughput, and efficiency
- Commitment to automated testing and CI/CD pipelines
- Curiosity about system internals and ability to solve problems across abstraction layers
- Ability to make pragmatic technical judgments, balancing short-term velocity with long-term system health
- Ability to write maintainable, empathetic code with strong version control practices and modular design, prioritizing readability and usability for future teammates
Responsibilities
- Build and operate real-time, distributed compute frameworks and runtimes to deliver planet-scale inference for LLMs and advanced AI applications at ultra-low latency
- Develop deterministic, low-overhead hardware abstractions for thousands of synchronously coordinated GroqChips across a software-scheduled interconnection network
- Prioritize fault tolerance, real-time diagnostics, ultra-low-latency execution, and mission-critical reliability
- Future-proof Groq’s software stack for next-gen silicon, innovative multi-chip topologies, emerging form factors, and heterogeneous co-processors
- Foster collaboration across cloud, compiler, infrastructure, data center, and hardware teams to align engineering efforts, enable seamless integrations, and drive progress toward shared goals
- Reduce operational overhead and improve SLOs
Preferred Qualifications
- Experience shipping complex projects in fast-paced environments while maintaining team alignment and stakeholder support
- Expertise operating large-scale distributed systems for high-traffic internet services
- Experience deploying and optimizing machine learning (ML) or high-performance computing (HPC) workloads in production
- Hands-on optimization of performance-critical applications using GPUs, FPGAs, or ASICs (e.g., memory management, kernel optimization)
- Familiarity with ML frameworks (e.g., PyTorch) and compiler tooling (e.g., MLIR) for AI/ML workflow integration