Staff AI Performance Architect – Machine Learning Engineering
| Company | Qualcomm |
|---|---|
| Location | Santa Clara, CA, USA |
| Salary | Not provided |
| Type | Full-Time |
| Degrees | Master’s |
| Experience Level | Senior |
Requirements
- Master’s degree in Computer Science, Engineering, Information Systems, or related field
- 3+ years of hardware engineering experience defining the architecture of GPUs or accelerators used for training AI models
- In-depth knowledge of NVIDIA/AMD GPU capabilities and architectures
- Knowledge of LLM architectures and their HW requirements
Responsibilities
- Understand trends in ML network design through customer engagements and the latest academic research, and determine how these trends will affect both SW and HW design
- Work with customers to determine hardware requirements for AI training systems
- Analyze current accelerator and GPU architectures
- Architect enhancements required for efficient training of AI models
- Design and architect:
  - Flexible computational blocks supporting a variety of datatypes (floating point, fixed point, microscaling) and precisions (32/16/8/4/2/1-bit), capable of optimally performing dense and sparse GEMM and GEMV
  - Memory technologies and subsystems optimized for a range of requirements: capacity, bandwidth, compute-in-memory, compute-near-memory
  - Scale-out and scale-up architectures: switches, NoCs, co-design with communication collectives
  - Designs optimized for power
- Perform competitive analysis
- Co-design HW with SW/GenAI (LLM) requirements
- Define performance models that demonstrate the effectiveness of architecture proposals
- Predict pre-silicon performance for various ML training workloads
- Analyze performance/area/power trade-offs for future HW and SW ML algorithms, including the impact of SoC components (memory and bus)
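The performance-modeling responsibilities above can be illustrated with a minimal roofline-style sketch. The hardware numbers and workload shapes below are illustrative assumptions, not figures for any particular accelerator:

```python
# Minimal roofline-style performance model (illustrative sketch only).

def roofline_time_s(flops, bytes_moved, peak_flops, peak_bw):
    """Attainable time: a kernel is limited by either compute or memory."""
    compute_time = flops / peak_flops
    memory_time = bytes_moved / peak_bw
    return max(compute_time, memory_time)

def gemm_cost(m, n, k, bytes_per_elem):
    """FLOPs and a lower-bound byte count for C[m,n] = A[m,k] @ B[k,n]."""
    flops = 2 * m * n * k  # one multiply + one add per inner-product term
    bytes_moved = bytes_per_elem * (m * k + k * n + m * n)
    return flops, bytes_moved

# Hypothetical accelerator: 400 TFLOP/s dense FP16, 3.2 TB/s memory bandwidth.
PEAK_FLOPS = 400e12
PEAK_BW = 3.2e12

# A large training GEMM vs. a skinny GEMV-like shape (e.g., batch-1 decode).
shapes = {"gemm": (8192, 8192, 8192), "gemv": (1, 8192, 8192)}
for name, (m, n, k) in shapes.items():
    flops, byts = gemm_cost(m, n, k, bytes_per_elem=2)  # FP16 elements
    t = roofline_time_s(flops, byts, PEAK_FLOPS, PEAK_BW)
    intensity = flops / byts  # arithmetic intensity, FLOPs per byte
    bound = "compute" if intensity > PEAK_FLOPS / PEAK_BW else "memory"
    print(f"{name}: {intensity:.1f} FLOP/B, {t * 1e6:.1f} us, {bound}-bound")
```

Under these assumptions the square GEMM lands compute-bound while the GEMV-like shape is memory-bound, which is the kind of trade-off that drives the memory-subsystem and datatype choices listed above.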
Preferred Qualifications
- Knowledge of computer architecture, digital circuits and hardware simulators
- Knowledge of communication protocols used in AI systems
- Knowledge of Network-on-Chip (NoC) designs used in System-on-Chip (SoC) designs
- Understanding of various memory technologies used in AI systems
- Experience modeling hardware and workloads to extract performance and power estimates
- Experience with high-level hardware modeling
- Knowledge of AI Training systems such as NVIDIA DGX and NVL72
- Experience training and fine-tuning LLMs using distributed training frameworks such as DeepSpeed and FSDP
- Knowledge of front-end ML frameworks (e.g., TensorFlow, PyTorch) used for training ML models
- Strong communication skills (written and verbal)
- Detail-oriented with strong problem-solving, analytical, and debugging skills
- Demonstrated ability to learn, think and adapt in a fast-changing environment
- Ability to code in C++ and Python
- Knowledge of a variety of classes of ML models (e.g., CNNs, RNNs)
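The microscaling datatypes named under Responsibilities pair low-precision elements with one shared scale per small block of values. A minimal sketch of that idea, assuming INT8-range elements and a power-of-two scale per 32-element block (real MX formats use E8M0 scales and FP8/FP6/FP4 element encodings with their own rounding rules):

```python
import math

BLOCK = 32  # elements sharing one scale, as in MX-style microscaling formats

def quantize_block(block):
    """Quantize a block to INT8-range integers plus a shared 2**exp scale.

    Simplified sketch of per-block scaling; not a faithful MX encoder.
    """
    amax = max(abs(x) for x in block)
    if amax == 0.0:
        return 0, [0] * len(block)
    # Smallest power-of-two scale that fits the largest magnitude in [-127, 127].
    exp = math.ceil(math.log2(amax / 127))
    scale = 2.0 ** exp
    return exp, [max(-127, min(127, round(x / scale))) for x in block]

def dequantize_block(exp, qs):
    """Reconstruct approximate values from the shared exponent + integers."""
    return [q * (2.0 ** exp) for q in qs]
```

The design point this illustrates: because each block carries its own scale, an outlier only degrades precision within its own 32 elements, which is what makes very low element precisions (4/2/1-bit) viable in the computational blocks described above.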