Staff AI Performance Architect – Machine Learning Engineering

Company: Qualcomm
Location: Santa Clara, CA, USA
Salary: Not Provided
Type: Full-Time
Degrees: Master’s
Experience Level: Senior

Requirements

  • Master’s degree in Computer Science, Engineering, Information Systems, or related field
  • 3+ years of hardware engineering experience defining the architecture of GPUs or accelerators used for training AI models
  • In-depth knowledge of NVIDIA/AMD GPU capabilities and architectures
  • Knowledge of LLM architectures and their HW requirements

Responsibilities

  • Track trends in ML network design through customer engagements and the latest academic research, and determine how they will affect both SW and HW design
  • Work with customers to determine hardware requirements for AI training systems
  • Analyze current accelerator and GPU architectures
  • Architect enhancements required for efficient training of AI models
  • Design and architect:
      • Flexible computational blocks supporting a variety of datatypes (floating point, fixed point, microscaling) and precisions (32/16/8/4/2/1-bit), capable of optimally performing dense and sparse GEMM and GEMV
      • Memory technology and subsystems optimized for a range of requirements: capacity, bandwidth, compute in memory, compute near memory
      • Scale-out and scale-up architectures: switches, NoCs, codesign with communication collectives
      • Power-optimized designs
  • Perform competitive analysis
  • Codesign HW with SW/GenAI (LLM) requirements
  • Define performance models to prove effectiveness of architecture proposals
  • Predict pre-silicon performance for various ML training workloads
  • Analyze performance/area/power trade-offs for future HW and SW ML algorithms, including the impact of SoC components (memory and bus)

Preferred Qualifications

  • Knowledge of computer architecture, digital circuits and hardware simulators
  • Knowledge of communication protocols used in AI systems
  • Knowledge of Network-on-Chip (NoC) designs used in System-on-Chip (SoC) designs
  • Understanding of various memory technologies used in AI systems
  • Experience in modeling hardware and workloads in order to extract performance and power estimates
  • High-level hardware modeling experience
  • Knowledge of AI Training systems such as NVIDIA DGX and NVL72
  • Experience training and fine-tuning LLMs using distributed training frameworks such as DeepSpeed and FSDP
  • Knowledge of front-end ML frameworks (e.g., TensorFlow, PyTorch) used for training ML models
  • Strong communication skills (written and verbal)
  • Detail-oriented with strong problem-solving, analytical and debugging skills
  • Demonstrated ability to learn, think and adapt in a fast-changing environment
  • Ability to code in C++ and Python
  • Knowledge of a variety of classes of ML models (e.g., CNN, RNN)