Performance Engineer

Company: Writer
Location: New York, NY, USA
Salary: Not provided
Type: Full-Time
Degrees: Bachelor’s, Master’s
Experience Level: Expert or higher

Requirements

  • Bachelor’s degree in Computer Science, Engineering, or a related field (Master’s preferred)
  • 10+ years of experience in performance engineering, with a focus on large-scale distributed systems
  • 2+ years of experience working with AI/ML technologies
  • Proven experience in performance testing, profiling, and analysis of complex software systems
  • Deep understanding of NLP architectures, training, and inference
  • Experience with vector databases and search technologies
  • Experience with cloud computing platforms (e.g., AWS, Azure, GCP) and containerization technologies (e.g., Docker, Kubernetes)
  • Strong programming skills in Python
  • Experience with performance analysis tools (e.g., profilers, debuggers, monitoring tools)

Responsibilities

  • Define and implement performance engineering strategies for our full Generative AI stack, including services, applications, LLMs, RAG pipelines, and related infrastructure
  • Lead performance testing, profiling, and analysis efforts to identify and resolve performance bottlenecks
  • Establish and maintain performance benchmarks and SLAs for critical AI services
  • Provide technical leadership and mentorship to performance engineering team members
  • Analyze and improve LLM inference performance, including latency, throughput, and resource utilization
  • Develop and implement strategies for LLM capacity planning and scaling
  • Collaborate with AI researchers to evaluate and improve LLM architectures and training techniques for performance
  • Optimize LLM inference through techniques such as quantization, distillation, and optimized kernel implementation
  • Design and implement performance tests for RAG pipelines, including retrieval, ranking, and generation components
  • Identify and optimize performance bottlenecks in RAG systems, such as database queries, vector search, and document processing
  • Evaluate and optimize RAG system architectures for scalability and efficiency
  • Tune vector databases for optimal recall and latency
  • Collaborate with infrastructure teams to optimize hardware and software configurations for AI workloads
  • Evaluate and recommend new technologies and tools for performance monitoring and analysis
  • Develop and maintain performance dashboards and reports to track key metrics
  • Optimize GPU utilization and memory management for LLM inference
  • Work closely with AI researchers, software engineers, and product managers to ensure performance requirements are met
  • Communicate performance findings and recommendations to stakeholders at all levels
  • Stay up to date with the latest developments in Generative AI and performance engineering
