Performance engineer
Company | Writer |
---|---|
Location | New York, NY, USA |
Salary | Not provided |
Type | Full-Time |
Degrees | Bachelor’s, Master’s |
Experience Level | Expert or higher |
Requirements
- Bachelor’s degree in Computer Science, Engineering, or a related field (Master’s preferred)
- 10+ years of experience in performance engineering, with a focus on large-scale distributed systems
- 2+ years of experience working with AI/ML technologies
- Proven experience in performance testing, profiling, and analysis of complex software systems
- Deep understanding of NLP architectures, training, and inference
- Experience with vector databases and search technologies
- Experience with cloud computing platforms (e.g., AWS, Azure, GCP) and containerization technologies (e.g., Docker, Kubernetes)
- Strong programming skills in Python
- Experience with performance analysis tools (e.g., profilers, debuggers, monitoring tools)
Responsibilities
- Define and implement performance engineering strategies for our Generative AI full stack, including services, applications, LLMs, RAG pipelines, and related infrastructure
- Lead performance testing, profiling, and analysis efforts to identify and resolve performance bottlenecks
- Establish and maintain performance benchmarks and SLAs for critical AI services
- Provide technical leadership and mentorship to performance engineering team members
- Analyze and improve LLM inference performance, including latency, throughput, and resource utilization
- Develop and implement strategies for LLM capacity planning and scaling
- Collaborate with AI researchers to evaluate and improve LLM model architectures and training techniques for performance
- Optimize LLM inference through techniques such as quantization, distillation, and optimized kernel implementations
- Design and implement performance tests for RAG pipelines, including retrieval, ranking, and generation components
- Identify and optimize performance bottlenecks in RAG systems, such as database queries, vector search, and document processing
- Evaluate and optimize RAG system architectures for scalability and efficiency
- Tune vector databases for optimal recall and latency
- Collaborate with infrastructure teams to optimize hardware and software configurations for AI workloads
- Evaluate and recommend new technologies and tools for performance monitoring and analysis
- Develop and maintain performance dashboards and reports to track key metrics
- Optimize GPU utilization and memory management for LLM inference
- Work closely with AI researchers, software engineers, and product managers to ensure performance requirements are met
- Communicate performance findings and recommendations to stakeholders at all levels
- Stay up-to-date with the latest developments in Generative AI and performance engineering
Preferred Qualifications
No preferred qualifications provided.