Tech Lead – Research Scientist – Cloud & AI Computing – DPU/GPU/CPU

Company: ByteDance
Location: San Jose, CA, USA
Salary: Not Provided
Type: Full-Time
Degrees: PhD
Experience Level: Senior, Expert or higher

Description

About the Team
The ByteDance DPU team builds the computing infrastructure foundation of ByteDance and the Volcano Engine Public Cloud. The team focuses on the architecture, development, and cutting-edge research of software and hardware technologies (compute/network/storage) for Cloud & AI computing. Key technologies include the virtualization hypervisor, an in-house high-performance userspace network protocol stack, high-speed interconnects, virtual network switching, high-performance storage, and GPU virtualization/scheduling.

Responsibilities
– Responsible for the architecture of the next-generation kernel/virtualization/container stack based on the DPU, and for exploration of frontier technologies.
– Responsible for the research and architecture of DPU software and hardware for both CPU-centric and GPU-centric infrastructure.
– Responsible for the architecture, development, and performance optimization of the monitoring system for CPU, network, storage, and kernel in data center infrastructure.
– Responsible for cutting-edge exploration, architecture, development, and optimization of GPU virtualization and high-performance storage and memory systems.
– Responsible for the acceleration architecture for typical workloads in data center infrastructure such as AI/ML, databases, Serverless Computing, Big Data, etc.

Minimum Qualifications
– Research background and formal research training; a PhD is required.
– Familiar with the Linux kernel and proficient in kernel subsystems such as memory management, KVM, the scheduler, cgroups, networking, storage, and file systems, with relevant hands-on experience.
– Deep understanding of cloud services and AI applications, including but not limited to database systems, big data, distributed storage, serverless computing, and AI/ML inference and training.
– Familiar with data center network traffic patterns; understands IDC hardware, core components, and server architecture; has system-level design experience targeting end-to-end performance, cost, and stability.
– Familiar with x86 and ARM CPU architectures and low-level CPU performance tuning; understanding of network and storage protocols. Knowledge of PCIe/RDMA/SmartNIC technologies is a plus.

Preferred Qualifications
– 5 years of experience in software-hardware co-design, with specific experience developing distributed computing systems, high-speed interconnects, and distributed storage.
– Familiar with CUDA GPU programming, the SPDK userspace storage framework, and the DPDK networking framework. Knowledge of the VirtFS virtualized file system and CXL memory technologies is a plus.

Benefits

Not Specified