Empowering Large Language Models with Efficient and Automated Systems

Talk
Zhuohan Li
Time: 03.28.2024 13:00 to 14:00

Large Language Models (LLMs) have brought remarkable advancements to the computing industry. However, a high barrier separates LLMs from the vast majority of researchers and practitioners, created by the engineering challenges of enormous model sizes and substantial compute requirements. In this talk, I’ll discuss my research on system innovations to democratize LLMs, which includes (1) Alpa and AlpaServe, the first systems to automate model-parallel training and accelerate serving with model parallelism, and (2) vLLM, a high-throughput and memory-efficient serving engine for LLMs, accelerated with PagedAttention. I will conclude by presenting the short-term research challenges and long-term trends in LLM systems.