2025 Fall
Specific Requirements
- We focus on the latest papers from SOSP and OSDI, as well as papers released on arXiv. For each session, presenters select one paper from SOSP or OSDI and one from arXiv.
- The presentation follows a "1+N" format, where one person delivers the main content while supporting members assist with preparation and manage the Q&A session. These supporting members are also encouraged to contribute to the presentation.
- The discussion should provide a thorough analysis of the paper's strengths and weaknesses, along with a comprehensive review of related work from the past three years. The presentation must be at least 45 minutes long.
Other Information
The recorded video and a text summary will be uploaded to Bilibili and Zhihu as soon as possible.
Schedule
December 30
Topic I
- 💡 [SOSP'25] HedraRAG: Co-Optimizing Generation and Retrieval for Heterogeneous RAG Workflows
- 🙋‍♂️ Chao Bi
Topic II
- 💡 [SOSP'25] METIS: Fast Quality-Aware RAG Systems with Configuration Adaptation
- 🙋‍♂️ Xiaoqi Li
December 23
Topic I
- 💡 [SOSP'25] KTransformers: Unleashing the Full Potential of CPU/GPU Hybrid Inference for MoE Models
- 🙋‍♂️ ZhiHao Le
- slides, 📺 video
Topic II
- 💡 [arXiv] Tally: Non-Intrusive Performance Isolation for Concurrent Deep Learning Workloads
- 🙋‍♂️ Jiaqi Ruan, Jia He
- slides, 📺 video
December 16
Topic I
- 💡 [OSDI'25] NanoFlow: Towards Optimal Large Language Model Serving Throughput
- 🙋‍♂️ Yinhe Chen, Dongqi Tian
- slides, 📺 video
Topic II
- 💡 [arXiv] IC-Cache: Efficient Large Language Model Serving via In-context Caching
- 🙋‍♂️ Sen Han
- slides, 📺 video
December 2
Topic I
- 💡 [arXiv] dInfer: An Efficient Inference Framework for Diffusion Language Models
- 🙋‍♂️ Yuxin Ma (Ant Group)
- slides, 📺 video
Topic II
- 💡 [arXiv] Kimi Linear: An Expressive, Efficient Attention Architecture
- 🙋‍♂️ Ping Gong, Xin Ren
- slides, 📺 video
November 25
Topic I
- 💡 ROLL: An Efficient and User-Friendly Scaling Library for Reinforcement Learning with Large Language Models
- 🙋‍♂️ Wei Gao (Alibaba ROLL)
- slides, 📺 video
Topic II
- 💡 [SOSP'25] Spirit: Fair Allocation of Interdependent Resources in Remote Memory Systems
- 🙋‍♂️ Yicheng Zhang
- slides, 📺 video
November 18
Topic I
- 💡 [arXiv] HydraServe: Minimizing Cold Start Latency for Serverless LLM Serving in Public Clouds
- 🙋‍♂️ Jiyang Wang
- slides, 📺 video
Topic II
- 💡 [SOSP'25] Pie: A Programmable Serving System for Emerging LLM Applications
- 🙋‍♂️ Shen Fu, Zewen Jin
- slides, 📺 video
November 11
- 💡 [arXiv] FalconFS: Distributed File System for Large-Scale Deep Learning Pipeline
- 🙋‍♂️ Chi Zhang, Jiahao Li
- slides, 📺 video
November 4
Topic I
- 💡 [OSDI'25] Enabling Efficient GPU Communication over Multiple NICs with FuseLink
- 🙋‍♂️ Haiquan Wang, Tonghuan Xiao, Jiahui Tan
- slides, 📺 video
Topic II
- 💡 [arXiv] Fast-dLLM v2: Efficient Block-Diffusion LLM
- 🙋‍♂️ Xiliang Xian
- slides, 📺 video
October 28
- 💡 [arXiv] Shift Parallelism: Low-Latency, High-Throughput LLM Inference for Dynamic Workloads
- 🙋‍♂️ Jiaan Zhu, Qinghe Wang, Long Zhao
- slides, 📺 video
October 21
- 💡 [arXiv] ServeGen: Workload Characterization and Generation of Large Language Model Serving in Production
- 🙋‍♂️ Zijian Dai
- slides, 📺 video
September 29
- ✨ SOSP Rehearsal
- 💡 Mantle: Efficient Hierarchical Metadata Management for Cloud Object Storage Services
- 🙋‍♂️ Jiahao Li
September 16
- 💡 Kick-off meeting
- 🙋‍♂️ Youhui Bai, Zhihui Chen, Ouxiang Zhou and Ruibo Liu
- slides