2025 Fall

Specific Requirements

We focus on the latest papers from SOSP and OSDI, as well as papers released on arXiv. For each session, presenters select one paper from SOSP or OSDI and one from arXiv.
The presentation follows a "1+N" format, where one person delivers the main content while supporting members assist with preparation and manage the Q&A session. These supporting members are also encouraged to contribute to the presentation.
The discussion should provide a thorough analysis of the paper’s strengths and weaknesses, along with a comprehensive review of related work from the past three years. The presentation must be at least 45 minutes long.

Other Information

A recording and a written summary of each session will be uploaded to Bilibili and Zhihu as soon as possible.

Schedule

January 20

Topic I

💡 [SOSP'25] DiffKV: Differentiated Memory Management for Large Language Models with Parallel KV Compaction
🙎‍♂️ Chengru Yang, Jiawei Yi
📕 slides, 📺 video

Topic II

💡 [SOSP'25] LithOS: An Operating System for Efficient Machine Learning on GPUs
🙎‍♂️ Qingyuan Chen
📕 slides, 📺 video

January 13

Topic I

💡 [arXiv] REFRAG: Rethinking RAG based Decoding
🙎‍♂️ Bosen Yang
📕 slides, 📺 video

Topic II

💡 [SOSP'25] TrainVerify: Equivalence-Based Verification for Distributed LLM Training
🙎‍♂️ Chenhan Wang, Luofan Chen
📺 video

January 6

Topic I

💡 [SOSP'25] Jenga: Effective Memory Management for Serving LLM with Heterogeneity
🙎‍♂️ Mingxuan Liu (Northwestern Polytechnical University)
📕 slides, 📺 video

Topic II

💡 [arXiv] Prism: Unleashing GPU Sharing for Cost-Efficient Multi-LLM Serving
🙎‍♂️ Mingxuan Liu (Northwestern Polytechnical University)
📕 slides, 📺 video

December 30

Topic I

💡 [SOSP'25] HedraRAG: Co-Optimizing Generation and Retrieval for Heterogeneous RAG Workflows
🙎‍♂️ Chao Bi
📕 slides, 📺 video

Topic II

💡 [SOSP'25] METIS: Fast Quality-Aware RAG Systems with Configuration Adaptation
🙎‍♂️ Xiaoqi Li
📕 slides, 📺 video

December 23

Topic I

💡 [SOSP'25] KTransformers: Unleashing the Full Potential of CPU/GPU Hybrid Inference for MoE Models
🙎‍♂️ ZhiHao Le
📕 slides, 📺 video

Topic II

💡 [arXiv] Tally: Non-Intrusive Performance Isolation for Concurrent Deep Learning Workloads
🙎‍♂️ Jiaqi Ruan, Jia He
📕 slides, 📺 video

December 16

Topic I

💡 [OSDI'25] NanoFlow: Towards Optimal Large Language Model Serving Throughput
🙎‍♂️ Yinhe Chen, Dongqi Tian
📕 slides, 📺 video

Topic II

💡 [arXiv] IC-Cache: Efficient Large Language Model Serving via In-context Caching
🙎‍♂️ Sen Han
📕 slides, 📺 video

December 2

Topic I

💡 [arXiv] dInfer: An Efficient Inference Framework for Diffusion Language Models
🙎‍♂️ Yuxin Ma (Ant Group)
📕 slides, 📺 video

Topic II

💡 [arXiv] Kimi Linear: An Expressive, Efficient Attention Architecture
🙎‍♂️ Ping Gong, Xin Ren
📕 slides, 📺 video

November 25

Topic I

💡 ROLL: An Efficient and User-Friendly Scaling Library for Reinforcement Learning with Large Language Models
🙎‍♂️ Wei Gao (Alibaba ROLL)
📕 slides, 📺 video

Topic II

💡 [SOSP'25] Spirit: Fair Allocation of Interdependent Resources in Remote Memory Systems
🙎‍♂️ Yicheng Zhang
📕 slides, 📺 video

November 18

Topic I

💡 [arXiv] HydraServe: Minimizing Cold Start Latency for Serverless LLM Serving in Public Clouds
🙎‍♂️ Jiyang Wang
📕 slides, 📺 video

Topic II

💡 [SOSP'25] Pie: A Programmable Serving System for Emerging LLM Applications
🙎‍♂️ Shen Fu, Zewen Jin
📕 slides, 📺 video

November 11

💡 [arXiv] FalconFS: Distributed File System for Large-Scale Deep Learning Pipeline
🙎‍♂️ Chi Zhang, Jiahao Li
📕 slides, 📺 video

November 4

Topic I

💡 [OSDI'25] Enabling Efficient GPU Communication over Multiple NICs with FuseLink
🙎‍♂️ Haiquan Wang, Tonghuan Xiao, Jiahui Tan
📕 slides, 📺 video

Topic II

💡 [arXiv] Fast-dLLM v2: Efficient Block-Diffusion LLM
🙎‍♂️ Xiliang Xian
📕 slides, 📺 video

October 28

💡 [arXiv] Shift Parallelism: Low-Latency, High-Throughput LLM Inference for Dynamic Workloads
🙎‍♂️ Jiaan Zhu, Qinghe Wang, Long Zhao
📕 slides, 📺 video

October 21

💡 [arXiv] ServeGen: Workload Characterization and Generation of Large Language Model Serving in Production
🙎‍♂️ Zijian Dai
📕 slides, 📺 video

September 29

✨ SOSP Rehearsal
💡 [SOSP'25] Mantle: Efficient Hierarchical Metadata Management for Cloud Object Storage Services
🙎‍♂️ Jiahao Li

September 16

💡 Kick-off meeting
🙎‍♂️ Youhui Bai, Zhihui Chen, Ouxiang Zhou and Ruibo Liu
📕 slides