2025 Spring
Specific Requirements
- We focus on the latest papers from SOSP and OSDI, as well as papers released on arXiv. For each session, the presenters select one paper from SOSP or OSDI and one from arXiv.
- The presentation follows a "1+N" format, where one person delivers the main content while supporting members assist with preparation and manage the Q&A session. These supporting members are also encouraged to contribute to the presentation.
- The discussion should provide a thorough analysis of the paper's strengths and weaknesses, along with a comprehensive review of related work from the past three years. The presentation must be at least 45 minutes long.
Other Information
The recording and text summary of each session will be uploaded to bilibili and zhihu as soon as possible.
Schedule
February 25
- 💡 Kick-off meeting
- 🙋‍♂️ Jiyang Wang, Kunzhao Xu and Cheng Li
- 📜 slides
March 11
- 💡 Comprehensive introduction of DeepSeek-AI's technical report (PART Ⅰ)
- 🙋‍♂️ Xin Ren, Tonghuan Xiao, Jiahui Tan, Yandong Shi, Kunzhao Xu, Yifei Liu, Chongzhuo Yang, Jiaan Zhu, Zewen Jin, Yinhe Chen, Ping Gong, Guanbin Xu, Haiquan Wang, Quan Zhou and Chaoyi Ruan
- 📜 MLA slides, 📜 DualPipe slides, 📜 FP8 Training slides, 📜 MTP slides
- 📝 Q&A summary, 📺 video
March 18
Topic Ⅰ
- 💡 Comprehensive introduction of DeepSeek-AI's technical report (PART Ⅱ)
- 🙋‍♂️ Xin Ren, Tonghuan Xiao, Jiahui Tan, Yandong Shi, Kunzhao Xu, Yifei Liu, Chongzhuo Yang, Jiaan Zhu, Zewen Jin, Yinhe Chen, Ping Gong, Guanbin Xu, Haiquan Wang, Quan Zhou and Chaoyi Ruan
- 📜 RL slides, 📜 3fs slides
- 📝 Q&A summary, 📺 video
Topic Ⅱ
- 💡 [OSDI'24] Ladder: Enabling Efficient Low-Precision Deep Learning Computing through Hardware-aware Tensor Transformation
- 🙋‍♂️ Chengru Yang
- 📜 slides
- 📝 Q&A summary, 📺 video
March 25
Topic Ⅰ
- 💡 [OSDI'24] FairyWren: A Sustainable Cache for Emerging Write-Read-Erase Flash Interfaces
- 🙋‍♂️ Qingyuan Chen
- 📜 slides
Topic Ⅱ
- 💡 [arXiv] fMoE: Fine-Grained Expert Offloading for Large Mixture-of-Experts Serving
- 🙋‍♂️ Jia He, Jiaqi Ruan
- 📜 slides
Summary and Video
April 1
Topic Ⅰ
- 💡 [SOSP'24] CHIME: A Cache-Efficient and High-Performance Hybrid Index on Disaggregated Memory
- 🙋‍♂️ Sen Han
- 📜 slides
Topic Ⅱ
- 💡 [arXiv] Infinite-LLM: Efficient LLM Service for Long Context with DistAttention and Distributed KVCache
- 🙋‍♂️ Tonghuan Xiao, Xin Ren
- 📜 slides
Summary and Video
April 8
Topic Ⅰ
- 💡 [OSDI'25] Achieving Low-Latency Graph-Based Vector Search via Aligning Best-First Search Algorithm with SSD
- 🙋‍♂️ Hengyu Liang
Topic Ⅱ
- 💡 [arXiv] Klotski: Efficient Mixture-of-Expert Inference via Expert-Aware Multi-Batch Pipeline
- 🙋‍♂️ Jiawei Yi
- 📜 slides
- 📝 Q&A summary, 📺 video
April 15
- 💡 [arXiv] Mooncake: Trading More Storage for Less Computation - A KVCache-centric Architecture for Serving LLM Chatbot
- 🙋‍♂️ Juncheng Zhang
- 📜 slides
- 📝 Q&A summary, 📺 video
April 22
Topic Ⅰ
- 💡 [OSDI'24] Llumnix: Dynamic Scheduling for Large Language Model Serving
- 🙋‍♂️ Kunzhao Xu
- 📜 slides
Topic Ⅱ
- 💡 [SOSP'24] Enabling Parallelism Hot Switching for Efficient Training of Large Language Models
- 🙋‍♂️ Qinghe Wang
- 📜 slides
Summary and Video
April 29
Topic Ⅰ
- 💡 [SOSP'24] Tiered Memory Management: Access Latency is the Key!
- 🙋‍♂️ Lijun Miao
Topic Ⅱ
- 💡 [arXiv] ByteScale: Efficient Scaling of LLM Training with a 2048K Context Length on More Than 12,000 GPUs
- 🙋‍♂️ Long Zhao
Summary and Video
May 6
Topic Ⅰ
- 💡 [OSDI'25] Fast and Live Model Auto Scaling with O(1) Host Caching
- 🙋‍♂️ Chenhan Wang
Topic Ⅱ
- 💡 [arXiv] Training-free and Adaptive Sparse Attention for Efficient Long Video Generation
- 🙋‍♂️ Shiyi Wang
May 13
Topic Ⅰ
- 💡 [SOSP'24] OZZ: Identifying Kernel Out-of-Order Concurrency Bugs with In-Vivo Memory Access Reordering
- 🙋‍♂️ Jiyang Wang
- 📜 slides
Topic Ⅱ
- 💡 [arXiv] AsyncFS: Metadata Updates Made Asynchronous for Distributed Filesystems with In-Network Coordination
- 🙋‍♂️ Chongzhuo Yang
- 📜 slides
Summary and Video
May 20
- 💡 [arXiv] Down with the Hierarchy: The "H" in HNSW Stands for "Hubs"
- 🙋‍♂️ Bosen Yang
- 📜 slides
Summary and Video
May 27
Topic Ⅰ
- 💡 [OSDI'24] dLoRA: Dynamically Orchestrating Requests and Adapters for LoRA LLM Serving
- 🙋‍♂️ Chizheng Fang
- 📜 slides
Topic Ⅱ
- 💡 [arXiv] CacheBlend: Fast Large Language Model Serving for RAG with Cached Knowledge Fusion
- 🙋‍♂️ Yicheng Zhang
- 📜 slides
Summary and Video
June 3
Topic Ⅰ
- 💡 [SOSP'24] Reducing Cross-Cloud/Region Costs with the Auto-Configuring MACARON Cache
- 🙋‍♂️ Chao Bi
- 📜 slides
Topic Ⅱ
- 💡 [arXiv] RetroInfer: A Vector-Storage Approach for Scalable Long-Context LLM Inference
- 🙋‍♂️ Xiaoqi Li
- 📜 slides
Summary and Video
June 10
Topic Ⅰ
- 💡 [SOSP'24] LazyLog: A New Shared Log Abstraction for Low-Latency Applications
- 🙋‍♂️ Jiaxuan Liu
- 📜 slides
Topic Ⅱ
- 💡 [arXiv] FlashInfer: Efficient and Customizable Attention Engine for LLM Inference Serving
- 🙋‍♂️ Zewen Jin
- 📜 slides
Summary and Video
June 17
Topic Ⅰ
- 💡 [OSDI'25] WLB-LLM: Workload-Balanced 4D Parallelism for Large Language Model Training
- 🙋‍♂️ Shen Fu
Topic Ⅱ
- 💡 [arXiv] Triton-distributed: Programming Overlapping Kernels on Distributed AI Systems with the Triton Compiler
- 🙋‍♂️ Ouxiang Zhou
Summary and Video
June 24
Topic Ⅰ
- 💡 [SOSP'24] VPRI: Efficient I/O Page Fault Handling via Software-Hardware Co-Design for IaaS Clouds
- 🙋‍♂️ Zheng yang
- 📜 slides
Topic Ⅱ
- 💡 [arXiv] StreamRL: Scalable, Heterogeneous, and Elastic RL for LLMs with Disaggregated Stream Generation
- 🙋‍♂️ Muxin Liu