2025 Spring
Specific Requirements
- We focus on the latest papers from SOSP and OSDI, as well as papers released on arXiv. For each session, presenters select one paper from SOSP or OSDI and one from arXiv.
- The presentation follows a "1+N" format, where one person delivers the main content while supporting members assist with preparation and manage the Q&A session. These supporting members are also encouraged to contribute to the presentation.
- The discussion should provide a thorough analysis of the paper's strengths and weaknesses, along with a comprehensive review of related work from the past three years. The presentation must be at least 45 minutes long.
Other Information
The playback video and text summary will be uploaded to bilibili and zhihu as soon as possible.
Schedule
February 25
- Kick-off meeting
- Jiyang Wang, Kunzhao Xu and Cheng Li
- slides
March 11
- Comprehensive introduction of DeepSeek-AI's technical report (PART Ⅰ)
- Xin Ren, Tonghuan Xiao, Jiahui Tan, Yandong Shi, Kunzhao Xu, Yifei Liu, Chongzhuo Yang, Jiaan Zhu, Zewen Jin, Yinhe Chen, Ping Gong, Guanbin Xu, Haiquan Wang, Quan Zhou and Chaoyi Ruan
- MLA slides, DualPipe slides, FP8 Training slides, MTP slides
- Q&A summary, video
March 18
Topic Ⅰ
- Comprehensive introduction of DeepSeek-AI's technical report (PART Ⅱ)
- Xin Ren, Tonghuan Xiao, Jiahui Tan, Yandong Shi, Kunzhao Xu, Yifei Liu, Chongzhuo Yang, Jiaan Zhu, Zewen Jin, Yinhe Chen, Ping Gong, Guanbin Xu, Haiquan Wang, Quan Zhou and Chaoyi Ruan
- RL slides, 3fs slides
- Q&A summary, video
Topic Ⅱ
- [OSDI'24] Ladder: Enabling Efficient Low-Precision Deep Learning Computing through Hardware-aware Tensor Transformation
- Chengru Yang
- slides
- Q&A summary, video
March 25
Topic Ⅰ
- [OSDI'24] FairyWren: A Sustainable Cache for Emerging Write-Read-Erase Flash Interfaces
- Qingyuan Chen
- slides
Topic Ⅱ
- [arXiv] fMoE: Fine-Grained Expert Offloading for Large Mixture-of-Experts Serving
- Jia He, Jiaqi Ruan
- slides
Summary and Video
April 1
Topic Ⅰ
- [SOSP'24] CHIME: A Cache-Efficient and High-Performance Hybrid Index on Disaggregated Memory
- Sen Han
- slides
Topic Ⅱ
- [arXiv] Infinite-LLM: Efficient LLM Service for Long Context with DistAttention and Distributed KVCache
- Tonghuan Xiao, Xin Ren
- slides
Summary and Video
April 8
Topic Ⅰ
- [OSDI'25] Achieving Low-Latency Graph-Based Vector Search via Aligning Best-First Search Algorithm with SSD
- Hengyu Liang
Topic Ⅱ
- [arXiv] Klotski: Efficient Mixture-of-Expert Inference via Expert-Aware Multi-Batch Pipeline
- Jiawei Yi
- slides
- Q&A summary, video
April 15
- [arXiv] Mooncake: Trading More Storage for Less Computation - A KVCache-centric Architecture for Serving LLM Chatbot
- Juncheng Zhang
- slides
- Q&A summary, video
April 22
Topic Ⅰ
- [OSDI'24] Llumnix: Dynamic Scheduling for Large Language Model Serving
- Kunzhao Xu
- slides
Topic Ⅱ
- [SOSP'24] Enabling Parallelism Hot Switching for Efficient Training of Large Language Models
- Qinghe Wang
- slides
Summary and Video
April 29
Topic Ⅰ
- [SOSP'24] Tiered Memory Management: Access Latency is the Key!
- Lijun Miao
Topic Ⅱ
- [arXiv] ByteScale: Efficient Scaling of LLM Training with a 2048K Context Length on More Than 12,000 GPUs
- Long Zhao
Summary and Video
May 6
Topic Ⅰ
- [OSDI'25] Fast and Live Model Auto Scaling with O(1) Host Caching
- Chenhan Wang
Topic Ⅱ
- [arXiv] Training-free and Adaptive Sparse Attention for Efficient Long Video Generation
- Shiyi Wang
May 13
Topic Ⅰ
- [SOSP'24] OZZ: Identifying Kernel Out-of-Order Concurrency Bugs with In-Vivo Memory Access Reordering
- Jiyang Wang
Topic Ⅱ
- [arXiv] AsyncFS: Metadata Updates Made Asynchronous for Distributed Filesystems with In-Network Coordination
- Chongzhuo Yang
May 20
- [arXiv] Down with the Hierarchy: The 'H' in HNSW Stands for "Hubs"
- Bosen Yang
May 27
Topic Ⅰ
- [OSDI'24] dLoRA: Dynamically Orchestrating Requests and Adapters for LoRA LLM Serving
- Chizheng Fang
Topic Ⅱ
- [arXiv] CacheBlend: Fast Large Language Model Serving for RAG with Cached Knowledge Fusion
- Yicheng Zhang