2024 Fall

Specific Requirements

We focus on papers published at SOSP and OSDI within the past two years.
We adopt a "1+N" presentation format: one person will present the main content, while the other N members will assist with preparation and handle the Q&A session.
We expect a detailed discussion of the paper's strengths and weaknesses, as well as a comprehensive overview of related work from the past three years. The presentation should last at least 45 minutes.

Other Information

The video recording and text summary of each session will be uploaded to Bilibili and Zhihu as soon as possible.

Schedule

September 03

💡 [OSDI'24] Parrot: Efficient Serving of LLM-based Applications with Semantic Variable
🙎‍♂️ Chaoyi Ruan, Kunzhao Xu, Bosen Yang
📕 slides, 📃 Q&A summary, 📺 video

September 10

💡 [SOSP'23] PIT: Optimization of Dynamic Sparse Deep Learning Models via Permutation Invariant Transformation
🙎‍♂️ Jiaan Zhu (Andy), Qinghe Wang, Long Zhao
📕 slides, 📃 Q&A summary, 📺 video

September 18

💡 [OSDI'24] Nomad: Non-Exclusive Memory Tiering via Transactional Page Migration
🙎‍♂️ Jiahao Li
📕 slides, 📃 Q&A summary, 📺 video

September 24

💡 [OSDI'24] μSlope: High Compression and Fast Search on Semi-Structured Logs
🙎‍♂️ Yuming Xu, Hengyu Liang
📕 slides, 📃 Q&A summary, 📺 video

October 08

💡 How (and How Not) to Write a Good Systems Paper
🙎‍♂️ Xiaosong Ma (MBZUAI), Kang Chen (THU), Cheng Li (USTC)
📕 slides

October 15

💡 [OSDI'24] InfiniGen: Efficient Generative Inference of Large Language Models with Dynamic KV Cache Management
🙎‍♂️ Ping Gong, Jiawei Yi, Juncheng Zhang
📕 slides, 📃 Q&A summary, 📺 video

October 22

💡 [OSDI'24] Harvesting Memory-bound CPU Stall Cycles in Software with MSH
🙎‍♂️ Luofan Chen, Jiyang Wang
📕 slides, 📃 Q&A summary, 📺 video

October 29

💡 [OSDI'24] ServerlessLLM: Low-Latency Serverless Inference for Large Language Models
🙎‍♂️ Mingxuan Liu (NPU, Northwestern Polytechnical University)
📕 slides, 📃 Q&A summary, 📺 video

November 05

💡 [SOSP'24] ReCycle: Resilient Training of Large DNNs using Pipeline Adaptation
🙎‍♂️ Wenhao Huang, Yichi Chen, Yanjie Wang (all from TJU, Tianjin University)
📕 slides, 📃 Q&A summary, 📺 video

November 12

💡 [SOSP'24] TENPLEX: Dynamic Parallelism for Deep Learning using Parallelizable Tensor Collections
🙎‍♂️ Zhaolin Duan, Yuhang Zhang, Yihan Wang (all from TJU, Tianjin University)
📕 slides, 📃 Q&A summary, 📺 video

November 19

💡 [SOSP'23] UGache: A Unified GPU Cache for Embedding-based Deep Learning
🙎‍♂️ Zheng Yang, Yicheng Zhang
📕 slides, 📺 video

November 26

💡 [SOSP'24] BIZA: Design of Self-Governing Block-Interface ZNS AFA for Endurance and Performance
🙎‍♂️ Jingze Huo, Qingyuan Chen
📕 slides, 📺 video

December 03

💡 [SOSP'24] LoongServe: Efficiently Serving Long-Context Large Language Models with Elastic Sequence Parallelism
🙎‍♂️ Zewen Jin, Hongrui Zhan, Shen Fu
📕 slides, 📃 Q&A summary, 📺 video

December 10

💡 [SOSP'24] PowerInfer: Fast Large Language Model Serving with a Consumer-grade GPU
🙎‍♂️ Haiquan Wang, Jia He, Jiaqi Ruan
📕 slides, 📃 Q&A summary, 📺 video

December 17

💡 [OSDI'24] Motor: Enabling Multi-Versioning for Distributed Transactions on Disaggregated Memory
🙎‍♂️ Sen Han
📕 slides, 📃 Q&A summary, 📺 video

December 24

💡 [OSDI'24] Taming Throughput-Latency Tradeoff in LLM Inference with Sarathi-Serve
🙎‍♂️ Yinhe Chen, Dongqi Tian
📕 slides, 📃 Q&A summary, 📺 video

January 07

💡 [OSDI'24] DistServe: Disaggregating Prefill and Decoding for Goodput-optimized Large Language Model Serving
🙎‍♂️ Mingxuan Liu (NPU, Northwestern Polytechnical University)
📕 slides, 📃 Q&A summary, 📺 video