Xiaonan Nie (聂小楠)
Xiaonan is currently a research scientist specializing in ML systems at ByteDance Inc., within the TopSeed Program. His work primarily focuses on scaling and optimizing the training of deep learning models. He received his Ph.D. degree in Computer Science from Peking University in 2024, under the supervision of Prof. Bin Cui.
In terms of academic research, Xiaonan has published over 10 papers in top conferences and journals. He won the Best Scalable Data Science Award at VLDB 2022, and was invited to present his research on MoE training at the 1st Google MoE Workshop and his research on LLM inference at NVIDIA’s GPU Technology Conference (GTC) 2024.
In terms of system design and implementation, Xiaonan was the principal developer of Hetu, a distributed deep learning system developed at Peking University, and the technical lead for Angel-PTM at Tencent in 2022-2023, an industrial-scale LLM training system for trillion-parameter models. Additionally, he led the MLSys group of Baichuan-AI (a GenAI start-up) in 2023-2024, where he was responsible for the rapid training of the Baichuan series models.
Email: xiaonan.nie [AT] pku.edu.cn, niexiaonan [AT] bytedance.com
Research Interests:
- Efficient Video Generation Models (e.g., Sora)
- Algorithm-System Co-Design
- Machine Learning Systems
- Data Management
🔍🔍🔍 I am seeking highly motivated full-time employees and research interns. If interested, please contact me directly!
What’s New:
- Sep, 2024: One paper was accepted by NeurIPS 2024.
- Sep, 2024: One paper was accepted by SIGMOD 2024.
- Aug, 2024: One paper was accepted by SOSP 2024.
- Jun, 2024: I was named an Outstanding Graduate of Peking University.
- May, 2024: I successfully defended my Ph.D. dissertation!
- Mar, 2024: I was invited to present my research at NVIDIA GTC 2024.
- Feb, 2024: One paper was accepted by TKDE 2024.
Systems:
- Hetu: A Highly Efficient Automatic Parallel Distributed Deep Learning System
- 2021 Synced Machine Intelligence TOP-10 Open Source Awards.
- Pop SOTA!List for AI Developers 2021.
- Outstanding Award & Champion of the 2021 CCF BDCI Contest.
- Angel-PTM: A Scalable and Economical Large-scale Pre-training System in Tencent
- Supports the training of trillion-parameter models (e.g., HunYuan-NLP 1T, the Top-1 model on the CLUE benchmark)
Publications:
2024
- [Preprint] Ran Yan, Youhe Jiang, Wangcheng Tao, Xiaonan Nie, Bin Cui, Binhang Yuan. “FlashFlex: Accommodating Large Language Model Training over Heterogeneous Environment”.
- [NeurIPS] Xiaonan Nie, Qibin Liu, Fangcheng Fu, Shenhan Zhu, Xupeng Miao, Xiaoyang Li, Yang Zhang, Shouda Liu, Bin Cui. “LSH-MoE: Communication-efficient MoE Training via Locality-Sensitive Hashing”.
- [SIGMOD] Pinxue Zhao, Hailin Zhang, Fangcheng Fu, Xiaonan Nie, Qibin Liu, Fang Yang, Yuanbo Peng, Dian Jiao, Shuaipeng Li, Jinbao Xue, Yangyu Tao, Bin Cui. “Efficiently Training 7B LLM with 1 Million Sequence Length on 8 GPUs”.
- [SOSP] Hao Ge, Fangcheng Fu, Haoyang Li, Xuanyu Wang, Sheng Lin, Yujie Wang, Xiaonan Nie, Hailin Zhang, Xupeng Miao, Bin Cui. “Enabling Parallelism Hot Switching for Efficient Training of Large Language Models”.
- [TKDE] Yujie Wang, Youhe Jiang, Xupeng Miao, Fangcheng Fu, Shenhan Zhu, Xiaonan Nie, Yaofeng Tu, Bin Cui. “Improving Automatic Parallel Training via Balanced Memory Workload Optimization”.
2023
- [Preprint] Baichuan Inc. “Baichuan 2: Open Large-scale Language Models”, arXiv 2023.
- [SIGMOD] Xiaonan Nie, Xupeng Miao, Zilong Wang, Jilong Xue, Lingxiao Ma, Zichao Yang, Gang Cao and Bin Cui. “FlexMoE: Scaling Large-scale Sparse Pre-trained Model Training via Dynamic Device Placement”.
- [VLDB] Xiaonan Nie, Yi Liu, Fangcheng Fu, Jinbao Xue, Dian Jiao, Xupeng Miao, Yangyu Tao, and Bin Cui. “Angel-PTM: A Scalable and Economical Large-scale Pre-training System in Tencent”.
- [VLDB] Xupeng Miao, Yujie Wang, Youhe Jiang, Chunan Shi, Xiaonan Nie, Hailin Zhang and Bin Cui. “Galvatron: Efficient Transformer Training over Multiple GPUs Using Automatic Parallelism”.
- [IJCAI] Youhe Jiang, Fangcheng Fu, Xupeng Miao, Xiaonan Nie and Bin Cui. “OSDP: Optimal Sharded Data Parallel for Distributed Deep Learning”.
2022
- [ICDE] Xiaonan Nie, Xupeng Miao, Zhi Yang and Bin Cui. “TSPLIT: Fine-grained GPU Memory Management for Efficient DNN Training via Tensor Splitting”.
- [1st Google MoE Workshop] Xiaonan Nie, Xupeng Miao, Shijie Cao, Lingxiao Ma, Qibin Liu, Jilong Xue, Youshan Miao, Yi Liu, Zhi Yang and Bin Cui. “EvoMoE: An Evolutional Mixture-of-Experts Training Framework via Dense-To-Sparse Gate”.
- [SCIS] Xupeng Miao, Xiaonan Nie, Hailin Zhang, Tong Zhao and Bin Cui. “Hetu: A highly efficient automatic parallel distributed deep learning system”.
- [VLDB] Xupeng Miao, Hailin Zhang, Yining Shi, Xiaonan Nie, Zhi Yang, Yangyu Tao and Bin Cui. “HET: Scaling out Huge Embedding Model Training via Cache-enabled Distributed Framework”. Best Scalable Data Science Paper!
- [SIGMOD] Xupeng Miao, Yining Shi, Hailin Zhang, Xin Zhang, Xiaonan Nie, Zhi Yang and Bin Cui. “HET-GMP: A Graph-based System Approach to Scaling Large Embedding Model Training”.
2021
- [SIGMOD] Xupeng Miao, Xiaonan Nie, Yingxia Shao, Zhi Yang, Jiawei Jiang, Lingxiao Ma and Bin Cui. “Heterogeneity-Aware Distributed Machine Learning Training via Partial Reduce”.
Awards:
- Outstanding Graduate of Peking University, 2024.
- Huawei Top Minds, 1st level, 2023.
- Ubiquant Scholarship, 2023.
- Schlumberger Scholarship, 2022.
- CCF BDCI Outstanding Award & Champion, 2021.
- HUAWEI DIGIX Global AI Challenge, first runner-up in the multimodal and multilingual search ranking track, 2021.
- ACM-ICPC Asia Regional Contest, Silver Medal, 2017.
- National Scholarship, 2016.
Academic Services:
- Program Committee Member of ICDE, WWW
- Reviewer of ICLR