Xiaonan Nie (聂小楠)

Email: xiaonan.nie [AT] pku.edu.cn

Xiaonan Nie is a final-year PhD student advised by Professor Bin Cui at the DAIR Lab, Computer Science Department, Peking University.

His research interests lie primarily in Distributed Deep Learning Systems, with a focus on Large Language Model (LLM) systems, Multi-Agent Systems, Heterogeneous Computing, and Sparse Neural Networks, specifically Mixture of Experts (MoE). During his PhD, his research has centered on Hetu, a highly efficient distributed deep learning system, for which he serves as the lead developer.

As a research intern at Tencent's Machine Learning Platform Department (2022-2023), he led the development of Angel-PTM, a large-scale pre-training system designed for trillion-scale models. He was also a research intern in the System Research Group of Microsoft Research Asia (MSRA) from 2021 to 2022, under the supervision of Dr. Jilong Xue and Dr. Lingxiao Ma.

News

  • Mar 19th, 2024: I was invited to give a talk on LLM inference at NVIDIA GTC 2024.

  • May 13th, 2023: One paper was accepted by VLDB 2023.

  • Apr 20th, 2023: One paper was accepted by IJCAI 2023.

  • Dec 13th, 2022: One paper was accepted by SIGMOD 2023.

  • Oct 16th, 2022: One paper was accepted by VLDB 2023.

  • Oct 11th, 2022: I was invited to present EvoMoE at the Google Workshop on Sparsity and Adaptive Computation.

  • Aug 24th, 2022: We won the Best Scalable Data Science Paper Award at VLDB 2022.

  • Jan 20th, 2022: We won the Outstanding Award and the Championship of the 2021 CCF BDCI Contest (1st out of 33,757 teams).

Publications

2023

  1. Angel-PTM: A Scalable and Economical Large-scale Pre-training System in Tencent. [PDF]
    Xiaonan Nie, Yi Liu, Fangcheng Fu, Jinbao Xue, Dian Jiao, Xupeng Miao, Yangyu Tao and Bin Cui.
    International Conference on Very Large Data Bases.
    VLDB 2023, CCF-A.

  2. FlexMoE: Scaling Large-scale Sparse Pre-trained Model Training via Dynamic Device Placement. [PDF]
    Xiaonan Nie, Xupeng Miao, Zilong Wang, Jilong Xue, Lingxiao Ma, Zichao Yang, Gang Cao and Bin Cui.
    ACM SIGMOD International Conference on Management of Data.
    SIGMOD 2023, CCF-A.

  3. Galvatron: Efficient Transformer Training over Multiple GPUs Using Automatic Parallelism. [PDF]
    Xupeng Miao, Yujie Wang, Youhe Jiang, Chunan Shi, Xiaonan Nie, Hailin Zhang and Bin Cui.
    International Conference on Very Large Data Bases.
    VLDB 2023, CCF-A.

  4. OSDP: Optimal Sharded Data Parallel for Distributed Deep Learning. [PDF]
    Youhe Jiang, Fangcheng Fu, Xupeng Miao, Xiaonan Nie and Bin Cui.
    International Joint Conference on Artificial Intelligence.
    IJCAI 2023, CCF-A.

2022

  1. TSPLIT: Fine-grained GPU Memory Management for Efficient DNN Training via Tensor Splitting. [PDF]
    Xiaonan Nie, Xupeng Miao, Zhi Yang and Bin Cui.
    IEEE International Conference on Data Engineering.
    ICDE 2022, CCF-A.

  2. EvoMoE: An Evolutional Mixture-of-Experts Training Framework via Dense-To-Sparse Gate. [PDF]
    Xiaonan Nie, Xupeng Miao, Shijie Cao, Lingxiao Ma, Qibin Liu, Jilong Xue, Youshan Miao, Yi Liu, Zhi Yang and Bin Cui.
    Google Workshop on Sparsity and Adaptive Computation.

  3. Hetu: A Highly Efficient Automatic Parallel Distributed Deep Learning System. [PDF]
    Xupeng Miao, Xiaonan Nie, Hailin Zhang, Tong Zhao and Bin Cui.
    Science China Information Sciences.
    SCIS 2022, CCF-A.

  4. HET: Scaling out Huge Embedding Model Training via Cache-enabled Distributed Framework. [PDF]
    Xupeng Miao, Hailin Zhang, Yining Shi, Xiaonan Nie, Zhi Yang, Yangyu Tao and Bin Cui.
    International Conference on Very Large Data Bases.
    VLDB 2022, CCF-A, Best Scalable Data Science Paper!

  5. HET-GMP: A Graph-based System Approach to Scaling Large Embedding Model Training. [PDF]
    Xupeng Miao, Yining Shi, Hailin Zhang, Xin Zhang, Xiaonan Nie, Zhi Yang and Bin Cui.
    ACM SIGMOD International Conference on Management of Data.
    SIGMOD 2022, CCF-A.

2021

  1. Heterogeneity-Aware Distributed Machine Learning Training via Partial Reduce. [PDF]
    Xupeng Miao, Xiaonan Nie, Yingxia Shao, Zhi Yang, Jiawei Jiang, Lingxiao Ma and Bin Cui.
    ACM SIGMOD International Conference on Management of Data.
    SIGMOD 2021, CCF-A.

Awards

  • Huawei Top Minds (华为天才少年), first-level offer, 2023.
  • CCF BDCI Outstanding Award & Champion, 2021.
  • HUAWEI DIGIX Global AI Challenge, First Runner-up in the Search Ranking in Multimodal and Multilingual Contexts track, 2021.
  • ACM-ICPC Asia Regional Contest, Silver Medal, 2017.
  • National Scholarship, 2016.