About Me

Zhewei Yao is a principal researcher and R&D manager at Microsoft, working on efficient large-scale training and inference. He obtained his Ph.D. from the University of California, Berkeley, where he was a member of BAIR, RISELab (formerly AMPLab), BDD, and the Math Department. He was advised by Michael Mahoney and also worked closely with Kurt Keutzer. His research interests lie in computational statistics, optimization, and machine learning. Currently, he is interested in leveraging tools from randomized linear algebra to provide efficient and scalable solutions for large-scale optimization and learning problems. He also works on the theory and application of deep learning. Before joining UC Berkeley, he received his B.S. in Math from Zhiyuan Honor College at Shanghai Jiao Tong University (last update 8/23/2023).

Publications (See full list at Google Scholar)

  • ZeroQuant: Efficient and Affordable Post-Training Quantization for Large-Scale Transformers.
    Z. Yao, R.Y. Aminabadi, M. Zhang, X. Wu, C. Li, Y. He
    arXiv
    Accepted for publication, Proc. NeurIPS 2022
  • Extreme Compression for Pre-trained Transformers Made Simple and Efficient.
    X. Wu*, Z. Yao*, M. Zhang*, C. Li, Y. He
    arXiv
    Accepted for publication, Proc. NeurIPS 2022
  • DeepSpeed-MoE: Advancing Mixture-of-Experts Inference and Training to Power Next-Generation AI Scale.
    S. Rajbhandari, C. Li, Z. Yao, M. Zhang, R. Y. Aminabadi, A. A. Awan, J. Rasley, Y. He
    arXiv
    Accepted for publication, Proc. ICML 2022

Selected Talks

  • ICCV’23 (LBQNN)
    Online (Oct, 2023), slides
  • ICML’21 (ICML)
    Online (Jul, 2021)
  • SIAM CSE’21: Beyond First Order Methods in Machine Learning Systems (CSE)
    Online (Mar, 2021)
  • AAAI’21 (AAAI)
    Online (Feb, 2021)
  • IEEE BigData’20 (BigData)
    Online (Dec, 2020), slides
  • Berkeley Real-time Intelligent Secure Explainable Systems Lab Camp (RiseLab)
    Online (Oct, 2020), slides1 and slides2, video
  • Fast.AI (Fast.AI)
    Online (Oct, 2020), slides, video
  • Scalable Parallel Computing Lab (SPCL)
    Online (Oct, 2020), slides, video
  • ICML’20 Workshop on Beyond First-Order Optimization Methods in Machine Learning (Beyond)
    Online (Jul, 2020), slides, video
  • Berkeley Real-time Intelligent Secure Explainable Systems Lab Sponsor Retreat (RiseLab)
    Lake Tahoe, CA, USA (May, 2020), slides
  • NeurIPS’19 Workshop on Beyond First-Order Optimization Methods in Machine Learning (Beyond)
    Vancouver, Canada (Dec, 2019)
  • DIMACS Workshop on Randomized Numerical Linear Algebra, Statistics, and Optimization (DIMACS)
    Rutgers University, New Jersey, USA (Sep, 2019), slides
  • Computer Vision Panel (IJCAI)
    Macau, China (Aug, 2019), slides
  • Randomized Algorithms for Optimization Problems in Statistics (JSM)
    Colorado Convention Center, Denver, Colorado, USA (Jul, 2019), slides
  • Berkeley Scientific Computing and Matrix Computations Seminar (Link)
    Berkeley, CA, USA (Nov, 2018), slides
  • Berkeley Real-time Intelligent Secure Explainable Systems Lab Sponsor Retreat (RiseLab)
    Lake Tahoe, CA, USA (Aug, 2018), slides

Teaching