About Me

Zhewei Yao is a senior researcher at Microsoft, working on efficient large-scale training and inference. He was a Ph.D. student in BAIR, RISELab (formerly AMPLab), BDD, and the Math Department at the University of California, Berkeley, where he was advised by Michael Mahoney and worked closely with Kurt Keutzer. His research interests lie in computational statistics, optimization, and machine learning. Currently, he is interested in leveraging tools from randomized linear algebra to provide efficient and scalable solutions for large-scale optimization and learning problems. He also works on the theory and application of deep learning. Before joining UC Berkeley, he received his B.S. in Math from the Zhiyuan Honor College at Shanghai Jiao Tong University. Here is the CV (last updated 10/04/2021).

Publications

Conference

  • Hessian-Aware Pruning and Optimal Neural Implant.
    S. Yu*, Z. Yao*, A. Gholami*, Z. Dong*, M. W. Mahoney, K. Keutzer
    arXiv, code
    Accepted for publication, Proc. WACV 2022
  • What’s Hidden in a One-layer Randomly Weighted Transformer?
    S. Shen*, Z. Yao*, D. Kiela, K. Keutzer, M. W. Mahoney
    arXiv, code
    Accepted for publication, Proc. EMNLP 2021
  • ActNN: Reducing Training Memory Footprint via 2-Bit Activation Compressed Training.
    J. Chen, L. Zheng, Z. Yao, D. Wang, I. Stoica, M. W. Mahoney, J. E. Gonzalez
    arXiv
    Accepted for publication, Proc. ICML 2021 (Long Talk).
  • I-BERT: Integer-only BERT Quantization.
    S. Kim*, A. Gholami*, Z. Yao*, M. W. Mahoney, K. Keutzer
    arXiv, code
    Accepted for publication, Proc. ICML 2021 (Long Talk).
  • HAWQ-V3: Dyadic Neural Network Quantization.
    Z. Yao*, Z. Dong*, Z. Zheng*, A. Gholami*, J. Yu, E. Tan, L. Wang, Q. Huang, Y. Wang, M. W. Mahoney, K. Keutzer
    arXiv, code
    Accepted for publication, Proc. ICML 2021 (Short Talk).
  • ADAHESSIAN: An Adaptive Second Order Optimizer for Machine Learning.
    Z. Yao*, A. Gholami*, S. Shen, K. Keutzer, and M. W. Mahoney
    arXiv, code
    Accepted for publication, Proc. AAAI 2021.
  • A Statistical Framework for Low-bitwidth Training of Deep Neural Networks.
    J. Chen, Y. Gai, Z. Yao, M. W. Mahoney, and J. E. Gonzalez
    arXiv, code
    Accepted for publication, Proc. NeurIPS 2020.
  • MAF: Multimodal Alignment Framework for Weakly-Supervised Phrase Grounding.
    Q. Wang, H. Tan, S. Shen, M. W. Mahoney, and Z. Yao
    arXiv, code
    Accepted for publication, Proc. EMNLP 2020.
  • PowerNorm: Rethinking Batch Normalization in Transformers.
    S. Shen*, Z. Yao*, A. Gholami, M. W. Mahoney, and K. Keutzer
    arXiv, code
    Accepted for publication, Proc. ICML 2020.
  • ZeroQ: A Novel Zero Shot Quantization Framework.
    Y. Cai*, Z. Yao*, Z. Dong*, A. Gholami, M. W. Mahoney, and K. Keutzer
    arXiv, code
    Accepted for publication, Proc. CVPR 2020.
  • PyHessian: Neural Networks Through the Lens of the Hessian.
    Z. Yao*, A. Gholami*, K. Keutzer, M. W. Mahoney
    arXiv, code
    Accepted for publication, Proc. IEEE BigData 2020.
  • HAWQ-V2: Hessian Aware trace-Weighted Quantization of Neural Networks.
    Z. Dong, Z. Yao, Y. Cai, D. Arfeen, A. Gholami, M. W. Mahoney, K. Keutzer
    arXiv, code
    Accepted for publication, Proc. NeurIPS 2020.
  • Q-BERT: Hessian Based Ultra Low Precision Quantization of BERT.
    S. Shen, Z. Dong, J. Ye, L. Ma, Z. Yao, A. Gholami, M. W. Mahoney, K. Keutzer
    arXiv
    Accepted for publication, Proc. AAAI 2020.
  • ANODEV2: A Coupled Neural ODE Evolution Framework.
    T. Zhang*, Z. Yao*, A. Gholami*, K. Keutzer, J. Gonzalez, G. Biros, and M. W. Mahoney
    arXiv, code
    Accepted for publication, Proc. NeurIPS 2019.
  • HAWQ: Hessian AWare Quantization of Neural Networks with Mixed-Precision.
    Z. Dong*, Z. Yao*, A. Gholami*, M. W. Mahoney, K. Keutzer
    arXiv, code
    Accepted for publication, Proc. ICCV 2019.
  • Inefficiency of K-FAC for Large Batch Size Training.
    L. Ma, G. Montague, J. Ye, Z. Yao, A. Gholami, K. Keutzer, M. W. Mahoney
    arXiv
    Accepted for publication, Proc. AAAI 2020.
  • JumpReLU: A Retrofit Defense Strategy for Adversarial Attacks.
    N. B. Erichson*, Z. Yao*, M. W. Mahoney
    arXiv
    Accepted for publication, Proc. ICPRAM 2020.
  • Trust Region Based Adversarial Attack on Neural Networks.
    Z. Yao, A. Gholami, P. Xu, K. Keutzer, M. W. Mahoney
    arXiv, code
    Accepted for publication, Proc. CVPR 2019.
  • Hessian-based Analysis of Large Batch Training and Robustness to Adversaries.
    Z. Yao*, A. Gholami*, Q. Lei, K. Keutzer, M. W. Mahoney
    arXiv, code
    Accepted for publication, Proc. NIPS 2018.

Journal

  • Shallow Learning for Fluid Flow Reconstruction with Limited Sensors and Limited Data.
    N. B. Erichson, L. Mathelin, Z. Yao, S. L. Brunton, M. W. Mahoney, J. N. Kutz
    arXiv
    Accepted for publication, Proceedings of the Royal Society A.
  • Inexact non-convex Newton-type methods.
    Z. Yao, P. Xu, F. Roosta-Khorasani, M. W. Mahoney
    arXiv, code
    Accepted for publication, INFORMS Journal on Optimization.
  • A hybrid adaptive MCMC algorithm in function spaces.
    Q. Zhou, Z. Hu, Z. Yao, J. Li
    arXiv
    SIAM/ASA Journal on Uncertainty Quantification 5 (1), 621-639
  • On an adaptive preconditioned Crank–Nicolson MCMC algorithm for infinite dimensional Bayesian inference.
    Z. Hu, Z. Yao, J. Li
    arXiv
    Journal of Computational Physics 332, 492-503
  • A TV-Gaussian prior for infinite-dimensional Bayesian inverse problems and its numerical implementation.
    Z. Yao, Z. Hu, J. Li
    arXiv
    Inverse Problems 32 (7), 075006 (Highlight Paper)

Book Chapter

  • A Survey of Quantization Methods for Efficient Neural Network Inference.
    A. Gholami*, S. Kim*, Z. Dong*, Z. Yao*, M. W. Mahoney, K. Keutzer
    arXiv
    Low-Power Computer Vision: Improving the Efficiency of Artificial Intelligence, 2021.

Workshop

  • Parameter Re-Initialization through Cyclical Batch Scheduling.
    N. Mu*, Z. Yao*, A. Gholami, K. Keutzer, M. W. Mahoney
    arXiv
    Accepted for publication, Proc. MLSYS Workshop at NIPS 2018
  • An Empirical Exploration of Gradient Correlations in Deep Learning.
    D. Rothchild, R. Fox, N. Golmant, J. Gonzalez, M. W. Mahoney, K. Rothauge, I. Stoica and Z. Yao
    Integration of Deep Learning Theories, NeurIPS 2018

Preprint and Technical Report

  • Inexact Newton-CG Algorithms With Complexity Guarantees.
    Z. Yao, P. Xu, F. Roosta, S. J. Wright, M. W. Mahoney
    arXiv
  • How Much Can CLIP Benefit Vision-and-Language Tasks?
    S. Shen, L. H. Li, H. Tan, M. Bansal, A. Rohrbach, K. Chang, Z. Yao, K. Keutzer
    arXiv
  • MLPruning: A Multilevel Structured Pruning Framework for Transformer-based Models.
    Z. Yao, L. Ma, S. Shen, K. Keutzer, M. W. Mahoney
    arXiv
  • Q-ASR: Integer-only Zero-shot Quantization for Efficient Speech Recognition.
    S. Kim, A. Gholami, Z. Yao, A. Nrusimha, B. Zhai, T. Gao, M. W. Mahoney, K. Keutzer
    arXiv
  • Benchmarking Semi-supervised Federated Learning.
    Z. Zhang*, Z. Yao*, Y. Yang, Y. Yan, J. E. Gonzalez, and M. W. Mahoney
    arXiv, code
  • Residual Networks as Nonlinear Systems: Stability Analysis using Linearization.
    K. Rothauge, Z. Yao, Z. Hu, and M. W. Mahoney
    arXiv
  • On the Computational Inefficiency of Large Batch Sizes for Stochastic Gradient Descent.
    N. Golmant, N. Vemuri, Z. Yao, V. Feinberg, A. Gholami, K. Rothauge, M. W. Mahoney, J. Gonzalez
    arXiv
  • Large batch size training of neural networks with adversarial training and second-order information.
    Z. Yao*, A. Gholami*, K. Keutzer, M. W. Mahoney
    arXiv, code

Selected Talks

  • ICML’21 (ICML)
    Online (Jul, 2021)
  • SIAM CSE’21: Beyond First Order Methods in Machine Learning Systems (CSE)
    Online (Mar, 2021)
  • AAAI’21 (AAAI)
    Online (Feb, 2021)
  • IEEE BigData’20 (BigData)
    Online (Dec, 2020), slides
  • Berkeley Real-time Intelligent Secure Explainable Systems Lab Camp (RISELab)
    Online (Oct, 2020), slides1 and slides2, video
  • Fast.AI (Fast.AI)
    Online (Oct, 2020), slides, video
  • Scalable Parallel Computing Lab (SPCL)
    Online (Oct, 2020), slides, video
  • ICML’20 Workshop on Beyond First-Order Optimization Methods in Machine Learning (Beyond)
    Online (July, 2020), slides, video
  • Berkeley Real-time Intelligent Secure Explainable Systems Lab Sponsor Retreat (RISELab)
    Lake Tahoe, CA, USA (May, 2020), slides
  • NeurIPS’19 Workshop on Beyond First-Order Optimization Methods in Machine Learning (Beyond)
    Vancouver, Canada (December, 2019)
  • DIMACS Workshop on Randomized Numerical Linear Algebra, Statistics, and Optimization (DIMACS)
    Rutgers University, New Jersey, USA (September, 2019), slides
  • Computer Vision Panel (IJCAI)
    Macau, China (August, 2019), slides
  • Randomized Algorithms for Optimization Problems in Statistics (JSM)
    Colorado Convention Center, Denver, Colorado, USA (July, 2019), slides
  • Berkeley Scientific Computing and Matrix Computations Seminar (Link)
    Berkeley, CA, USA (November, 2018), slides
  • Berkeley Real-time Intelligent Secure Explainable Systems Lab Sponsor Retreat (RISELab)
    Lake Tahoe, CA, USA (August, 2018), slides

Teaching