# About Me

Zhewei Yao is a Ph.D. student in BAIR, RISELab (formerly AMPLab), BDD, and the Math Department at the University of California, Berkeley. He is advised by Michael Mahoney and also works closely with Kurt Keutzer. His research interests lie in computational statistics, optimization, and machine learning. Currently, he is interested in leveraging tools from randomized linear algebra to provide efficient and scalable solutions for large-scale optimization and learning problems. He is also working on the theory and application of deep learning. Before joining UC Berkeley, he received his B.S. in Math from Zhiyuan Honor College at Shanghai Jiao Tong University.

CV

# Publications

### Papers

- ADAHESSIAN: An Adaptive Second Order Optimizer for Machine Learning.

Z. Yao^{*}, A. Gholami^{*}, S. Shen, K. Keutzer, and M. W. Mahoney

arXiv, code

- Rethinking Batch Normalization in Transformers.

S. Shen^{*}, Z. Yao^{*}, A. Gholami, M. W. Mahoney, and K. Keutzer

arXiv

Accepted for publication, Proc. ICML 2020.

- ZeroQ: A Novel Zero Shot Quantization Framework.

Y. Cai^{*}, Z. Yao^{*}, Z. Dong^{*}, A. Gholami, M. W. Mahoney, and K. Keutzer

arXiv, code

Accepted for publication, Proc. CVPR 2020.

- PyHessian: Neural Networks Through the Lens of the Hessian.

Z. Yao^{*}, A. Gholami^{*}, K. Keutzer, M. W. Mahoney

arXiv, code

- HAWQ-V2: Hessian Aware trace-Weighted Quantization of Neural Networks.

Z. Dong, Z. Yao, Y. Cai, D. Arfeen, A. Gholami, M. W. Mahoney, K. Keutzer

arXiv

A short version was accepted as a spotlight paper at the NeurIPS’19 Workshop on Beyond First-Order Optimization Methods in Machine Learning.

- Q-BERT: Hessian Based Ultra Low Precision Quantization of BERT.

S. Shen, Z. Dong, J. Ye, L. Ma, Z. Yao, A. Gholami, M. W. Mahoney, K. Keutzer

arXiv

Accepted for publication, Proc. AAAI 2020.

- ANODEV2: A Coupled Neural ODE Evolution Framework.

T. Zhang^{*}, Z. Yao^{*}, A. Gholami^{*}, K. Keutzer, J. Gonzalez, G. Biros, and M. W. Mahoney

arXiv, code

Accepted for publication, Proc. NeurIPS 2019.

- Residual Networks as Nonlinear Systems: Stability Analysis using Linearization.

K. Rothauge, Z. Yao, Z. Hu, and M. W. Mahoney

arXiv

- HAWQ: Hessian AWare Quantization of Neural Networks with Mixed-Precision.

Z. Dong^{*}, Z. Yao^{*}, A. Gholami^{*}, M. W. Mahoney, K. Keutzer

arXiv

Accepted for publication, Proc. ICCV 2019.

- Inefficiency of K-FAC for Large Batch Size Training.

L. Ma, G. Montague, J. Ye, Z. Yao, A. Gholami, K. Keutzer, M. W. Mahoney

arXiv

Accepted for publication, Proc. AAAI 2020.

- Shallow Learning for Fluid Flow Reconstruction with Limited Sensors and Limited Data.

N. B. Erichson, L. Mathelin, Z. Yao, S. L. Brunton, M. W. Mahoney, J. N. Kutz

arXiv

Accepted for publication, Proceedings of the Royal Society A.

- JumpReLU: A Retrofit Defense Strategy for Adversarial Attacks.

N. B. Erichson^{*}, Z. Yao^{*}, M. W. Mahoney

arXiv

Accepted for publication, Proc. ICPRAM 2020.

- Trust Region Based Adversarial Attack on Neural Networks.

Z. Yao, A. Gholami, P. Xu, K. Keutzer, M. W. Mahoney

arXiv, code

Accepted for publication, Proc. CVPR 2019.

- Parameter Re-Initialization through Cyclical Batch Scheduling.

N. Mu^{*}, Z. Yao^{*}, A. Gholami, K. Keutzer, M. W. Mahoney

arXiv

Accepted for publication, Proc. MLSYS Workshop at NIPS 2018.

- On the Computational Inefficiency of Large Batch Sizes for Stochastic Gradient Descent.

N. Golmant, N. Vemuri, Z. Yao, V. Feinberg, A. Gholami, K. Rothauge, M. W. Mahoney, J. Gonzalez

arXiv

- Large batch size training of neural networks with adversarial training and second-order information.

Z. Yao^{*}, A. Gholami^{*}, K. Keutzer, M. W. Mahoney

arXiv, code

- Hessian-based Analysis of Large Batch Training and Robustness to Adversaries.

Z. Yao^{*}, A. Gholami^{*}, Q. Lei, K. Keutzer, M. W. Mahoney

arXiv, code

Accepted for publication, Proc. NIPS 2018.

- Inexact non-convex Newton-type methods.

Z. Yao, P. Xu, F. Roosta-Khorasani, M. W. Mahoney

arXiv

- A hybrid adaptive MCMC algorithm in function spaces.

Q. Zhou, Z. Hu, Z. Yao, J. Li

arXiv

SIAM/ASA Journal on Uncertainty Quantification 5 (1), 621-639

- On an adaptive preconditioned Crank–Nicolson MCMC algorithm for infinite dimensional Bayesian inference.

Z. Hu, Z. Yao, J. Li

arXiv

Journal of Computational Physics 332, 492-503

- A TV-Gaussian prior for infinite-dimensional Bayesian inverse problems and its numerical implementation.

Z. Yao, Z. Hu, J. Li

arXiv

Inverse Problems 32 (7), 075006

### Workshops

- An Empirical Exploration of Gradient Correlations in Deep Learning.

D. Rothchild, R. Fox, N. Golmant, J. Gonzalez, M. W. Mahoney, K. Rothauge, I. Stoica and Z. Yao

Integration of Deep Learning Theories, NeurIPS 2018

# Talks

- NeurIPS’19 Workshop on Beyond First-Order Optimization Methods in Machine Learning (Beyond)

Vancouver, Canada (December, 2019)

- DIMACS Workshop on Randomized Numerical Linear Algebra, Statistics, and Optimization (DIMACS)

Rutgers University, New Jersey, USA (September, 2019)

- Computer Vision Panel (IJCAI)

Macau, China (August, 2019)

- Randomized Algorithms for Optimization Problems in Statistics (JSM)

Colorado Convention Center, Denver, Colorado, USA (July, 2019)

- Berkeley Scientific Computing and Matrix Computations Seminar (Link)

Berkeley, CA, USA (November, 2018)

- Berkeley Real-time Intelligent Secure Explainable Systems Lab Sponsor Retreat (RiseLab)

Lake Tahoe, CA, USA (August, 2018)