1 paper accepted to AISTATS

Our paper on the convergence of gradient descent on overparametrized linear networks [1] has been accepted to the International Conference on Artificial Intelligence and Statistics (AISTATS)!

[1] Z. Xu, H. Min, S. Tarmoun, E. Mallada, and R. Vidal, “Linear Convergence of Gradient Descent For Overparametrized Finite Width Two-Layer Linear Networks with General Initialization,” in International Conference on Artificial Intelligence and Statistics (AISTATS), 2023, pp. 2262-2284.

Recent theoretical analyses of the convergence of gradient descent (GD) to a global minimum for over-parametrized neural networks make strong assumptions on the step size (infinitesimal), the hidden-layer width (infinite), or the initialization (spectral, balanced). In this work, we relax these assumptions and derive a linear convergence rate for two-layer linear networks trained using GD on the squared loss in the case of finite step size, finite width and general initialization. Despite the generality of our analysis, our rate estimates are significantly tighter than those of prior work. Moreover, we provide a time-varying step size rule that monotonically improves the convergence rate as the loss function decreases to zero. Numerical experiments validate our findings.
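For intuition about the setting studied in the paper, here is a minimal sketch (not the paper's code) of gradient descent with a finite step size on the squared loss for a finite-width two-layer linear network with a general (unbalanced, non-spectral) initialization. The dimensions, step size, and initialization scale below are illustrative assumptions, not values taken from the paper.

# Minimal illustrative sketch: GD on the squared loss for a two-layer linear
# network f(x) = W2 W1 x. All hyperparameters below are assumptions chosen for
# illustration; they are not the paper's experimental settings.
import numpy as np

rng = np.random.default_rng(0)

d_in, h, d_out, n = 5, 10, 3, 100        # input dim, hidden width, output dim, samples
X = rng.standard_normal((d_in, n))
W_star = rng.standard_normal((d_out, d_in))
Y = W_star @ X                            # targets from a ground-truth linear map

# General initialization (neither balanced nor spectral)
W1 = 0.5 * rng.standard_normal((h, d_in))
W2 = 0.5 * rng.standard_normal((d_out, h))

eta = 1e-2                                # finite, fixed step size (illustrative)
losses = []
for t in range(500):
    E = W2 @ W1 @ X - Y                   # residual of the end-to-end linear map
    losses.append(0.5 * np.sum(E ** 2) / n)
    G2 = (E @ X.T @ W1.T) / n             # gradient of the loss w.r.t. W2
    G1 = (W2.T @ E @ X.T) / n             # gradient of the loss w.r.t. W1
    W2 -= eta * G2
    W1 -= eta * G1

# Linear convergence appears as a roughly constant contraction of the loss per step.
ratios = [losses[t + 1] / losses[t] for t in range(len(losses) - 1) if losses[t] > 0]
print(f"final loss: {losses[-1]:.3e}, median per-step contraction: {np.median(ratios):.4f}")

Printing the median ratio of successive losses is just a quick way to see geometric (linear-rate) decay; the paper's contribution is a provable bound on this rate, together with a time-varying step-size rule that improves it as the loss decreases.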

@inproceedings{xmtmv2023aistats,
  abstract = {Recent theoretical analyses of the convergence of gradient descent (GD) to a global minimum for over-parametrized neural networks make strong assumptions on the step size (infinitesimal), the hidden-layer width (infinite), or the initialization (spectral, balanced). In this work, we relax these assumptions and derive a linear convergence rate for two-layer linear networks trained using GD on the squared loss in the case of finite step size, finite width and general initialization. Despite the generality of our analysis, our rate estimates are significantly tighter than those of prior work. Moreover, we provide a time-varying step size rule that monotonically improves the convergence rate as the loss function decreases to zero. Numerical experiments validate our findings.},
  author = {Xu, Ziqing and Min, Hancheng and Tarmoun, Salma and Mallada, Enrique and Vidal, Rene},
  booktitle = {International Conference on Artificial Intelligence and Statistics (AISTATS)},
  editor = {Ruiz, Francisco and Dy, Jennifer and van de Meent, Jan-Willem},
  grants = {No Grant},
  month = {4},
  pages = {2262--2284},
  publisher = {PMLR},
  record = {published, accepted Jan 2023, submitted Oct 2022},
  series = {Proceedings of Machine Learning Research},
  title = {Linear Convergence of Gradient Descent For Overparametrized Finite Width Two-Layer Linear Networks with General Initialization},
  url = {https://mallada.ece.jhu.edu/pubs/2023-AISTATS-XMTMV.pdf},
  volume = {206},
  year = {2023}
}