Our paper on Early Neuron Alignment in Two-layer ReLU Networks [1] has been accepted to the International Conference on Learning Representations (ICLR). Congrats, Hancheng!


This paper studies the problem of training a two-layer ReLU network for binary classification using gradient flow with small initialization. We consider a training dataset with well-separated input vectors: any pair of input data with the same label are positively correlated, and any pair with different labels are negatively correlated. Our analysis shows that, during the early phase of training, neurons in the first layer try to align with either the positive data or the negative data, depending on the sign of their corresponding weights in the second layer. A careful analysis of the neurons' directional dynamics allows us to provide an $\mathcal{O}(\frac{\log n}{\sqrt{\mu}})$ upper bound on the time it takes for all neurons to achieve good alignment with the input data, where $n$ is the number of data points and $\mu$ measures how well the data are separated. After the early alignment phase, the loss converges to zero at an $\mathcal{O}(\frac{1}{t})$ rate, and the weight matrix on the first layer is approximately low-rank. Numerical experiments on the MNIST dataset illustrate our theoretical findings.
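As a rough illustration of the alignment phenomenon (a minimal NumPy sketch, not the paper's code: the synthetic dataset, network width, initialization scale, and step sizes below are all illustrative assumptions), the snippet trains a small-initialization two-layer ReLU network by discretized gradient flow on well-separated data and then measures how well each first-layer neuron aligns with the positive or negative data mean, according to the sign of its second-layer weight:

```python
import numpy as np

rng = np.random.default_rng(0)

# Well-separated synthetic data (illustrative assumption): positive points
# cluster near +v and negative points near -v, so same-label pairs are
# positively correlated and different-label pairs negatively correlated.
d, n_half = 10, 10
v = np.zeros(d); v[0] = 1.0
X_pos = v + 0.05 * rng.standard_normal((n_half, d))
X_neg = -v + 0.05 * rng.standard_normal((n_half, d))
X = np.vstack([X_pos, X_neg])
y = np.concatenate([np.ones(n_half), -np.ones(n_half)])

# Two-layer ReLU net f(x) = sum_j a_j * relu(w_j . x) with small init.
m, eps = 20, 1e-3
W = eps * rng.standard_normal((m, d))
a = eps * rng.choice([-1.0, 1.0], size=m)  # second layer held fixed here

lr = 0.1
for _ in range(1000):
    pre = X @ W.T                      # (n, m) pre-activations
    act = np.maximum(pre, 0.0)
    f = act @ a                        # network outputs
    g = -y / (1.0 + np.exp(y * f))     # derivative of logistic loss in f
    # discretized gradient-flow step on the first-layer weights
    grad_W = ((g[:, None] * (pre > 0)) * a).T @ X / len(y)
    W -= lr * grad_W

def cosine(u, w):
    return u @ w / (np.linalg.norm(u) * np.linalg.norm(w))

# Neurons with a_j > 0 should align with the positive data mean,
# neurons with a_j < 0 with the negative data mean.
mu_pos, mu_neg = X_pos.mean(0), X_neg.mean(0)
align = [cosine(W[j], mu_pos if a[j] > 0 else mu_neg) for j in range(m)]
print(f"mean alignment after early phase: {np.mean(align):.3f}")
```

Because the initialization scale is tiny, the network output stays near zero throughout this run, so the dynamics remain in the early alignment phase: the neurons rotate toward the data clusters while their norms barely grow, which is the regime the paper's bound describes.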

```
@inproceedings{mvm2024iclr,
abstract = {This paper studies the problem of training a two-layer ReLU network for binary classification using gradient flow with small initialization. We consider a training dataset with well-separated input vectors: any pair of input data with the same label are positively correlated, and any pair with different labels are negatively correlated. Our analysis shows that, during the early phase of training, neurons in the first layer try to align with either the positive data or the negative data, depending on the sign of their corresponding weights in the second layer. A careful analysis of the neurons' directional dynamics allows us to provide an $\mathcal{O}(\frac{\log n}{\sqrt{\mu}})$ upper bound on the time it takes for all neurons to achieve good alignment with the input data, where $n$ is the number of data points and $\mu$ measures how well the data are separated. After the early alignment phase, the loss converges to zero at an $\mathcal{O}(\frac{1}{t})$ rate, and the weight matrix on the first layer is approximately low-rank. Numerical experiments on the MNIST dataset illustrate our theoretical findings.},
author = {Min, Hancheng and Vidal, Rene and Mallada, Enrique},
booktitle = {International Conference on Learning Representations (ICLR)},
grants = {CAREER-1752362},
month = {05},
record = {published, accepted Jan 2024, submitted Sep 2023},
title = {Early Neuron Alignment in Two-layer ReLU Networks with Small Initialization},
url = {https://mallada.ece.jhu.edu/pubs/2024-ICLR-MVM.pdf},
year = {2024}
}
```