Learning transitions for one-pass stochastic gradient descent on shallow neural networks

Supervisors: F. Krzakala, B. Loureiro

Abstract. In recent years, neural networks have enabled great progress in several fields of artificial intelligence, but their theoretical understanding is still lacking. In this thesis, we study the high-dimensional input limit of a two-layer neural network using tools from statistical physics. With the squared activation function, we derive ODEs for the dynamics of a set of sufficient statistics, which can then be used to estimate the transition times between learning phases. We apply this analysis to the simplest case, known as phase retrieval, exploring different kinds of initial conditions. We then study the dynamics with the weights constrained to a hypersphere; we estimate the exit time from the first phase of learning, and from it derive an estimate of the gain obtained by overparameterizing the network. We conclude by adding a stochastic corrective term to the equations, showing that this leads to a better estimate of the exit times.
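As a rough illustration of the setting (not the thesis code), the sketch below runs one-pass (online) SGD on a two-layer student with squared activation against a phase-retrieval teacher, and reports the teacher-student overlaps that play the role of sufficient statistics. All names, normalizations, and hyperparameters are illustrative assumptions.

```python
import numpy as np

# Minimal sketch (illustrative assumptions): one-pass SGD for phase retrieval
# with a two-layer student using the squared activation. Each step draws a
# fresh Gaussian input, so every sample is used exactly once ("one-pass").

rng = np.random.default_rng(0)

d = 500          # input dimension (high-dimensional regime)
K = 2            # student hidden units (K > 1 means overparameterized)
lr = 0.1         # learning rate (illustrative choice)
steps = 50_000   # number of one-pass SGD steps

w_star = rng.standard_normal(d)                # teacher vector (phase retrieval)
W = rng.standard_normal((K, d)) / np.sqrt(d)   # student weights, small random init

for t in range(steps):
    x = rng.standard_normal(d)                 # fresh sample at every step
    y = (w_star @ x) ** 2 / d                  # teacher label: squared projection
    pre = W @ x                                # student pre-activations
    y_hat = np.mean(pre ** 2) / d              # student output, squared activation
    err = y_hat - y
    # Gradient of 0.5 * err**2 with respect to each student row w_k
    grad = err * (2.0 / (K * d)) * pre[:, None] * x[None, :]
    W -= lr * grad

# Overlap of each student unit with the teacher direction: the kind of
# sufficient statistic whose ODE dynamics the thesis analyzes.
m = (W @ w_star) / d
print("final overlaps m_k:", m)
```

How long such a run spends near the uninformative initialization before the overlaps grow depends strongly on the initial conditions and on the learning rate; the exit time from this first phase is precisely the quantity estimated analytically in the thesis.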