Convergence and optimality of wide RNNs in the mean-field regime
03 May 2023, 15:00 — Room 322, UniGe DIBRIS, Via Dodecaneso 35
The seminar will be streamed online on Teams; details below.
Meeting ID: 360 543 033 187
Passcode: PiwCVp
Speaker:
Andrea Agazzi — Università di Pisa
Abstract:
Recurrent neural networks (RNNs) are a family of neural network architectures traditionally used to learn from data with a time-series structure. As the name suggests, these networks have a recurrent structure: at each timestep, the (hidden) state of the network is fed back to the model as an input, allowing it to maintain a "memory" of past inputs. In this talk, we extend a series of results on the training of wide neural networks in the so-called "mean-field" regime to the RNN architecture. More specifically, we prove that the gradient descent training dynamics of Elman-type RNNs converge, in an appropriate sense, to a set of "mean-field" ODEs as the width of the network diverges. Furthermore, we prove that, under some conditions on the data and the initialization of the network, the fixed points of these limiting "mean-field" dynamics are globally optimal. This is joint work with Jianfeng Lu and Sayan Mukherjee.
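For readers less familiar with the architecture, here is a minimal NumPy sketch of an Elman-type RNN forward pass with a 1/N-scaled readout, the kind of normalization typically associated with mean-field analyses. The function names, dimensions, and scaling choices below are illustrative assumptions, not details taken from the talk.

    import numpy as np

    def elman_rnn_forward(x_seq, W_in, W_rec, w_out):
        """Run an Elman-type RNN over a sequence and return the final prediction.

        x_seq : (T, d)  input sequence
        W_in  : (N, d)  input-to-hidden weights
        W_rec : (N, N)  hidden-to-hidden (recurrent) weights
        w_out : (N,)    hidden-to-output weights
        """
        N = W_in.shape[0]
        h = np.zeros(N)                          # hidden state, fed back at each step
        for x_t in x_seq:
            h = np.tanh(W_in @ x_t + W_rec @ h)  # recurrence keeps a "memory" of past inputs
        return w_out @ h / N                     # 1/N readout, as in mean-field scalings (assumption)

    # Toy usage with a randomly initialized wide network (N = 1000).
    rng = np.random.default_rng(0)
    T, d, N = 5, 3, 1000
    x_seq = rng.normal(size=(T, d))
    W_in = rng.normal(size=(N, d))
    W_rec = rng.normal(size=(N, N)) / np.sqrt(N)  # illustrative choice of recurrent scaling
    w_out = rng.normal(size=N)
    print(elman_rnn_forward(x_seq, W_in, W_rec, w_out))

The talk concerns the limit in which the width N diverges, where the gradient descent dynamics of such networks are shown to converge to "mean-field" ODEs.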
Bio:
Andrea Agazzi is Assistant Professor in the Mathematics Department at the University of Pisa. After a PhD in theoretical physics at the University of Geneva, he was Griffith Research Assistant Professor in the Mathematics Department at Duke University. His interests span broadly across probability theory, stochastic analysis, and their applications, in particular to problems in deep learning theory.