Deep learning theory through the lens of diagonal linear networks
18 Nov 2024, 14:30 — Room 715, UniGe DIBRIS/DIMA, Via Dodecaneso 35
Speaker:
Scott Pesme — Inria Grenoble
Abstract:
Surprisingly, many optimisation phenomena observed in complex neural networks also appear in so-called two-layer diagonal linear networks. This rudimentary architecture, a two-layer feedforward linear network with a diagonal inner weight matrix, reveals key training characteristics while keeping the theoretical analysis clean and insightful. In this talk, I will give an overview of theoretical results for this architecture and draw connections to experimental observations on practical neural networks. Specifically, we will examine how hyperparameters such as the initialisation scale, step size, and batch size affect the optimisation trajectory and the generalisation performance of the recovered solution.
Bio:
Scott Pesme is a postdoctoral researcher at Inria Grenoble, working with Julien Mairal. He obtained his PhD in 2024 at EPFL under the supervision of Nicolas Flammarion, where he studied the training dynamics of optimisation methods in deep learning. His research focused on diagonal linear networks, a drastic but enlightening simplification of complex architectures.