TFML Talk: A mean-field view on transformer models
25 Jun 2025, 14:00 — Room 509, UniGe DIBRIS/DIMA, Via Dodecaneso 35
Speaker:
Andrea Agazzi — Institute of Mathematical Statistics and Actuarial Science - Universität Bern
Abstract:
Transformers are a central architecture in modern deep learning, forming the backbone of large language models such as ChatGPT. In this talk, I will present a mathematical framework for studying how information—represented as "tokens"—evolves through the layers of such neural networks. Specifically, we consider a family of partial differential equations that describe how the distribution of tokens—modeled as particles interacting in a mean-field way—changes with depth. Numerical experiments reveal that, under certain conditions, these dynamics exhibit a metastable clustering phenomenon, where tokens group into well-separated clusters that evolve slowly over time. A rigorous analysis of this behavior uncovers a range of open questions and unexpected connections to various areas of mathematics.
Bio:
Andrea leads the Stochastic Analysis and Applications group in the Mathematics and Statistics Department at the University of Bern, where he serves as Associate Professor. Before moving to Bern, he was Assistant Professor (RTD/b) in the Mathematics Department at the University of Pisa. Prior to that, Andrea was Griffiths Research Assistant Professor in the Mathematics Department at Duke University. He obtained his PhD in Theoretical Physics at the University of Geneva, under the supervision of Jean-Pierre Eckmann, after graduating in physics from Imperial College London and ETH Zurich.
Links: