Luc Rey-Bellet
LGRT 1423K
luc@math.umass.edu
Tu-Th, 2:30PM--3:45PM in LGRT 1334
On Moodle at https://umass.moonami.com/course/view.php?id=33139
This class is an introduction to selected topics from Information Theory and Optimal Transport with a view toward (some) applications in Machine Learning and Statistical Learning.
Prerequisites
We will use some basic measure-theoretic concepts and recall them as needed. At some junctures of the class we will also use and recall some facts from functional analysis (basic Hilbert space theory, the Riesz representation theorem, the Hahn-Banach theorem, etc.).
Convexity plays a recurrent and crucial role in this class, and we shall spend some time developing the parts of convex analysis we need. The Legendre transform, in particular, plays an important role.
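For reference (precise definitions and hypotheses will be given in class), the Legendre-Fenchel transform of a function \(f\) on \(\mathbb{R}^d\) is
\[
f^*(y) \;=\; \sup_{x \in \mathbb{R}^d} \bigl\{ \langle x, y \rangle - f(x) \bigr\},
\]
and duality formulas of this type underlie many of the variational representations appearing below.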
We aim for a rather broad overview of various topics, and as such we will not have time to cover all the technical details. Some proofs will be omitted and left for the dedicated reader to study.
Among the topics treated in this class are the following:
Entropy, Relative Entropy, Cross entropy, Mutual Information and applications. The maximum entropy principle and Gibbs measures.
Kullback-Leibler and \(f\)-divergences (e.g. the Jensen-Shannon distance, the Hellinger distance, and \(\chi^2\)-divergences, or more generally the \(\alpha\)-divergence family) and their variational representations (an example is displayed after this list). Rényi divergences. Applications to Uncertainty Quantification.
Rényi divergences and connections to rare events.
Integral Probability metrics. Various classical examples, including MMD metrics based on kernels and Reproducing Kernel Hilbert Spaces (RKHS). The basics of RKHS and kernel methods will be covered.
Basics of Optimal Transport, Wasserstein metrics and gradient flows.
Regularized Optimal Transport (Sinkhorn divergences) and regularized \(f\)-divergences. These are tools used in Machine Learning, e.g. in Generative Adversarial Networks (GANs).
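As an example of the variational representations mentioned above (stated informally here; precise hypotheses are discussed in class), the relative entropy (Kullback-Leibler divergence) of \(P\) with respect to \(Q\) and its Donsker-Varadhan representation are
\[
D_{\mathrm{KL}}(P\|Q) \;=\; \int \log\frac{dP}{dQ}\, dP
\qquad\text{and}\qquad
D_{\mathrm{KL}}(P\|Q) \;=\; \sup_{g}\Bigl\{ \int g\, dP - \log \int e^{g}\, dQ \Bigr\},
\]
where the supremum is taken over bounded measurable functions \(g\). Representations of this type are the starting point for many of the estimation and machine learning applications mentioned below.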
All topics will also be studied with a view toward data science, for example how one can compute/estimate probability metrics or divergences when equipped only with finite data. Many topics of the class can be implemented numerically directly in the context of GANs, kernel methods, and so on. A small numerical illustration is sketched below.
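As a small illustration of this kind of computation (a minimal sketch with placeholder choices of kernel bandwidth and toy data, not part of the required material), here is the standard unbiased estimator of the squared MMD between two samples, using a Gaussian kernel:
\begin{verbatim}
import numpy as np

def gaussian_kernel(x, y, bandwidth):
    # Gaussian (RBF) kernel matrix between the rows of x and y.
    sq_dists = np.sum((x[:, None, :] - y[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq_dists / (2.0 * bandwidth ** 2))

def mmd2_unbiased(x, y, bandwidth=1.0):
    # Unbiased estimate of the squared MMD between the distributions
    # generating the samples x (shape (n, d)) and y (shape (m, d)).
    kxx = gaussian_kernel(x, x, bandwidth)
    kyy = gaussian_kernel(y, y, bandwidth)
    kxy = gaussian_kernel(x, y, bandwidth)
    n, m = len(x), len(y)
    # Drop the diagonal terms to obtain unbiased within-sample averages.
    term_xx = (kxx.sum() - np.trace(kxx)) / (n * (n - 1))
    term_yy = (kyy.sum() - np.trace(kyy)) / (m * (m - 1))
    return term_xx + term_yy - 2.0 * kxy.mean()

# Toy example: two Gaussian samples with shifted means.
rng = np.random.default_rng(0)
x = rng.normal(0.0, 1.0, size=(500, 2))
y = rng.normal(0.5, 1.0, size=(500, 2))
print(mmd2_unbiased(x, y, bandwidth=1.0))
\end{verbatim}
Questions such as how estimators of this type behave as the sample size grows, and how the choice of kernel affects the resulting metric, are part of the data-driven point of view described above.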
Build up a solid foundation on distances and divergences between probability distributions, especially those used in the context of statistical learning and/or machine learning. Some of the results we will present are fairly recent.
You are expected to complement the class with regular independent reading. You need to select a topic of your choice in the first few weeks of the class, and you will be assigned further reading, leading to a class presentation and a final paper. Theoretical, applied, or computational topics are encouraged.
There is no textbook known to the instructor which covers all the topics in this class. We will borrow from various sources.
These are good references for "classical" topics in Information Theory (e.g. for classes taught in engineering departments). I will use parts of those.
References for \(f\)-divergences
Reference for RKHS and kernel methods
References for optimal transport
References for convexity