Luc Rey-Bellet
LGRT 1423K
luc@math.umass.edu
Tu-Th, 2:30PM--3:45PM in LGRT 1334
On Moodle at https://umass.moonami.com/course/view.php?id=33139
This class is an introduction to selected topics from Information Theory and Optimal Transport with a view toward (some) applications in Machine Learning and Statistical Learning.
Prerequisites
We will use some basic measure-theoretic concepts and recall them as needed. At some junctures of the class we will also use and recall some facts from functional analysis (basic Hilbert space theory, the Riesz representation theorem, the Hahn-Banach theorem, etc.).
Convexity plays a recurrent and crucial role in this class, and we shall spend some time developing the parts of convex analysis we need. The Legendre transform, in particular, plays an important role.
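For reference (precise definitions and hypotheses will be given in class), the Legendre-Fenchel transform of a function \(f\) on \(\mathbb{R}^d\) is
\[
f^*(y) \;=\; \sup_{x \in \mathbb{R}^d} \bigl\{ \langle x, y \rangle - f(x) \bigr\},
\]
and duality formulas of this type underlie many of the variational representations appearing below.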
We aim for a rather broad overview of various topics, and as such we will not have time to cover all the technical details. Some proofs will be omitted and left for the dedicated reader to study.
Among the topics treated in this class are the following:
Entropy, Relative Entropy, Cross entropy, Mutual Information and applications. The maximum entropy principle and Gibbs measures.
Kullback-Leibler and \(f\)-divergences (e.g. the Jensen-Shannon distance, the Hellinger distance, and \(\chi^2\)-divergences, or more generally the \(\alpha\)-divergence family) and their variational representations (an example is displayed after this list). Rényi divergences. Applications to Uncertainty Quantification.
Rényi divergences and connections to rare events.
Integral Probability metrics. Various classical examples, including MMD metrics based on kernels and Reproducing Kernel Hilbert Spaces (RKHS). The basics of RKHS and kernel methods will be covered.
Basics of Optimal Transport, Wasserstein metrics and gradient flows.
Regularized Optimal Transport (Sinkhorn divergences) and regularized \(f\)-divergences. These are tools used in Machine Learning, e.g. in Generative Adversarial Networks (GANs).
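As an example of the variational representations mentioned above (stated informally here; precise hypotheses are discussed in class), the relative entropy (Kullback-Leibler divergence) of \(P\) with respect to \(Q\) and its Donsker-Varadhan representation are
\[
D_{\mathrm{KL}}(P\|Q) \;=\; \int \log\frac{dP}{dQ}\, dP
\qquad\text{and}\qquad
D_{\mathrm{KL}}(P\|Q) \;=\; \sup_{g}\Bigl\{ \int g\, dP - \log \int e^{g}\, dQ \Bigr\},
\]
where the supremum is taken over bounded measurable functions \(g\). Representations of this type are the starting point for many of the estimation and machine learning applications mentioned below.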
All topics will also be studied with a view toward data science, for example how one can compute/estimate probability metrics or divergences when equipped only with finite data. Many topics of the class can be implemented numerically directly in the context of GANs, kernel methods, and so on. A small numerical illustration is sketched below.
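As a small illustration of this kind of computation (a minimal sketch with placeholder choices of kernel bandwidth and toy data, not part of the required material), here is the standard unbiased estimator of the squared MMD between two samples, using a Gaussian kernel:
\begin{verbatim}
import numpy as np

def gaussian_kernel(x, y, bandwidth):
    # Gaussian (RBF) kernel matrix between the rows of x and y.
    sq_dists = np.sum((x[:, None, :] - y[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq_dists / (2.0 * bandwidth ** 2))

def mmd2_unbiased(x, y, bandwidth=1.0):
    # Unbiased estimate of the squared MMD between the distributions
    # generating the samples x (shape (n, d)) and y (shape (m, d)).
    kxx = gaussian_kernel(x, x, bandwidth)
    kyy = gaussian_kernel(y, y, bandwidth)
    kxy = gaussian_kernel(x, y, bandwidth)
    n, m = len(x), len(y)
    # Drop the diagonal terms to obtain unbiased within-sample averages.
    term_xx = (kxx.sum() - np.trace(kxx)) / (n * (n - 1))
    term_yy = (kyy.sum() - np.trace(kyy)) / (m * (m - 1))
    return term_xx + term_yy - 2.0 * kxy.mean()

# Toy example: two Gaussian samples with shifted means.
rng = np.random.default_rng(0)
x = rng.normal(0.0, 1.0, size=(500, 2))
y = rng.normal(0.5, 1.0, size=(500, 2))
print(mmd2_unbiased(x, y, bandwidth=1.0))
\end{verbatim}
Questions such as how estimators of this type behave as the sample size grows, and how the choice of kernel affects the resulting metric, are part of the data-driven point of view described above.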
Build up a solid foundation on distances and divergences between probability distributions, especially those used in the context of statistical learning and/or machine learning. Some of the results we will present are fairly recent.
You are expected to complement the class with regular independent reading. You need to select a topic of your choice in the first few weeks of the class, and you will be assigned further reading, leading to a class presentation and a final paper. Theoretical, applied, or computational topics are encouraged.
There is no textbook known to the instructor which covers all the topics in this class. We will borrow from various sources.
These are good references for "classical" topics in Information Theory (e.g. for classes taught in engineering departments). I will use parts of those.
References for \(f\)-divergences
Reference for RKHS and kernel methods
References for optimal transport
References for convexity