Syllabus Math 797TT (Fall 2022):
Information Theory and Optimal Transport

Instructor

Luc Rey-Bellet
LGRT 1423K
luc@math.umass.edu

Office hours:

Meeting

Tu-Th, 2:30PM--3:45PM in LGRT 1334

Class homepage

On Moodle at https://umass.moonami.com/course/view.php?id=33139

Syllabus

This class is an introduction to selected topics from Information Theory and Optimal Transport with a view toward (some) applications in Machine Learning and Statistical Learning.

Prerequisites are

  1. a solid working knowledge of probability
  2. a solid working knowledge of analysis
  3. general mathematical maturity

We will use some basic measure-theoretic concepts and recall them as needed. At some points in the class we will also use and recall facts from functional analysis (basic Hilbert space theory, the Riesz representation theorem, the Hahn-Banach theorem, etc.).

Convexity plays a recurrent and crucial role in this class, and we shall spend some time developing the parts of convex analysis we need. The Legendre transform plays an important role, as recalled below.
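
For concreteness, recall that the Legendre transform (convex conjugate) of a function \(f:\mathbb{R}^d \to \mathbb{R}\cup\{+\infty\}\) is
\[
f^*(y) \;=\; \sup_{x \in \mathbb{R}^d} \big( \langle x, y \rangle - f(x) \big),
\]
which is always convex and lower semicontinuous; by the Fenchel-Moreau theorem, \(f^{**} = f\) whenever \(f\) is itself convex and lower semicontinuous.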

We aim for a rather broad overview of various topics, and as such we will not have time to cover all the technical details. Some proofs will be omitted and left for the dedicated reader to study.

Among the topics treated in this class are classical information theory, \(f\)-divergences, reproducing kernel Hilbert spaces (RKHS) and kernel methods, optimal transport, and the necessary convex analysis.

Learning Objectives

Build a solid foundation on distances and divergences between probability distributions, especially those used in the context of statistical learning and/or machine learning. Some of the results are fairly recent.
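
As a concrete example of the objects we will study, for a convex function \(f\) with \(f(1)=0\) and probability measures \(P \ll Q\), the associated \(f\)-divergence is
\[
D_f(P\|Q) \;=\; \int f\!\left(\frac{dP}{dQ}\right) dQ ;
\]
the choice \(f(t)=t\log t\) gives the Kullback-Leibler divergence, and \(f(t)=\tfrac{1}{2}|t-1|\) gives the total variation distance.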

Grade/assignment

You are expected to complement the class by regular independent reading. You need to select a topic of your choice in the first few weeks of the class, and you will be assigned further reading, leading to a class presentation and a final paper. Theoretical, applied, or computational topics are all encouraged.

Textbooks

There is no textbook known to the instructor that covers all the topics in the class. We will borrow from various sources.

  1. These are good references for "classical" topics in Information Theory (e.g. for classes taught in Engineering departments). I will use parts of those.

  2. References for \(f\)-divergences

  3. Reference for RKHS and kernel methods

  4. References for optimal transport

  5. References for convexity