## MATH 697 Analysis and Machine Learning (archived course page from Fall 2017)

**Minimal requirements:** Undergraduate real analysis (basics of metric spaces, integration), basic probability (distributions, random variables), strong background in calculus and linear algebra.

**Ideal requirements:** Familiarity with one or more of the following topics: measure theory, differentiable manifolds, linear programming.

**Textbook:** *The Elements of Statistical Learning* (Springer Series in Statistics), 2nd edition, by Trevor Hastie, Robert Tibshirani, and Jerome Friedman.

**Lecture Time**: Tue-Thur 8:30-9:45 am in Lederle Grad. Res. 1234

**Office Hours**: Mon 3 pm-5 pm. Tue 10 am-11 am.

**Evaluation:** Problem sets (60%). Project (40%).

Solutions to problem sets are to be submitted in LaTeX format, via email. No late problem sets will be accepted. Out of the 8 problem sets, the lowest two grades will be dropped in computing the total.

The project will be done in teams of two people (if needed, a couple of teams of three will be allowed).

**Updated! (November 10th)** Important dates for the project: Tuesday of the 11th week and Thursday of the 15th week.

**Problem sets**

First Second Third Fourth Fifth
**Lecture plan and slides** (updated 10/23)

**Week 1.** (Slides)

Tuesday class: Review of prerequisites, basic facts, and notation. Function approximation/estimation as motivation. Four terms: supervised learning, unsupervised learning, regression, and classification. A glance at the latter half of the course: the calculus of variations, partial differential equations, and optimization.

*Reading:* (From HTF) Introduction, Sections 2.1 and 2.2.

Thursday class: Least squares versus $k$-nearest neighbors. Statistical decision theory.

*Reading:* (From HTF) Sections 2.3 and 2.4.
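As a minimal sketch of Thursday's comparison (not course material; the toy 1-D data and all names below are our own, assuming NumPy is available), both methods predict at a point $x_0$, least squares through a fitted line and $k$-NN by averaging nearby responses:

```python
import numpy as np

# Toy 1-D regression data: y = 2x + noise (our own illustrative example).
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(50, 1))
y = 2.0 * X[:, 0] + 0.1 * rng.standard_normal(50)

# Least squares: solve min ||Xb - y||^2 after adding an intercept column.
Xb = np.hstack([np.ones((50, 1)), X])
beta = np.linalg.lstsq(Xb, y, rcond=None)[0]

def knn_predict(x0, k=5):
    """Average the responses of the k training points nearest to x0."""
    idx = np.argsort(np.abs(X[:, 0] - x0))[:k]
    return y[idx].mean()

x0 = 0.3
ls_pred = beta[0] + beta[1] * x0   # global linear fit
knn_pred = knn_predict(x0)         # local average
```

Both estimates should land near the true value $2 x_0 = 0.6$ here; the lectures discuss when each assumption (global linearity versus local smoothness) is the right one.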

**Week 2.** (Slides)

Tuesday class: Curse of dimensionality and how it affects $k$-NN and least squares. Mean squared error and expected prediction error. Statistical models and restricted function classes.

*Reading:* (From HTF) Sections 2.6 and 2.7.
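One way to see Tuesday's point numerically is the hypercube computation from the assigned reading (HTF, Section 2.5): capturing a fraction $r$ of uniformly distributed points in the unit cube in $d$ dimensions requires a subcube of edge length $r^{1/d}$.

```python
# Edge length of a subcube capturing a fraction r of uniform data in d
# dimensions (HTF's curse-of-dimensionality example): e_d(r) = r^(1/d).
def edge_length(r, d):
    return r ** (1.0 / d)

# In 10 dimensions, a "local" neighborhood holding just 1% of the data
# already spans about 63% of each coordinate's range.
e = edge_length(0.01, 10)
```

So in high dimensions, "nearest" neighbors are not local, which is exactly why $k$-NN degrades.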

Thursday class: Maximum likelihood. Least squares and ML for additive models. More about restricted regression models: roughness penalty, kernel methods, and local regression.

*Reading:* (From HTF) Sections 2.8.1 and 2.8.2.

**Week 3.** (Slides)

Tuesday class: Model selection: training error versus generalization error. Parameter inference: the Gaussian model for the error, and how $\hat \beta$ is distributed. The Gauss-Markov theorem.

*Reading:* (From HTF) Sections 2.9, 3.1, and 3.2.2.

Thursday class: The singular value decomposition and least squares. Univariate versus multivariate regression. Shortcomings of least squares and the bias-variance dichotomy. Subset selection. Ridge regression and an introduction to the Lasso.

*Reading:* (From HTF) Sections 3.2.3 and 3.3.1, and Section 3.4.1 up to page 64.
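The SVD viewpoint from Thursday makes ridge regression transparent: writing $X = U S V^T$, ridge shrinks the coefficient along each singular direction by $s_j/(s_j^2 + \lambda)$ (cf. HTF Section 3.4.1). A minimal sketch on toy data of our own (assuming NumPy):

```python
import numpy as np

# Toy data: 30 points, 3 predictors (illustrative, not from the course).
rng = np.random.default_rng(1)
X = rng.standard_normal((30, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.standard_normal(30)

# Ridge fit through the SVD: X = U S V^T, each component of y shrunk
# by the factor s_j / (s_j^2 + lam).
U, s, Vt = np.linalg.svd(X, full_matrices=False)
lam = 1.0
beta_ridge = Vt.T @ ((s / (s**2 + lam)) * (U.T @ y))

# Cross-check against the normal-equations form (X^T X + lam I)^{-1} X^T y.
beta_direct = np.linalg.solve(X.T @ X + lam * np.eye(3), X.T @ y)
```

The two formulas agree exactly; the SVD form shows that ridge shrinks low-variance directions the most.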

**Week 4.** (Slides)

Tuesday class: A rapid course on convex functions (the Legendre transform, the subdifferential, and critical points). The Lasso and the characterization of minimizers through the subdifferential. Example: the Lasso for an orthogonal input matrix and the shrinkage operator.

*Reading:* (From HTF) Rest of Section 3.4.1, and Section 3.4.2.

Thursday class: Discussion of the first problem set (distributions, convolutions, approximations to the identity). The Lasso as a constrained quadratic program. The geometry of $\ell^1$ and how it explains the sparsity of Lasso solutions.

*Reading:* (From HTF) Sections 3.4.3 and 3.6.
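The shrinkage operator from Tuesday's orthogonal-design example can be written in a few lines. In that case each Lasso coordinate is the least squares coefficient passed through the soft-thresholding map $S_\lambda(b) = \mathrm{sign}(b)\,(|b| - \lambda)_+$ (a sketch with made-up coefficients, assuming NumPy):

```python
import numpy as np

# Soft-thresholding: the coordinatewise solution of the Lasso when the
# input matrix is orthonormal. Zeroing small coefficients is exactly the
# sparsity the subdifferential characterization predicts.
def soft_threshold(b, lam):
    return np.sign(b) * np.maximum(np.abs(b) - lam, 0.0)

beta_ls = np.array([3.0, -0.5, 1.2, 0.1])   # least squares coefficients
beta_lasso = soft_threshold(beta_ls, lam=1.0)
```

Coefficients with $|b| \le \lambda$ are set exactly to zero, while the rest are shifted toward zero by $\lambda$.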

**Week 5-6.** (No class)

**Week 7.** (Slides)

Tuesday class: Using the Lasso with least squares to minimize bias. Inference when $p \gg N$ via the Dantzig selector. The restricted isometry property and a theorem of Candès and Tao. Summary of linear regression. Basics of linear classification: regression via indicator matrices, masking.

*Reading:* (From HTF) Sections 3.8.3 and 3.8.4, and Sections 4.1 and 4.2.

Thursday class: Linear and quadratic discriminant analysis with Gaussian mixtures. Logistic regression.

*Reading:* (From HTF) Section 4.3 (up to page 112) and Section 4.4.1.

**Week 8.** (Slides)

Tuesday class: More on logistic regression and gradient descent. Logistic regression versus LDA. Rosenblatt's separating hyperplane algorithm and Vapnik's optimality criterion.

*Reading:* (From HTF) Sections 4.4.1, 4.4.5, and 4.5.
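Tuesday's pairing of logistic regression with gradient descent can be sketched directly (toy data and step size of our own choosing, assuming NumPy; the book instead emphasizes Newton's method / IRLS):

```python
import numpy as np

# Simulated two-class data: P(y=1|x) = sigmoid(x . true_beta).
rng = np.random.default_rng(2)
X = rng.standard_normal((200, 2))
true_beta = np.array([1.5, -1.0])
p = 1.0 / (1.0 + np.exp(-X @ true_beta))
y = (rng.uniform(size=200) < p).astype(float)

# Plain gradient descent on the average negative log-likelihood.
beta = np.zeros(2)
step = 0.1
for _ in range(500):
    p_hat = 1.0 / (1.0 + np.exp(-X @ beta))
    grad = X.T @ (p_hat - y) / len(y)   # gradient of the average NLL
    beta -= step * grad
```

The iterates converge to the maximum likelihood estimate, which recovers the signs and rough magnitudes of `true_beta`.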

Thursday class: A few thoughts on elliptic PDE and the calculus of variations. The Laplacian in $\mathbb{R}^d$ and its many versions. Fourier series, eigenfunctions of the second derivative, and the heat equation.

*Reading:* (From HTF) Sections 7.1, 7.2, 7.4, and 7.5.

**Week 9.** (Slides)

Tuesday class: More on the calculus of variations: the Euler-Lagrange equation and Dirichlet's principle for harmonic functions. Historical detour: Hilbert's 19th problem. The smoothing effect of the heat equation. The mean value property for harmonic functions.

*Reading:* N.A.

Thursday class: The strong maximum principle. The comparison principle for Laplace's equation and its consequences. The fractional Laplacian.

*Reading:* N.A.

**Week 10.** (Slides)

Tuesday class: Infinite differentiability of harmonic functions and interior derivative estimates. Graphs, their weights, and graph Laplacians.

*Reading:* N.A.

Thursday class: The strong maximum principle on a graph. Measuring smoothness for functions on a graph. Spectral properties of the combinatorial Laplacian and its eigenfunction decomposition. The heat kernel on a graph.

*Reading:* N.A.
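Thursday's objects are concrete enough to compute on a small example (a path graph of our own choosing, assuming NumPy): the combinatorial Laplacian $L = D - W$, its eigendecomposition, and the heat kernel $e^{-tL}$.

```python
import numpy as np

# Weight matrix of a path on 4 vertices (toy example).
W = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
D = np.diag(W.sum(axis=1))
L = D - W                                    # combinatorial Laplacian

# Eigenfunction decomposition of L (symmetric, so eigh applies).
evals, evecs = np.linalg.eigh(L)

# Heat kernel at time t = 1 via the spectral calculus: exp(-tL).
heat = evecs @ np.diag(np.exp(-1.0 * evals)) @ evecs.T
```

The constant vector is in the kernel of $L$ (the graph analogue of constants being harmonic), so the heat kernel's rows sum to one: heat flow redistributes mass without creating or destroying it.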

**Week 11.** (Slides)

Tuesday class: The Dirichlet problem on a graph. Semi-supervised learning and harmonic minimization.

*Reading:* N.A.
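Tuesday's connection between the Dirichlet problem and semi-supervised learning fits in a few lines (a 4-vertex toy graph of our own, assuming NumPy): fix labels on a "boundary" set and solve for the harmonic extension on the rest.

```python
import numpy as np

# Toy weighted graph and its combinatorial Laplacian L = D - W.
W = np.array([[0, 1, 1, 0],
              [1, 0, 1, 1],
              [1, 1, 0, 1],
              [0, 1, 1, 0]], dtype=float)
L = np.diag(W.sum(axis=1)) - W

labeled = [0, 3]                 # "boundary" vertices with known labels
unlabeled = [1, 2]
f_l = np.array([0.0, 1.0])       # prescribed boundary values

# Harmonic extension: require (L f)_i = 0 at every unlabeled vertex i,
# i.e. solve L_uu f_u = -L_ul f_l.
L_uu = L[np.ix_(unlabeled, unlabeled)]
L_ul = L[np.ix_(unlabeled, labeled)]
f_u = np.linalg.solve(L_uu, -L_ul @ f_l)
```

By the maximum principle the interpolated values stay between the boundary values 0 and 1, and here, by symmetry, both unlabeled vertices get the value 0.5.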

Thursday class: Spectral representation of data. Spectral clustering and the Shi-Malik normalized cut functional.

*Reading:* N.A.
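A small sketch in the spirit of Thursday's Shi-Malik discussion (toy graph of our own, assuming NumPy): the normalized cut is relaxed to an eigenvalue problem, and a two-way partition is read off from the sign pattern of the second eigenvector.

```python
import numpy as np

# Two triangles joined by one weak edge: the "obvious" two-cluster graph.
W = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5)]:
    W[i, j] = W[j, i] = 1.0
W[2, 3] = W[3, 2] = 0.1          # weak bridge between the clusters

deg = W.sum(axis=1)
L = np.diag(deg) - W

# Generalized problem L v = lam D v via the symmetric form
# D^{-1/2} L D^{-1/2}, then map back.
D_inv_sqrt = np.diag(1.0 / np.sqrt(deg))
evals, evecs = np.linalg.eigh(D_inv_sqrt @ L @ D_inv_sqrt)
fiedler = D_inv_sqrt @ evecs[:, 1]   # second-smallest eigenvector

labels = (fiedler > 0).astype(int)   # two-way cut by sign
```

On this graph the sign pattern recovers the two triangles exactly, which is the behavior the normalized-cut relaxation is designed to produce.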

**Week 12.** (Slides)

Tuesday class: More clustering: dissimilarity matrices, the $K$-means algorithm, and an example in vector quantization. A comparison of various 'metrics' over distributions: $L^p$, total variation, the earth mover's distance, and the Kullback-Leibler divergence.

*Reading:* N.A.
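Tuesday's $K$-means algorithm alternates two steps, assignment and centroid update (Lloyd's iteration); a minimal sketch on two synthetic blobs of our own (assuming NumPy):

```python
import numpy as np

# Two well-separated Gaussian blobs (toy data).
rng = np.random.default_rng(3)
X = np.vstack([rng.standard_normal((50, 2)) + [5, 5],
               rng.standard_normal((50, 2)) - [5, 5]])

# Initialize centers at two distinct data points, then iterate.
centers = X[rng.choice(len(X), size=2, replace=False)]
for _ in range(20):
    # Assignment step: each point joins its nearest center.
    d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    assign = d.argmin(axis=1)
    # Update step: each center moves to the mean of its cluster.
    centers = np.array([X[assign == k].mean(axis=0) for k in range(2)])
```

Each iteration can only decrease the within-cluster sum of squares, and on data this well separated the centers settle near the two blob means.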

Thursday class: Basics of optimal transport. The Monge and Kantorovich problems in the discrete and continuous settings. Brenier's theorem and the role of convex potentials in optimal transport. Transport with a discrete target measure.

*Reading:* N.A.
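In the discrete setting from Thursday, the Kantorovich problem is a linear program: minimize $\sum_{ij} c_{ij}\pi_{ij}$ over couplings $\pi \ge 0$ with prescribed marginals. A sketch using SciPy's generic LP solver on a two-point example of our own (any LP solver would do):

```python
import numpy as np
from scipy.optimize import linprog

# Source mass at points {0, 1}, target mass at points {1, 2}, cost |x - y|.
mu = np.array([0.5, 0.5])
nu = np.array([0.5, 0.5])
cost = np.array([[1.0, 2.0],
                 [0.0, 1.0]])

m, n = cost.shape
# Equality constraints on the flattened plan pi.ravel():
# row sums equal mu, column sums equal nu.
A_eq = np.zeros((m + n, m * n))
for i in range(m):
    A_eq[i, i * n:(i + 1) * n] = 1.0
for j in range(n):
    A_eq[m + j, j::n] = 1.0
b_eq = np.concatenate([mu, nu])

res = linprog(cost.ravel(), A_eq=A_eq, b_eq=b_eq)  # pi >= 0 by default
pi = res.x.reshape(m, n)   # optimal transport plan
```

Here both measures are a unit shift apart, so the optimal cost (the earth mover's distance from Tuesday) is exactly 1.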