MATH 697 Analysis and Machine Learning


Problem sets

First   Second   Third  Fourth  

Lecture plan and slides (updated 10/23)

  • Week 1.  (Slides)
    Tuesday class: Review prerequisites, basic facts, and notation. Function approximation/estimation as motivation. Four terms: supervised learning, unsupervised learning, regression, and classification. A glance at the latter half of the course: the calculus of variations, partial differential equations, and optimization.
    Reading: (From HTF) Introduction, Sections 2.1 and 2.2.

    Thursday class: Least squares versus $k$-nearest neighbor. Statistical decision theory.
    Reading: (From HTF) Sections 2.3 and 2.4.
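
    An illustrative sketch of Thursday's least squares vs. $k$-NN comparison (synthetic data and variable names are my own, not from HTF):

      # Least squares vs. k-nearest neighbors on a synthetic 1-D regression problem.
      import numpy as np

      rng = np.random.default_rng(0)
      x = np.sort(rng.uniform(0.0, 1.0, size=100))
      y = np.sin(2 * np.pi * x) + 0.3 * rng.standard_normal(100)

      # Least squares: fit y ~ beta_0 + beta_1 * x via the normal equations.
      X = np.column_stack([np.ones_like(x), x])
      beta, *_ = np.linalg.lstsq(X, y, rcond=None)

      def knn_predict(x0, k=10):
          """Average the responses of the k training points nearest to x0."""
          idx = np.argsort(np.abs(x - x0))[:k]
          return y[idx].mean()

      x0 = 0.25
      print("least squares prediction:", np.array([1.0, x0]) @ beta)
      print("10-NN prediction:", knn_predict(x0))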

  • Week 2.  (Slides)
    Tuesday class: Curse of dimensionality and how it affects $k$-NN and least squares. Mean squared error and expected prediction error. Statistical models and restricted function classes.
    Reading: (From HTF) Sections 2.6 and 2.7.

    Thursday class: Maximum likelihood. Least squares and ML for additive models. More about restricted regression models: roughness penalty, kernel methods, and local regression.
    Reading: (From HTF) Sections 2.8.1 and 2.8.2.
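
    The computation behind Thursday's remark that maximum likelihood for additive models with Gaussian errors reduces to least squares: for $Y = f_\theta(X) + \varepsilon$ with $\varepsilon \sim N(0, \sigma^2)$, the log-likelihood of the data $(x_i, y_i)$, $i = 1, \dots, N$, is
    $$\ell(\theta) = -\frac{N}{2}\log(2\pi\sigma^2) - \frac{1}{2\sigma^2}\sum_{i=1}^{N}\bigl(y_i - f_\theta(x_i)\bigr)^2,$$
    so maximizing $\ell$ over $\theta$ is the same as minimizing the residual sum of squares.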

  • Week 3.  (Slides)
    Tuesday class: Model selection: Training error versus generalization error. Parameter inference: Gaussian model for the error, and how $\hat \beta$ is distributed. The Gauss-Markov theorem.
    Reading: (From HTF) Sections 2.9, 3.1, and Section 3.2.2.

    Thursday class: The singular value decomposition and least squares. Univariate versus multivariate regression. Shortcomings of least squares and the bias-variance dichotomy. Subset selection. Ridge regression and intro to the Lasso.
    Reading: (From HTF) Section 3.2.3, 3.3.1, and Section 3.4.1 up to page 64.
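
    An illustrative sketch of Thursday's SVD point of view (synthetic data, not course code): ridge regression written through the singular value decomposition, where each singular direction is shrunk by the factor $d_j^2/(d_j^2+\lambda)$, checked against the direct normal-equations solution.

      # Ridge regression via the SVD, compared with the direct solve.
      import numpy as np

      rng = np.random.default_rng(1)
      N, p, lam = 50, 5, 2.0
      X = rng.standard_normal((N, p))
      y = rng.standard_normal(N)

      # Thin SVD: X = U @ np.diag(d) @ Vt.
      U, d, Vt = np.linalg.svd(X, full_matrices=False)

      # beta_ridge = V diag(d_j / (d_j^2 + lambda)) U^T y.
      beta_svd = Vt.T @ ((d / (d**2 + lam)) * (U.T @ y))

      # Direct solution of (X^T X + lambda I) beta = X^T y.
      beta_direct = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

      print(np.allclose(beta_svd, beta_direct))  # True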

  • Week 4.  (Slides)
    Tuesday class: A rapid course on convex functions (the Legendre transform, the subdifferential, and critical points). The Lasso and characterization of minimizers through the subdifferential. Example: the Lasso for an orthogonal input matrix and the shrinkage operator (a short sketch appears at the end of this week's entry).
    Reading: (From HTF) Rest of Section 3.4.1, Section 3.4.2.

    Thursday class: Discussion of the first problem set (distributions, convolutions, approximations to the identity). Lasso as a constrained quadratic program. The geometry of $\ell^1$ and how it explains the sparsity of Lasso solutions.
    Reading: (From HTF) Section 3.4.3 and Section 3.6.
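
    The sketch promised above for Tuesday's example (illustrative code, not from the course notes): with an orthonormal input matrix the Lasso minimizer is the shrinkage (soft-thresholding) operator applied to the least-squares coefficients.

      # Lasso with an orthonormal design reduces to soft-thresholding.
      import numpy as np

      def soft_threshold(z, lam):
          """Shrinkage operator S_lam(z) = sign(z) * max(|z| - lam, 0)."""
          return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

      rng = np.random.default_rng(2)
      N, p, lam = 20, 6, 0.5
      # Orthonormal design: the columns of Q satisfy Q^T Q = I.
      Q, _ = np.linalg.qr(rng.standard_normal((N, p)))
      beta_true = np.array([3.0, -2.0, 0.0, 0.0, 1.0, 0.0])
      y = Q @ beta_true + 0.1 * rng.standard_normal(N)

      beta_ls = Q.T @ y                          # least squares, since Q^T Q = I
      beta_lasso = soft_threshold(beta_ls, lam)  # minimizes (1/2)||y - Qb||^2 + lam*||b||_1
      print(beta_ls.round(2))
      print(beta_lasso.round(2))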

  • Weeks 5-6.  (No class)

  • Week 7.  (Slides)
    Tuesday class: Using the Lasso with least squares to minimize bias. Inference when $p \gg N$ via the Dantzig selector. The Restricted Isometry Property and a theorem of Candès and Tao. Summary of linear regression. Basics of linear classification: regression via indicator matrices, masking.
    Reading: (From HTF) Sections 3.8.3 and 3.8.4, Sections 4.1 and 4.2.

    Thursday class: Linear and quadratic discriminant analysis with Gaussian mixtures. Logistic regression.
    Reading: (From HTF) Section 4.3 (up to page 112) and Section 4.4.1.

  • Week 8.  (Slides)
    Tuesday class: More on logistic regression and gradient descent (a short sketch appears at the end of this week's entry). Logistic regression versus LDA. Rosenblatt's separating hyperplane algorithm and Vapnik's optimality criterion.
    Reading: (From HTF) Sections 4.4.1, 4.4.5, and Section 4.5.

    Thursday class: A few thoughts on elliptic PDE and the Calculus of Variations. The Laplacian in $\mathbb{R}^d$ and its many versions. Fourier series, eigenfunctions of the second derivative, and the heat equation.
    Reading: (From HTF) Sections 7.1, 7.2, 7.4 and 7.5.
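
    The sketch promised above for Tuesday's topic (illustrative only, synthetic data): two-class logistic regression fit by gradient descent on the averaged negative log-likelihood.

      # Logistic regression by plain gradient descent on synthetic data.
      import numpy as np

      rng = np.random.default_rng(3)
      N = 200
      X = np.column_stack([np.ones(N), rng.standard_normal((N, 2))])  # intercept + 2 features
      true_beta = np.array([-0.5, 2.0, -1.0])
      p_true = 1.0 / (1.0 + np.exp(-X @ true_beta))
      y = (rng.uniform(size=N) < p_true).astype(float)                # labels in {0, 1}

      beta = np.zeros(3)
      step = 0.1
      for _ in range(2000):
          p = 1.0 / (1.0 + np.exp(-X @ beta))   # model probabilities
          grad = -X.T @ (y - p) / N             # gradient of the averaged negative log-likelihood
          beta -= step * grad                   # gradient descent step
      print(beta)   # roughly recovers true_beta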

  • Week 9.  (Slides)
    Tuesday class: More on the calculus of variations: the Euler-Lagrange equation and Dirichlet's principle for harmonic functions. Historical detour: Hilbert's 19th problem. The smoothing effect of the heat equation. The mean value property for harmonic functions.
    Reading: N.A.

    Thursday class: The strong maximum principle. The comparison principle for Laplace's equation and its consequences. The fractional Laplacian.
    Reading: N.A.

  • Week 10.  (Slides)
    Tuesday class: Infinite differentiability of harmonic functions and interior derivative estimates. Graphs, their weights and graph Laplacians.
    Reading: N.A.

    Thursday class: The strong maximum principle on a graph. Measuring smoothness of functions on a graph. Spectral properties of the combinatorial Laplacian and its eigenfunction decomposition. The heat kernel on a graph.
    Reading: N.A.
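
    An illustrative sketch of Thursday's objects (a small hand-made weighted graph, not from the lectures): the combinatorial Laplacian $L = D - W$, its eigendecomposition, and the heat kernel $e^{-tL}$.

      # Combinatorial Laplacian and heat kernel of a 4-node weighted graph.
      import numpy as np

      # Symmetric weight matrix (w_ij = 0 means no edge).
      W = np.array([[0.0, 1.0, 0.5, 0.0],
                    [1.0, 0.0, 1.0, 0.0],
                    [0.5, 1.0, 0.0, 2.0],
                    [0.0, 0.0, 2.0, 0.0]])
      D = np.diag(W.sum(axis=1))   # degree matrix
      L = D - W                    # combinatorial graph Laplacian

      # L is symmetric positive semidefinite; eigenvalue 0 has a constant eigenvector
      # on each connected component.
      evals, evecs = np.linalg.eigh(L)
      print(evals)

      # Heat kernel: e^{-tL} = sum_k e^{-t lambda_k} phi_k phi_k^T.
      t = 0.5
      heat = evecs @ np.diag(np.exp(-t * evals)) @ evecs.T
      u0 = np.array([1.0, 0.0, 0.0, 0.0])   # initial heat concentrated at node 0
      print(heat @ u0)                       # diffused values at time t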

  • Week 11.  (Slides)
    Tuesday class: The Dirichlet problem on a graph. Semi-supervised learning and harmonic minimization.
    Reading: N.A.

    Thursday class: Spectral representation of data. Spectral clustering and the Shi-Malik normalized cut functional.
    Reading: N.A.
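
    An illustrative sketch of Thursday's normalized cut (a small hand-made graph, not code from the lectures): the relaxed Shi-Malik cut is the generalized eigenvector of $Lf = \lambda D f$ for the second-smallest eigenvalue, thresholded at zero.

      # Relaxed Shi-Malik normalized cut on a graph with two loosely connected clusters.
      import numpy as np
      from scipy.linalg import eigh

      W = np.zeros((6, 6))
      for i, j, w in [(0, 1, 1.0), (0, 2, 1.0), (1, 2, 1.0),
                      (3, 4, 1.0), (3, 5, 1.0), (4, 5, 1.0),
                      (2, 3, 0.1)]:                 # weak bridge between the clusters
          W[i, j] = W[j, i] = w

      D = np.diag(W.sum(axis=1))
      L = D - W

      # Generalized eigenproblem L f = lambda D f; the eigenvector for the
      # second-smallest eigenvalue is the relaxed indicator of the normalized cut.
      evals, evecs = eigh(L, D)
      fiedler = evecs[:, 1]
      labels = (fiedler > 0).astype(int)
      print(labels)   # nodes 0-2 and nodes 3-5 should land in different groups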