ST 797

State Space Markov Models
ST 797M

Michael Lavine, Professor of Statistics

This is an advanced statistics course on state-space models and their Bayesian analysis. Examples of what these models are used for include

the water level in Lake Huron
the level of CO2 in the atmosphere
the size of an animal population through time
the level of neural activity in various regions of the brain
models of infectious disease
models of stock returns
missile and satellite tracking

During the course we will learn how to build and analyze state-space models that incorporate many different features, including locally smooth behavior, quasi-cyclic behavior, heteroscedasticity, and multivariate observations. We will work heavily with the statistical software R and especially with the dlm package.

R is the software of choice for many research statisticians. It can be downloaded free of charge to your own computer from the R web site.

Students will find their own data, develop state-space models for their data, and report frequently to the whole class.

We will begin with a paper on the Kalman filter, then use the book Dynamic Linear Models with R. This book is not yet published. Please download the pdf file; you may use it only for this class and must not distribute it otherwise. Finally, in the last portion of the course we will use Gaussian Markov Random Fields by Rue and Held.

During the semester we will occasionally look at some neurobiology data that I've been analyzing. Download neuro data

Tuesday, Jan 29

Introduction to DLMs and state-space models

Thursday, Jan 31

Example of locally constant (polynomial of order 1) dlm for the Lake Huron data. R code.
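For reference, the filtering recursion behind a locally constant model can be sketched in a few lines. This is a minimal illustration in plain Python, not the R code from class: the function name local_level_filter, the variances V and W, and the short series are all made up for the example, whereas the class analysis used the Lake Huron data and the dlm package.

```python
def local_level_filter(y, V, W, m0=0.0, C0=1e7):
    """Kalman filter for the locally constant model
    y_t = theta_t + v_t,  theta_t = theta_{t-1} + w_t,
    with v_t ~ N(0, V), w_t ~ N(0, W) and theta_0 ~ N(m0, C0)."""
    m, C = m0, C0
    means, variances = [], []
    for obs in y:
        R = C + W                 # prior variance of theta_t given the past
        Q = R + V                 # variance of the one-step forecast of y_t
        K = R / Q                 # gain: weight given to the new observation
        m = m + K * (obs - m)     # filtered mean of theta_t
        C = R * (1.0 - K)         # filtered variance of theta_t
        means.append(m)
        variances.append(C)
    return means, variances


# Hypothetical numbers, for illustration only (not the Lake Huron data).
y = [10.0, 10.4, 9.8, 10.9, 11.2, 10.7]
means, variances = local_level_filter(y, V=1.0, W=0.5)
```

The large default C0 stands in for a vague prior on the initial level, so the first filtered mean sits almost exactly on the first observation.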

Tuesday, Feb 5

I will give an introduction to Bayesian statistics and show how to compute all the relevant distributions in the Normal case. If you want to think about the comparison between classical and Bayesian statistics, you might like the paper What is Bayesian Statistics and Why Everything Else is Wrong. I will also explain why some of the curves we fit on Thursday are smoother while others are rougher.
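The Normal case can be summarized in one small function. The sketch below is plain Python with hypothetical names and numbers: it assumes a Normal prior for an unknown mean theta and a single Normal observation with known variance, and returns the posterior mean, the posterior variance, and the weight placed on the data.

```python
def normal_update(prior_mean, prior_var, obs, obs_var):
    """Posterior of theta when theta ~ N(prior_mean, prior_var)
    and obs | theta ~ N(theta, obs_var)."""
    weight = prior_var / (prior_var + obs_var)      # weight given to the data
    post_mean = prior_mean + weight * (obs - prior_mean)
    post_var = prior_var * obs_var / (prior_var + obs_var)
    return post_mean, post_var, weight


# Same observation, two priors (illustrative values only).
tight = normal_update(prior_mean=0.0, prior_var=0.1, obs=2.0, obs_var=1.0)
vague = normal_update(prior_mean=0.0, prior_var=10.0, obs=2.0, obs_var=1.0)
```

With the tight prior the posterior mean barely moves toward the observation; with the vague prior it moves most of the way. The same weight appears as the gain in the filter for a locally constant model, which is one way to see the smooth-versus-rough behavior: when the state variance is small relative to the observation variance, each new point gets little weight and the fitted curve is smoother.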

Thursday, Feb 7

We fit a locally linear model to the neuro data, Region 1. We saw that the model fit too well because the trend had to account for heartbeat and respiration in addition to the response to the stimulus. One solution is to add a cyclic term for respiration. We saw how to encode a cyclic term as a dlm. R code.
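One standard way to encode a single cyclic term is a two-dimensional state that is rotated by a fixed angle at each time step, with the observation picking off the first coordinate. The sketch below is plain Python, not the R code from class; the period of 12 time steps is a placeholder, not the respiration period in the neuro data, and the evolution noise is set to zero so the pure cycle is visible.

```python
import math


def harmonic_G(period):
    """Evolution matrix G of one harmonic with the given period (in time steps)."""
    w = 2.0 * math.pi / period
    return [[math.cos(w), math.sin(w)],
            [-math.sin(w), math.cos(w)]]


def evolve(G, theta):
    """One noise-free evolution step: theta_t = G theta_{t-1}."""
    return [G[0][0] * theta[0] + G[0][1] * theta[1],
            G[1][0] * theta[0] + G[1][1] * theta[1]]


period = 12                      # hypothetical period
G = harmonic_G(period)
theta = [1.0, 0.0]               # initial state sets amplitude and phase
signal = []
for _ in range(2 * period):
    signal.append(theta[0])      # observation vector F = (1, 0)
    theta = evolve(G, theta)
```

With nonzero evolution variance, the amplitude and phase drift slowly, which gives the quasi-cyclic behavior mentioned in the course description.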

Tuesday, Feb 12

We talked about the basic DLMs: polynomial, seasonal, Fourier, and regression. We showed how to write ARMA models as DLMs. We will use these basics as building blocks in more elaborate models. This material is in Chapter 3 of the book.
We still have to deal with model checking, unknown parameters, multivariate observations, and a few other things.
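As a small check on the ARMA construction, the sketch below writes an AR(2) process both directly and in one common state-space form, and confirms that the two give the same series when driven by the same shocks. It is plain Python with hypothetical coefficients; the function names are made up for the example, and the state-space form shown (state (y_t, phi2 * y_{t-1}), F = (1, 0), V = 0) is one of several equivalent choices.

```python
import random


def ar2_direct(phi1, phi2, shocks):
    """y_t = phi1 * y_{t-1} + phi2 * y_{t-2} + e_t, starting from zeros."""
    y, prev1, prev2 = [], 0.0, 0.0
    for e in shocks:
        cur = phi1 * prev1 + phi2 * prev2 + e
        y.append(cur)
        prev1, prev2 = cur, prev1
    return y


def ar2_state_space(phi1, phi2, shocks):
    """Same process as a DLM: theta_t = G theta_{t-1} + (e_t, 0), y_t = F theta_t."""
    G = [[phi1, 1.0],
         [phi2, 0.0]]
    theta, y = [0.0, 0.0], []
    for e in shocks:
        theta = [G[0][0] * theta[0] + G[0][1] * theta[1] + e,
                 G[1][0] * theta[0] + G[1][1] * theta[1]]
        y.append(theta[0])       # F = (1, 0) and no observation noise
    return y


rng = random.Random(0)
shocks = [rng.gauss(0.0, 1.0) for _ in range(50)]
y_direct = ar2_direct(0.5, -0.3, shocks)        # hypothetical coefficients
y_dlm = ar2_state_space(0.5, -0.3, shocks)
max_diff = max(abs(a - b) for a, b in zip(y_direct, y_dlm))
```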

Thursday, Feb 14

Melissa Eliot presented data on cases of measles in London over 20 years. Aaron Ellison presented data on net ecosystem exchange (of carbon) between a forest and the atmosphere. We did some model checking with the neuro data. R code.
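A common model check for a DLM, not necessarily the one used in class, is to look at the standardized one-step forecast errors: if the model is adequate they should behave like independent N(0, 1) draws. The sketch below is plain Python on simulated data from a locally constant model; all names and numbers are hypothetical. It compares the lag-1 autocorrelation of the errors under the correct variances and under a state variance that is far too small.

```python
import random


def standardized_innovations(y, V, W, m0=0.0, C0=1e7):
    """Standardized one-step forecast errors from a locally constant DLM."""
    m, C, out = m0, C0, []
    for obs in y:
        R = C + W                          # prior variance of the level
        Q = R + V                          # forecast variance of y_t
        out.append((obs - m) / Q ** 0.5)   # forecast error in standard units
        K = R / Q
        m = m + K * (obs - m)
        C = R * (1.0 - K)
    return out


def lag1_autocorrelation(e):
    mean = sum(e) / len(e)
    num = sum((e[t] - mean) * (e[t - 1] - mean) for t in range(1, len(e)))
    den = sum((v - mean) ** 2 for v in e)
    return num / den


# Simulate from the model itself (illustrative values, not course data).
rng = random.Random(1)
V_true, W_true = 1.0, 0.5
level, y = 0.0, []
for _ in range(2000):
    level += rng.gauss(0.0, W_true ** 0.5)
    y.append(level + rng.gauss(0.0, V_true ** 0.5))

# Drop the first error, which is dominated by the vague prior.
e_good = standardized_innovations(y, V_true, W_true)[1:]
e_bad = standardized_innovations(y, V_true, 0.001)[1:]   # W far too small
r_good = lag1_autocorrelation(e_good)
r_bad = lag1_autocorrelation(e_bad)
```

Under the correct variances the errors are close to uncorrelated; when W is set too small the filter lags behind the level and the errors become strongly positively autocorrelated.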

Thursday, Feb 21

Li Cai and Yue Zhao presented financial data. We saw multivariate models for three of the neuro series simultaneously. The series had their own locally linear components and shared components for heartbeat and respiration. R code.

Tuesday, Feb 26

I reviewed the multivariate dlm for three neuro series. Then I introduced a univariate model in which some of the cyclic components had two harmonics. David Resendes showed his data on Atlantic Salmon, Brook Trout and Brown Trout. Next time I will start talking about DLMs with unknown parameters. That material is in Chapter 4 of the book. R code.

Thursday, Feb 28

I talked about a conjugate Bayesian analysis for V and W. That material is in Section 4.3.1. I briefly mentioned simulation, or drawing random samples, of the states. That's from Section 4.5.

Li Cai showed data that seemed to have a small variance early on and a large variance later. To model the change in variance she used a dlm in which V and W changed through time. The way to make V, W, F, or G change through time is explained in the book beginning at the bottom of page 44 and also on pages 5-6 of the overview that comes with the help pages of the dlm package in R.

Tuesday, Mar 4

I showed a Gibbs sampler for treating unknown variances in model cyc6. My R code.
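The general shape of such a sampler can be sketched on a much simpler model. The code below is plain Python, not the R code from class, and it uses a locally constant model as a stand-in for model cyc6, which is not reproduced here. It assumes independent inverse-gamma priors on V and W with hypothetical hyperparameters a and b, and alternates between drawing the states by forward filtering, backward sampling and drawing the two variances from their full conditionals.

```python
import random


def ffbs_local_level(y, V, W, m0, C0, rng):
    """Draw theta_0..theta_n from p(theta | y, V, W) for the local level model."""
    ms, Cs = [m0], [C0]
    for obs in y:                          # forward pass: Kalman filter
        R = Cs[-1] + W
        K = R / (R + V)
        ms.append(ms[-1] + K * (obs - ms[-1]))
        Cs.append(R * (1.0 - K))
    n = len(y)
    theta = [0.0] * (n + 1)
    theta[n] = rng.gauss(ms[n], Cs[n] ** 0.5)
    for t in range(n - 1, -1, -1):         # backward pass
        B = Cs[t] / (Cs[t] + W)
        mean = ms[t] + B * (theta[t + 1] - ms[t])
        var = Cs[t] * (1.0 - B)
        theta[t] = rng.gauss(mean, var ** 0.5)
    return theta


def draw_inv_gamma(shape, rate, rng):
    """Inverse-gamma draw; gammavariate takes a shape and a scale (= 1 / rate)."""
    return 1.0 / rng.gammavariate(shape, 1.0 / rate)


def gibbs_local_level(y, n_iter, rng, a=2.0, b=1.0, m0=0.0, C0=1e4):
    """Gibbs sampler for (V, W) under assumed inverse-gamma(a, b) priors."""
    V, W, n, draws = 1.0, 1.0, len(y), []
    for _ in range(n_iter):
        theta = ffbs_local_level(y, V, W, m0, C0, rng)
        ss_obs = sum((y[t - 1] - theta[t]) ** 2 for t in range(1, n + 1))
        ss_state = sum((theta[t] - theta[t - 1]) ** 2 for t in range(1, n + 1))
        V = draw_inv_gamma(a + n / 2.0, b + ss_obs / 2.0, rng)
        W = draw_inv_gamma(a + n / 2.0, b + ss_state / 2.0, rng)
        draws.append((V, W))
    return draws


# Simulated data with V = 1 and W = 0.25 (illustrative only).
rng = random.Random(0)
level, y = 0.0, []
for _ in range(150):
    level += rng.gauss(0.0, 0.5)
    y.append(level + rng.gauss(0.0, 1.0))

draws = gibbs_local_level(y, n_iter=300, rng=rng)
kept = draws[100:]                         # discard a short burn-in
V_mean = sum(d[0] for d in kept) / len(kept)
W_mean = sum(d[1] for d in kept) / len(kept)
```

A real analysis would run many more iterations and check convergence; the short run here is only meant to show the structure of the two alternating steps.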

Daniel Irion and Sammy Zriek showed their financial data from the S&P 500 and the Bank of America. They want to treat BAC as a linear regression on S&P with time-varying parameters.
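A regression with time-varying parameters fits the DLM framework by letting the intercept and slope follow random walks and putting the regressor into the observation vector, F_t = (1, x_t). The sketch below is plain Python with made-up numbers, not the BAC or S&P data; the function name and the choice of a diagonal W are assumptions for the illustration.

```python
def dynamic_regression_filter(y, x, V, W_diag, m0, C0):
    """Kalman filter for y_t = alpha_t + beta_t * x_t + v_t, where alpha_t and
    beta_t follow independent random walks with variances W_diag."""
    m = list(m0)
    C = [row[:] for row in C0]
    path = []
    for obs, xt in zip(y, x):
        F = [1.0, xt]
        # Prior covariance of (alpha_t, beta_t): G is the identity, so R = C + W.
        R = [[C[0][0] + W_diag[0], C[0][1]],
             [C[1][0], C[1][1] + W_diag[1]]]
        RF = [R[0][0] * F[0] + R[0][1] * F[1],
              R[1][0] * F[0] + R[1][1] * F[1]]
        Q = F[0] * RF[0] + F[1] * RF[1] + V           # forecast variance
        err = obs - (F[0] * m[0] + F[1] * m[1])       # one-step forecast error
        A = [RF[0] / Q, RF[1] / Q]                    # gain vector
        m = [m[0] + A[0] * err, m[1] + A[1] * err]
        C = [[R[i][j] - A[i] * A[j] * Q for j in range(2)] for i in range(2)]
        path.append((m[0], m[1]))
    return path


# Made-up data generated with intercept 1 and slope 2, for illustration only.
x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
y = [1.0 + 2.0 * xi for xi in x]
path = dynamic_regression_filter(
    y, x, V=1.0, W_diag=(1e-4, 1e-4),
    m0=(0.0, 0.0), C0=[[1e6, 0.0], [0.0, 1e6]],
)
intercept, slope = path[-1]
```

Larger entries in W_diag let the coefficients drift faster through time; setting them to zero recovers an ordinary regression updated one observation at a time.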

Thursday, Mar 6

David Mimno showed jobs data from Monster. David is going to fit models with seasonal effects for the academic year and the retail year and also for business cycles. Cailin Xu showed data and a few models on fish growth. She fit random walk and locally linear models paired with either trigonometric or seasonal models. She was puzzled that the models without cyclic components had higher likelihood than the models with seasonal components.

I talked about the strategy of constructing DLMs that are Gaussian conditional on some parameters. Those parameters can capture interesting features like unknown variances, outliers, and other things that we may see later in the course. We talked about several models for outliers and noted that we have to consider outliers in both v and w.

Tuesday, Mar 11

Yue Zhao showed more analysis of his financial data. He's looking at the relationship between an index, such as the S&P 500, and an individual stock price. He tested his model by using the index to predict the index; it seemed to work. Directions for the future are to (1) use relative, instead of absolute, change, (2) check goodness of fit, (3) see about fitting all components at once, instead of sequentially, and (4) treat unknown parameters in a Bayesian way instead of plugging in estimates.

I showed a model for accommodating outliers. That model is explained in the book in Section 4.7.2.
