State Space Markov Models
ST 797M
Michael Lavine, Professor of Statistics
This is an advanced statistics course on state-space models and their
Bayesian analysis. Examples of applications include
- the water level in Lake Huron
- the level of CO2 in the atmosphere
- the size of an animal population through time
- the level of neural activity in various regions of the brain
- models of infectious disease
- models of stock returns
- missile and satellite tracking
During the course we will learn how to build and analyze state space
models to incorporate many different features including locally smooth
behavior, quasi-cyclic behavior, heteroscedasticity and multivariate
observations. We will work heavily with the statistical software R
and especially with the dlm package.
R is the software of choice for many research statisticians. It can be
downloaded free of charge to your own computer from the R web site.
Students will find their own data, develop state-space models for
their data, and report frequently to the whole class.
We will begin with a paper on the Kalman filter, then use the book
Dynamic Linear Models with R. This book is not yet published; please
download the pdf file. You may use it only for the purpose of this
class; it is not to be distributed otherwise. Finally, in the last
portion of the course we will use Gaussian Markov Random Fields by Rue
and Held.
During the semester we will occasionally look at some neurobiology data
that I've been analyzing. Download neuro data
Tuesday, Jan 29
Introduction to DLMs and state-space models
Thursday, Jan 31
Example of a locally constant (polynomial of order 1) dlm for the Lake Huron data.
R code.
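As a rough illustration of what a locally constant model does, the sketch below runs the scalar Kalman filter for y_t = theta_t + v_t, theta_t = theta_(t-1) + w_t. It is written in plain Python rather than R so that it is self-contained, and it is not the class code: the function name local_level_filter, the variances V and W, the diffuse prior, and the short series are all assumptions made for illustration (the numbers are made up and are not the Lake Huron data).

```python
def local_level_filter(y, V, W, m0=0.0, C0=1e7):
    """Filter a local level (first-order polynomial) model.

    y      : list of observations
    V      : observation variance (assumed known)
    W      : evolution variance (assumed known)
    m0, C0 : prior mean and variance of the level (diffuse by default)
    Returns the filtered means and variances of the level.
    """
    m, C = m0, C0
    means, variances = [], []
    for obs in y:
        R = C + W                 # prior variance of the current level
        Q = R + V                 # one-step-ahead forecast variance
        K = R / Q                 # weight given to the new observation
        m = m + K * (obs - m)     # filtered mean
        C = R * V / Q             # filtered variance
        means.append(m)
        variances.append(C)
    return means, variances


# Made-up series, for illustration only.
y = [10.2, 9.8, 10.5, 11.9, 12.3, 12.0]
means, variances = local_level_filter(y, V=1.0, W=0.1)
for obs, m, C in zip(y, means, variances):
    print(obs, round(m, 3), round(C, 3))
```

With W = 0 and a diffuse prior, the filtered mean reduces to the running average of the data, which is a quick sanity check on the recursion.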
Tuesday, Feb 5
I will give an introduction to Bayesian
statistics and show how to compute all the relevant distributions in
the Normal case. If you want to think about the comparison between
classical and Bayesian statistics, you might like the paper "What is Bayesian Statistics and Why Everything Else is Wrong."
I will also explain why some of the curves we fit on Thursday are
smoother while others are rougher.
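Both points in this entry can be sketched in a few lines, under assumptions that go beyond these notes. The function normal_update below is the standard conjugate update for a Normal mean with known variances; limiting_weight (a hypothetical helper) applies it repeatedly in a local level model to find the weight that the filter eventually gives to each new observation. One common explanation of smooth versus rough fits, which may or may not be the one given in class, is that this weight depends only on the ratio W/V.

```python
import math


def normal_update(prior_mean, prior_var, obs, obs_var):
    """Posterior for theta when theta ~ N(prior_mean, prior_var)
    and obs | theta ~ N(theta, obs_var)."""
    weight = prior_var / (prior_var + obs_var)   # weight on the observation
    post_mean = prior_mean + weight * (obs - prior_mean)
    post_var = prior_var * obs_var / (prior_var + obs_var)
    return post_mean, post_var, weight


def limiting_weight(V, W, n_steps=2000):
    """Iterate the local level recursion until the weight settles down."""
    C, weight = 1.0, None
    for _ in range(n_steps):
        # Observed values do not affect variances or weights, so use 0.
        _, C, weight = normal_update(0.0, C + W, 0.0, V)
    return weight


def closed_form_weight(V, W):
    """Limit of the weight for the local level model, with r = W / V."""
    r = W / V
    return (-r + math.sqrt(r * r + 4.0 * r)) / 2.0


# Hypothetical signal-to-noise ratios, chosen only for illustration.
for r in (0.01, 1.0, 100.0):
    print(r, limiting_weight(1.0, r), closed_form_weight(1.0, r))
```

A small ratio W/V gives a small weight, so each new observation moves the fitted level only slightly and the curve is smooth; a large ratio gives a weight near 1, so the fitted level follows the data and the curve is rough.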
Thursday, Feb 7
We did a locally linear model for the neuro
data, Region 1. We saw that the model fit too well because the trend
had to account for heartbeat and respiration in addition to the
response to the stimulus. One solution is to add a cyclic term for
respiration. We saw how to encode a cyclic term as a dlm.
R code.
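One common way to encode a single cyclic term, sketched here in Python on the assumption that it resembles what was shown in class, uses a two-dimensional state that is rotated by a fixed angle at each time step, with the observation picking out the first coordinate. The helper names harmonic_G and noiseless_cycle and the period of 12 time steps are hypothetical.

```python
import math


def harmonic_G(period):
    """Evolution matrix that rotates the state by 2*pi/period per step."""
    w = 2.0 * math.pi / period
    return [[math.cos(w), math.sin(w)],
            [-math.sin(w), math.cos(w)]]


def noiseless_cycle(period, theta0, n):
    """Signal F' theta_t with F = (1, 0) and no evolution noise."""
    G = harmonic_G(period)
    theta = list(theta0)
    signal = []
    for _ in range(n):
        signal.append(theta[0])                          # F = (1, 0)
        theta = [G[0][0] * theta[0] + G[0][1] * theta[1],
                 G[1][0] * theta[0] + G[1][1] * theta[1]]
    return signal


# Starting at (1, 0) traces out a cosine with the chosen period.
signal = noiseless_cycle(12, [1.0, 0.0], 25)
print([round(s, 3) for s in signal[:13]])
```

Adding evolution noise to the two state coordinates lets the amplitude and phase drift over time, which gives quasi-cyclic rather than exactly periodic behavior.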
Tuesday, Feb 12
We talked about the basic DLMs:
polynomial, seasonal, Fourier, and regression. We showed how to write
ARMA models as DLMs. We will use these basics as building blocks in
more elaborate models. This material is in Chapter 3 of the book.
We still have to deal with model checking, unknown parameters,
multivariate observations, and a few other things.
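To make the ARMA-as-DLM idea concrete, here is a small check for the AR(2) special case, written in Python under stated assumptions: the state is taken to be (y_t, phi2 * y_(t-1)), which is one of several equivalent choices, and the coefficients 0.5 and -0.3, the function names, and the simulated innovations are all hypothetical.

```python
import random


def simulate_ar2_direct(phi1, phi2, innovations):
    """y_t = phi1 * y_(t-1) + phi2 * y_(t-2) + e_t, starting from zeros."""
    y_prev1, y_prev2 = 0.0, 0.0
    out = []
    for e in innovations:
        y = phi1 * y_prev1 + phi2 * y_prev2 + e
        out.append(y)
        y_prev1, y_prev2 = y, y_prev1
    return out


def simulate_ar2_as_dlm(phi1, phi2, innovations):
    """Same process as a DLM with
    F = (1, 0), G = [[phi1, 1], [phi2, 0]], V = 0, W = diag(sigma^2, 0)."""
    theta = [0.0, 0.0]                 # state: (y_t, phi2 * y_(t-1))
    out = []
    for e in innovations:
        theta = [phi1 * theta[0] + theta[1] + e,   # first row of G plus noise
                 phi2 * theta[0]]                  # second row of G, no noise
        out.append(theta[0])                       # observation: F' theta
    return out


rng = random.Random(0)
eps = [rng.gauss(0.0, 1.0) for _ in range(200)]
direct = simulate_ar2_direct(0.5, -0.3, eps)
as_dlm = simulate_ar2_as_dlm(0.5, -0.3, eps)
print(max(abs(a - b) for a, b in zip(direct, as_dlm)))
```

Driving both versions with the same innovations and comparing the two paths is a simple way to confirm that a proposed F and G really represent the intended ARMA model.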
Thursday, Feb 14
Melissa Eliot presented data on cases of
measles in London over 20 years. Aaron Ellison presented data on net ecosystem exchange (of
carbon) between a forest and the atmosphere. We did some model
checking with the neuro data.
R code.
Thursday, Feb 21
Li Cai and Yue Zhao presented financial data. We saw multivariate models
for three of the neuro series simultaneously. The series had their own locally linear components
and shared components for heartbeat and respiration.
R code.
Tuesday, Feb 26
I reviewed the multivariate dlm for three neuro series. Then I introduced a univariate model in which some of the cyclic components had two harmonics. David Resendes showed his
data on Atlantic Salmon, Brook Trout and Brown Trout. Next time I will start talking about DLMs with
unknown parameters. That material is in Chapter 4 of the book.
R code.
Thursday, Feb 28
I talked about a conjugate Bayesian analysis for V and W. That material is in Section 4.3.1. I briefly mentioned simulation, or drawing random samples, of the states. That's from Section 4.5.
Li Cai showed data that seemed to have a small variance early on and a large variance later. To model the change in variance she used a dlm in which V and W changed through time. The way to make V, W, F, or G change through time is explained in the book beginning at the bottom of page 44 and also on pages 5-6 of the overview that comes with the help pages of the dlm package in R.
Tuesday, Mar 4
I showed a Gibbs sampler for treating unknown variances in model cyc6. My R code.
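The idea of a Gibbs sampler for unknown variances can be sketched for the simplest case, a local level model. That choice is an assumption: this is not model cyc6 and not the class R code. The sketch alternates between drawing the states by forward filtering, backward sampling and drawing V and W from inverse gamma full conditionals. The inverse gamma prior parameters a and b, the function names, and the simulated data are all hypothetical.

```python
import random


def ffbs(y, V, W, rng, m0=0.0, C0=1e4):
    """Draw theta_0..theta_n given V and W (forward filter, backward sample)."""
    n = len(y)
    m, C, R = [m0], [C0], [None]
    for obs in y:                          # forward Kalman filter
        r = C[-1] + W
        q = r + V
        m.append(m[-1] + r / q * (obs - m[-1]))
        C.append(r * V / q)
        R.append(r)
    theta = [0.0] * (n + 1)
    theta[n] = rng.gauss(m[n], C[n] ** 0.5)
    for t in range(n - 1, -1, -1):         # backward sampling
        h = m[t] + C[t] / R[t + 1] * (theta[t + 1] - m[t])
        H = C[t] * W / R[t + 1]            # equals C_t - C_t^2 / R_(t+1)
        theta[t] = rng.gauss(h, H ** 0.5)
    return theta


def inv_gamma(shape, rate, rng):
    """Inverse gamma draw as the reciprocal of a gamma draw."""
    return 1.0 / rng.gammavariate(shape, 1.0 / rate)


def gibbs_local_level(y, n_iter=500, a=2.0, b=1.0, seed=1):
    """Gibbs sampler with independent InvGamma(a, b) priors on V and W."""
    rng = random.Random(seed)
    n = len(y)
    V, W = 1.0, 1.0
    draws = []
    for _ in range(n_iter):
        theta = ffbs(y, V, W, rng)
        ss_v = sum((y[t] - theta[t + 1]) ** 2 for t in range(n))
        ss_w = sum((theta[t + 1] - theta[t]) ** 2 for t in range(n))
        V = inv_gamma(a + n / 2.0, b + ss_v / 2.0, rng)
        W = inv_gamma(a + n / 2.0, b + ss_w / 2.0, rng)
        draws.append((V, W))
    return draws


# Simulated data, for illustration only.
sim = random.Random(3)
level, y = 0.0, []
for _ in range(60):
    level += sim.gauss(0.0, 0.5)
    y.append(level + sim.gauss(0.0, 1.0))
draws = gibbs_local_level(y, n_iter=200)
print(sum(v for v, _ in draws) / len(draws), sum(w for _, w in draws) / len(draws))
```

In practice one would discard an initial burn-in and check convergence before summarizing the draws; the sketch omits both for brevity.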
Daniel Irion and Sammy Zriek showed their financial data from the S&P 500 and the Bank of America. They want to treat BAC as a linear regression on S&P with time-varying parameters.
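A regression with time-varying parameters can be written as a DLM whose state is the pair (intercept, slope) and whose observation vector is F_t = (1, x_t). The sketch below filters such a model in plain Python. It assumes random-walk evolution for both coefficients (the notes do not say how the coefficients evolve), and the function name, the variances, and the toy data are hypothetical.

```python
def tvp_regression_filter(y, x, V, W_alpha, W_beta, C0=1e6):
    """Kalman filter for y_t = alpha_t + beta_t * x_t + v_t, where
    alpha_t and beta_t follow independent random walks.
    Returns the filtered (alpha, beta) means at each time."""
    m = [0.0, 0.0]
    C = [[C0, 0.0], [0.0, C0]]
    out = []
    for yt, xt in zip(y, x):
        # Evolution step: G is the identity, so only the variance grows.
        R = [[C[0][0] + W_alpha, C[0][1]],
             [C[1][0], C[1][1] + W_beta]]
        F = [1.0, xt]
        RF = [R[0][0] * F[0] + R[0][1] * F[1],
              R[1][0] * F[0] + R[1][1] * F[1]]
        Q = F[0] * RF[0] + F[1] * RF[1] + V       # forecast variance
        e = yt - (F[0] * m[0] + F[1] * m[1])      # forecast error
        K = [RF[0] / Q, RF[1] / Q]                # gain vector
        m = [m[0] + K[0] * e, m[1] + K[1] * e]
        C = [[R[i][j] - K[i] * RF[j] for j in range(2)] for i in range(2)]
        out.append((m[0], m[1]))
    return out


# Toy data with constant coefficients, for illustration only.
x = [float(i) for i in range(10)]
y = [2.0 + 3.0 * xi for xi in x]
estimates = tvp_regression_filter(y, x, V=1.0, W_alpha=0.0, W_beta=0.0)
print(estimates[-1])
```

Setting W_alpha and W_beta to zero recovers ordinary regression with fixed coefficients; positive values let the intercept and slope drift.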
Thursday, Mar 6
David Mimno showed jobs data from Monster. David is going to fit models with seasonal effects for the academic year and the retail year and also for business cycles. Cailin Xu showed data and a few models on fish growth. She fit random walk and locally linear models paired with either trigonometric or seasonal models. She was puzzled that the models without cyclic components had higher likelihood than the models with seasonal components.
I talked about the strategy of constructing DLMs that are conditionally Gaussian, given some parameters. Those parameters can capture interesting features like unknown variances, outliers, and other things that we may see later in the course. We talked about several models for outliers and noted that we have to consider outliers in both v and w.
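One standard way to keep a model conditionally Gaussian while allowing outliers is a scale mixture of Normals: each observation error gets its own latent multiplier omega_t, with v_t | omega_t ~ N(0, V / omega_t) and omega_t ~ Gamma(nu/2, rate nu/2), so that v_t is marginally Student t. This may differ from the models discussed in class; the function names and the choice nu = 4 below are assumptions. Given the residual, the full conditional of omega_t is again a Gamma, which is what a Gibbs sampler would draw from.

```python
import random


def omega_full_conditional(resid, V, nu):
    """Shape and rate of the Gamma full conditional of omega_t,
    given the residual resid = y_t - F' theta_t."""
    shape = (nu + 1.0) / 2.0
    rate = (nu + resid ** 2 / V) / 2.0
    return shape, rate


def draw_omega(resid, V, nu, rng):
    """One Gibbs draw of omega_t (gammavariate takes shape and scale)."""
    shape, rate = omega_full_conditional(resid, V, nu)
    return rng.gammavariate(shape, 1.0 / rate)


# Conditional mean of omega_t for a typical and an extreme residual.
for resid in (0.0, 1.0, 5.0):
    shape, rate = omega_full_conditional(resid, V=1.0, nu=4.0)
    print(resid, shape / rate)
```

A large residual pulls omega_t toward zero, which inflates that observation's variance V / omega_t and so downweights it in the filter. The same construction can be applied to the components of w to allow sudden jumps in the states.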
Tuesday, Mar 11
Yue Zhao showed more analysis of his financial data. He's looking at the
relationship between an index, such as the S&P 500, on one hand, and an individual stock price on
the other hand. He tested his model by using the index to predict the index; it seemed to work. Directions for the future are (1) use relative, instead of absolute, change, (2) check goodness of
fit, (3) see about fitting all components at once, instead of sequentially, (4) treat unknown
parameters in a Bayesian way instead of plug-in.
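Direction (1) can be illustrated with a few lines of Python. The prices below are made up, the function names are hypothetical, and log returns are included only as a commonly used alternative to relative change, not as something proposed in class.

```python
import math


def absolute_changes(prices):
    """p_t - p_(t-1)."""
    return [b - a for a, b in zip(prices, prices[1:])]


def relative_changes(prices):
    """(p_t - p_(t-1)) / p_(t-1)."""
    return [(b - a) / a for a, b in zip(prices, prices[1:])]


def log_returns(prices):
    """log(p_t / p_(t-1)), close to the relative change when it is small."""
    return [math.log(b / a) for a, b in zip(prices, prices[1:])]


# Made-up prices, for illustration only.
prices = [100.0, 110.0, 99.0]
print(absolute_changes(prices))
print(relative_changes(prices))
print(log_returns(prices))
```

Relative changes put an index and an individual stock on a common scale, which matters when the two series trade at very different price levels.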
I showed a model for accommodating outliers. That model is explained in the book in Section 4.7.2.