Stat 597A: Introduction to Statistical Computing
John Staudenmayer (Office LGRT 1435K, Phone 545 0999)
Office hours: Wed and Fri 11AM, or by appointment.
jstauden at math.umass.edu
www.math.umass.edu/~jstauden/stat597A.html
Textbook
The Art of R Programming (Norman Matloff). We will use the book a lot.
Additional readings will be assigned during the semester.
Software & Prerequisites
This class will mostly be about using R to do modern
statistical analyses and data management. (You need a computer with R on it!
RStudio is strongly recommended.) We will learn to use R to run existing
programs, and we will also learn to write our own programs and functions. Effective
programming strategies and principles will be emphasized.
This class does not assume a lot of programming
background. We will cover the core ideas of programming (functions,
objects, data structures, flow control, input and output, debugging, logical
design and abstraction) through writing code to assist in numerical and
graphical statistical analyses. If you know a lot about programming already,
much of the class may be review. The class will assume that students know the
basic concepts of statistical thinking (with data) and basic probability.
Classes & Assignments
Classes will consist of lectures and labs. A
tentative schedule and topic list is below. I will try to post the lecture notes
on the web after class. The labs will consist of group activities done on your
computers during class time. Each lab will have an assignment to be handed in.
There will also be approximately weekly problem sets.
All assignments must be turned in electronically,
via email. All assignments will involve writing a combination of
code and actual prose. You must submit your assignment in a format that allows
for the combination of the two, and the automatic execution of all your code.
The easiest way to do this is to use R Markdown (http://rmarkdown.rstudio.com).
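
As a rough illustration of what such a submission might look like, here is a minimal R Markdown sketch. The file name (hw1.Rmd), title, author, and the toy data are all hypothetical placeholders, not a required template; the point is only that prose, executable R chunks, and inline results can live in one file.

````markdown
---
title: "Problem Set 1"
author: "Your Name"
output: html_document
---

Prose goes here, written as ordinary text. The chunk below is run
automatically when the document is knit.

```{r}
# Toy data, made up purely for illustration
x <- c(2, 4, 6)
mean(x)
```

Results can also be reported inline: the vector has `r length(x)` elements.
````

In RStudio, the Knit button runs every chunk and builds the output document; from the R console, rmarkdown::render("hw1.Rmd") should do the same, assuming the rmarkdown package is installed.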
Projects
There will be a midterm project (assigned 10/23 and due 10/30) and a final
project (assigned 11/16 and due before the end of final exams). The midterm
will be done alone, and the final project will be done in small assigned
groups. The last few classes will consist of project presentations.
Grading
Approximately weekly problem sets and labs: 30%
Midterm Project: 30%
Final Project: 40%
Some R Resources
There are many online resources for learning about R and working with it,
in addition to the textbook:
• The official intro, "An Introduction to R", available online in HTML and PDF
• John Verzani, "simpleR", in PDF
• The Google R Style Guide offers some rules for naming, spacing, etc., which
are generally good ideas
• Quick-R. This is primarily aimed at those who already know
a commercial statistics package like SAS, SPSS or Stata, but it's very clear
and well-organized, and others may find it useful as well.
• Patrick Burns, The R Inferno. "If you are using R and
you think you're in hell, this is a map for you."
• Thomas Lumley, "R Fundamentals and Programming Techniques" (large PDF)
• The website Software Carpentry is not specifically R related, but contains
a lot of valuable advice and information on scientific programming.
Important Note
A lot of this class (and this information!) is based on a class at Carnegie
Mellon:
Shalizi, C. R. and Thomas, A. C. (2014), "Statistical Computing 36-350:
Beginning to Advanced Techniques in R",
http://www.stat.cmu.edu/~cshalizi/statcomp/14
Tentative Schedule / Class Outline
Data types and data structures
Lecture 1 (Sept 9): Simple data types and structures
Lecture 2 (Sept 11): Bigger data structures
Lab 1 R Markdown File (Sept 14)
Flow control and looping
Lecture 3 (Sept 16): Data Frames and Control
Text
Lecture 4 (Sept 21): Text basics
Lecture 5 (Sept 23): Regular expressions
Writing and calling functions
Lecture 6 (Sept 28): Writing functions
Lecture 7 (Sept 30): Multiple functions
Data from elsewhere
Lecture 8 (Oct 7): Getting data
Fitting and using statistical models
Lecture 10 (Oct 13): Random number generation
Changing Shapes
Lecture 11 (Oct 19): Timing code and a start at apply() functions
Lecture 12 (Oct 21): More with the apply family
Mid-semester project due Oct 30
Functions of functions, and optimization
Lecture 15 (Nov 4): Functions as objects
More optimization