# load some example data:
data <- read.csv("~/outlierdata.csv")

# fit a multiple regression
fit <- lm(y ~ ., data = data, x = TRUE)
# (The x = TRUE in the lm() call above lets us get the X matrix from the fit.)
# 
# 1. Based on the fit and the anova, please describe the relationship
# between the covariates and the outcome.

# 2. Plot the residuals with plot(resid(fit)). Do you see any obvious problems?

# In class we defined the deleted residual as
# d_i = r_i / (1 - h_ii), where h_ii is the i-th diagonal element of
# H = X(X'X)^{-1}X'.
# The code below computes the deleted residuals:

r <- resid(fit)
X <- fit$x
H <- X %*% solve(t(X) %*% X) %*% t(X)
h <- diag(H)
d <- r/(1-h)
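
# A quick sanity check on the hand computation above (a sketch, not part of
# the original exercise). In recent versions of R, hatvalues() returns the
# leverages h_ii and rstandard(fit, type = "predictive") returns
# r_i / (1 - h_ii). The helper name check.deleted is made up for this
# illustration; it recomputes h and d by hand and compares them to the
# built-ins.
check.deleted <- function(fit) {
  X <- model.matrix(fit)                       # design matrix (same as fit$x)
  h <- diag(X %*% solve(t(X) %*% X) %*% t(X))  # leverages from the hat matrix
  d <- resid(fit) / (1 - h)                    # deleted residuals by hand
  c(leverage = isTRUE(all.equal(unname(h), unname(hatvalues(fit)))),
    deleted  = isTRUE(all.equal(unname(d),
                                unname(rstandard(fit, type = "predictive")))))
}
# Usage: check.deleted(fit) should return TRUE for both entries.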

# 3. Compute the 7th deleted residual directly: fit a regression on all
# observations except the 7th and call it fit.without.7, i.e.
# fit.without.7 <- lm(y ~ ., data = data[-7, ]). Use
# predict(fit.without.7, newdata = data[7, ]) to get the fitted value at
# observation 7. Confirm that this can be used to get the same value as d[7].
# (Note that the code above computes only part of what you need for
# the deleted residual. You need to finish it.)

# 4. Plot the deleted residuals. Describe the problem that this plot reveals.
# What can you do to resolve it? How does this change your answer to Q1?