# load some example data:
data <- read.csv("~/outlierdata.csv")

# fit a multiple regression
fit <- lm(y ~ ., data = data, x = TRUE)
# (the x = TRUE in the lm() call above lets us get the X matrix from the fit.)
#
# 1. Based on the fit and the anova, please describe the relationship
#    between the covariates and the outcome.
# 2. Plot the residuals with plot(resid(fit)). Do you see any obvious problems?

# In class we defined the deleted residual, and said that it is:
#   d_i = r_i / (1 - h_ii)
# where h_ii is the i'th diagonal element of
#   H = X(X'X)^{-1}X'
# The code below computes the deleted residuals:
r <- resid(fit)
X <- fit$x
H <- X %*% solve(t(X) %*% X) %*% t(X)
h <- diag(H)
d <- r / (1 - h)

# 3. Compute the 7th deleted residual directly: fit a regression with
#    all except the 7th observation and call it fit.without.7,
#    i.e. fit.without.7 <- lm(y ~ ., data = data[-7, ])
#    Use predict(fit.without.7, newdata = data[7, ]) to get the fit at 7.
#    Confirm that this can be used to get the same value as d[7].
#    (Note the code above computes part of what you need to compute
#    the deleted residual. You need to finish it.)
# 4. Plot the deleted residuals. Describe the problem that this plot reveals.
#    What can you do to resolve it? How does this change your answer to Q1?
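
# A minimal sketch, not part of the original exercise: it checks the identity
# d_i = r_i / (1 - h_ii) on simulated data rather than on outlierdata.csv.
# The names sim, sim.fit, sim.d, i, and direct are hypothetical and chosen
# only for this illustration. hatvalues() is base R's shortcut for diag(H).
set.seed(1)
sim <- data.frame(x1 = rnorm(20), x2 = rnorm(20))
sim$y <- 1 + 2 * sim$x1 - sim$x2 + rnorm(20)

# deleted residuals from the full fit, via the leverage formula
sim.fit <- lm(y ~ ., data = sim)
sim.d <- resid(sim.fit) / (1 - hatvalues(sim.fit))

# the same quantity computed directly: refit without observation i, then
# subtract the prediction at observation i from the observed outcome
i <- 3
sim.fit.without.i <- lm(y ~ ., data = sim[-i, ])
direct <- sim$y[i] - predict(sim.fit.without.i, newdata = sim[i, ])

# the two values should agree up to floating-point error
stopifnot(isTRUE(all.equal(unname(sim.d[i]), unname(direct))))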