# There are 11 questions below. Please work on these as homework for Wednesday.

# The newproduct dataset (on course website) has data about 50 new product development
# projects at a company that makes shampoo and other hair care products. The goal of the
# dataset is to learn about what determines how long it takes to develop a new product.
# The data consist of a response, time (days), and three covariates: cost (units of $100K),
# FTEs (full time equivalent people), and an assessment of difficulty (higher is more
# difficult).

# download and read in the data from the course web page:
data <- read.csv("~/newproduct.csv") # your "path" might be different.
# Please use head(data) and tail(data) to make sure you have the data before continuing!

# fit models with covariates in all 6 orders.
fit.cFd <- lm(time~cost+FTEs+difficulty,data=data)
fit.Fcd <- lm(time~FTEs+cost+difficulty,data=data)
fit.dcF <- lm(time~difficulty+cost+FTEs,data=data)
fit.cdF <- lm(time~cost+difficulty+FTEs,data=data)
fit.Fdc <- lm(time~FTEs+difficulty+cost,data=data)
fit.dFc <- lm(time~difficulty+FTEs+cost,data=data)

# Below is an example of some code to visualize sequential sums of squares.
# Please run it line by line to see what it does.
anova.result <- anova(fit.dcF)
labels <- c("SSR(difficulty)","SSR(cost|difficulty)","SSR(FTE|cost,difficulty)")
barplot(as.matrix(anova.result[,2]),density=0,ylab="Sum of Squares")
text(.67,anova.result[1,2]/2,labels[1])
text(.67,anova.result[1,2]+anova.result[2,2]/2,labels[2])
text(.67,anova.result[1,2]+anova.result[2,2]+anova.result[3,2]/2,labels[3])
text(.67,anova.result[1,2]+anova.result[2,2]+anova.result[3,2]+anova.result[4,2]/2,"SSE")

# 1. What is the interpretation of the total height of the stacked bars?
# 2. Modify the code above to show sequential sums of squares for fit.Fdc. (Please hand in your code and the figure.)
# 3. Please run the following two lines and explain why they give the same number.
var(data$time)*(49)
sum(anova(fit.dcF)[,2])

# Use anova tables (or plots) to answer the following questions.
# 4. What explains most of the variability in time by itself? How do you know?
# 5. Does cost explain the variability in time? Does the answer depend on what else is in the model? How?
# 6. Do FTEs explain the variability in time? Does the answer depend on what else is in the model? How?
# 7. Does difficulty explain the variability in time? Does the answer depend on what else is in the model? How?

# Use the results of the "summary" function to answer the following questions:
# 8. What is the estimated function for the mean of time when all covariates are used?
# 9. What is the estimated variance of the errors?
# 10. What percentage of the variance in time is accounted for by the covariates?
# 11. The line
# "F-statistic: 74.34 on 3 and 46 DF, p-value: < 2.2e-16"
# is at the end of summary(fit.dcF).
# Show how the numbers in the anova table can be used to compute the 74.34.
# Which two models does this F statistic compare? Which model is preferred and why?
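
# Optional sketch (not part of the original assignment). The plotting example
# earlier in this script selects the sum-of-squares column of an anova table by
# position ([,2]). Assuming the standard column names that anova() returns for an
# lm fit, the same column can also be selected by name, which may be easier to
# read. The data frame `toy`, its variables, and `toy.fit` are hypothetical
# stand-ins made up for illustration; they are not the newproduct data.
set.seed(1)
toy <- data.frame(x1 = rnorm(20), x2 = rnorm(20))
toy$y <- 1 + 2 * toy$x1 - toy$x2 + rnorm(20)   # simulated response
toy.fit <- lm(y ~ x1 + x2, data = toy)
toy.anova <- anova(toy.fit)
colnames(toy.anova)                                # column 2 is "Sum Sq"
toy.anova[, "Sum Sq"]                              # selection by name
identical(toy.anova[, "Sum Sq"], toy.anova[, 2])   # name and position agree
rownames(toy.anova)                                # model terms, then "Residuals"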