Question
Use R code to answer this question. Use the openintro package and the countyComplete data in that package.
(a) In R, fit a linear model to predict the median value of owner-occupied homes, abbreviated MVH, from the median household income, abbreviated MHI. Print the summary of the model.
(b) Interpret the slope you have estimated.
(c) Calculate a 99% confidence interval for the slope of MVH on MHI and interpret it in the context of the data.
(d) State the hypothesis you can test with this interval and provide the conclusion.
(e) Plot the scatter plot of the data together with the regression line. If you saved your model in a variable mod, you can type the command abline(mod) to get the line. Plot it in a different color so that it is more visible.
(f) What is your prediction for the mean MVH for counties with MHI=50000?
(g) Provide a 95% confidence interval for this prediction.
(h) Run diagnostics on the residuals of the model (you can obtain them with the command residuals(mod)) using some plots, and comment on whether you think the residuals are consistent with the assumptions of the model.
(i) Sometimes a transformation of a variable yields data that are more consistent with the model assumptions. Take the log transform of MVH, call it LMVH, and regress that on MHI.
(j) You will get an error message. Can you explain why?
(k) Now redo the regression of LMVH on MHI without Kalawao County, Hawaii. Everything should work. Why?
(l) Plot the scatter plot of the data together with the regression line.
(m) Run diagnostics on the residuals of the model using some plots, and comment on whether you think the residuals are consistent with the assumptions of the model.
(n) The linear model you fit is on log(MVH). What relationship does that imply for the original MVH as a function of MHI?
(o) Based on this model, what would happen to the original MVH for an increase of 10000 dollars in MHI?
(p) Now compute the means of LMVH and MHI for each state. You can do that using the dplyr package as follows:
    gg=group_by(countyComplete,state)
    ggm=summarise(gg,MHI=mean(median_household_income),LMVH=mean(log(median_val_owner_occupied)))
The variable ggm should have 51 lines, one for each state.
(q) Plot the full county scatter plot of LMVH on MHI and add to that the scatter plot of ggm$LMVH on ggm$MHI in red. What do you observe?
(r) Regress the mean state LMVH on the mean state MHI. Note that the R^2 of this new model is much larger than the R^2 of the model based on counties. Can you explain why? This improved linear fit is sometimes called the ecological fallacy.
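The computational parts of the question could be set up roughly as follows. This is only a sketch, not a worked solution: it assumes an installed openintro version that still ships countyComplete with the column names used in the question, plus a county-name column called name (an assumption, not stated in the question). The object names ci99, pred, cc and mod_log are made up for illustration. The interpretation parts (b), (d), (j), (n), (o) and (r) are left to the reader.

```r
library(openintro)  # assumed to provide the countyComplete data frame
library(dplyr)

# (a) Linear model of MVH on MHI, with its summary
mod <- lm(median_val_owner_occupied ~ median_household_income,
          data = countyComplete)
print(summary(mod))

# (c) 99% confidence interval for the slope
ci99 <- confint(mod, "median_household_income", level = 0.99)
print(ci99)

# (e) Scatter plot with the fitted line in a different color
plot(median_val_owner_occupied ~ median_household_income,
     data = countyComplete)
abline(mod, col = "red")

# (f), (g) Predicted mean MVH at MHI = 50000 with a 95% confidence interval
pred <- predict(mod,
                newdata = data.frame(median_household_income = 50000),
                interval = "confidence", level = 0.95)
print(pred)

# (h) Two common residual diagnostics
plot(fitted(mod), residuals(mod))
qqnorm(residuals(mod))
qqline(residuals(mod))

# (k) Drop Kalawao County, Hawaii (the `name` column is an assumption),
#     then regress log(MVH) on MHI
cc <- subset(countyComplete,
             !(state == "Hawaii" & name == "Kalawao County"))
cc$LMVH <- log(cc$median_val_owner_occupied)
mod_log <- lm(LMVH ~ median_household_income, data = cc)
print(summary(mod_log))

# (p) State-level means, following the dplyr code in the question but
#     starting from the filtered data `cc` (a deviation from the question)
gg <- group_by(cc, state)
ggm <- summarise(gg,
                 MHI = mean(median_household_income),
                 LMVH = mean(log(median_val_owner_occupied)))
```

The state-level step starts from cc rather than countyComplete so that the dropped county does not enter the state mean; switch back to countyComplete to follow the question literally.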