Question

Use R code to answer this question. Use the openintro package and the countyComplete data in that package.

(a) In R, fit a linear model to predict the median value of owner-occupied homes, abbreviated MVH, from the median household income, abbreviated MHI. Print the summary of the model.
(b) Interpret the slope you have estimated.
(c) Calculate a 99% confidence interval for the slope of MVH on MHI and interpret it in the context of the data.
(d) State the hypothesis you can test with this interval and give the conclusion.
(e) Plot the scatter plot of the data together with the regression line. If you saved your model in a variable mod, the command abline(mod) draws the line. Plot it in a different color so that it is more visible.
(f) What is your prediction for the mean MVH for counties with MHI = 50000?
(g) Provide a 95% confidence interval for this prediction.
(h) Run diagnostics on the residuals of the model (you can obtain them with the command residuals(mod)) using some plots, and comment on whether you think the residuals are consistent with the assumptions of the model.
(i) Sometimes a transformation of a variable yields data that are more consistent with the model assumptions. Take the log transform of MVH, call it LMVH, and regress it on MHI.
(j) You will get an error message. Can you explain why?
(k) Now redo the regression of LMVH on MHI without Kalawao County, Hawaii. Everything should work. Why?
(l) Plot the scatter plot of the data together with the regression line.
(m) Run diagnostics on the residuals of the model using some plots, and comment on whether you think the residuals are consistent with the assumptions of the model.
(n) The linear model you fit is on log(MVH). What relationship does that imply for the original MVH as a function of MHI?
(o) Based on this model, what would happen to the original MVH for an increase of $10,000 in MHI?
(p) Now compute the means of LMVH and MHI for each state. You can do that with the dplyr package as follows:
    gg <- group_by(countyComplete, state)
    ggm <- summarise(gg, MHI = mean(median_household_income), LMVH = mean(log(median_val_owner_occupied)))
The variable ggm should have 51 rows, one for each state (the 50 states plus the District of Columbia).
(q) Plot the full county scatter plot of LMVH on MHI and add to it the scatter plot of ggm$LMVH on ggm$MHI in red. What do you observe?
(r) Regress the mean state LMVH on the mean state MHI. Note that the R^2 of this new model is much larger than the R^2 of the county-level model. Can you explain why? Taking this improved linear fit as a description of the county-level relationship is an example of what is sometimes called the ecological fallacy.

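A minimal sketch for parts (a) and (b), assuming an openintro release that still ships countyComplete with the column names used in the question (median_household_income, median_val_owner_occupied). Newer openintro releases use snake_case dataset names such as county_complete, whose columns may differ, so adjust the names if needed. No numbers are quoted here; read them from your own summary() output.

```r
# Assumes openintro provides countyComplete with the question's column names
library(openintro)

# (a) MVH = median value of owner-occupied homes, MHI = median household income
mod <- lm(median_val_owner_occupied ~ median_household_income,
          data = countyComplete)
summary(mod)

# (b) The slope is the estimated change in mean MVH (dollars)
# for a one-dollar increase in MHI
slope <- coef(mod)["median_household_income"]
slope * 1000  # the same slope read per $1,000 of MHI
```

The slope is in dollars of home value per dollar of income, so multiplying it by 1000 reads it as the change in mean MVH for each extra $1,000 of MHI.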

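For parts (c) and (d), confint() on the fitted model gives a t-based interval for the slope. A sketch, assuming the countyComplete column names from the question:

```r
library(openintro)
mod <- lm(median_val_owner_occupied ~ median_household_income,
          data = countyComplete)

# (c) 99% confidence interval for the slope only
ci99 <- confint(mod, "median_household_income", level = 0.99)
ci99

# (d) Two-sided test of H0: slope = 0 at the 1% level:
# reject H0 exactly when 0 lies outside the 99% interval
reject_h0 <- ci99[1, 1] > 0 || ci99[1, 2] < 0
reject_h0
```

A 99% interval corresponds to testing H0: slope = 0 against HA: slope ≠ 0 at significance level 0.01; the interval itself is read as the plausible range for the change in mean MVH per dollar of MHI.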

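Part (e) as a sketch, assuming the countyComplete column names from the question; the colors and point style are arbitrary choices, not something the question prescribes:

```r
library(openintro)
mod <- lm(median_val_owner_occupied ~ median_household_income,
          data = countyComplete)

# Small grey points reduce overplotting among ~3,000 counties
plot(median_val_owner_occupied ~ median_household_income,
     data = countyComplete, pch = 20, col = "grey40",
     xlab = "Median household income (MHI)",
     ylab = "Median value of owner-occupied homes (MVH)")
# abline() reads the intercept and slope from the fitted lm object
abline(mod, col = "red", lwd = 2)
```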

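For (f) and (g), predict() with interval = "confidence" gives the estimated mean MVH at MHI = 50000 and a 95% interval for that mean. A prediction interval (interval = "prediction") would instead describe a single county and be wider. A sketch, assuming the countyComplete column names from the question:

```r
library(openintro)
mod <- lm(median_val_owner_occupied ~ median_household_income,
          data = countyComplete)

new_county <- data.frame(median_household_income = 50000)

# (f) point estimate of mean MVH at MHI = 50000 (column "fit")
# (g) 95% confidence interval for that mean (columns "lwr", "upr")
pred <- predict(mod, newdata = new_county,
                interval = "confidence", level = 0.95)
pred
```

The "fit" value is just b0 + b1 * 50000 computed from the estimated coefficients.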

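A sketch of residual plots for part (h). The four panels (residuals vs fitted, histogram, normal Q-Q, residuals vs MHI) are one common choice, not the only one; look for curvature, a funnel shape (non-constant variance), and skew or heavy tails. Assumes the countyComplete column names from the question:

```r
library(openintro)
mod <- lm(median_val_owner_occupied ~ median_household_income,
          data = countyComplete)
res <- residuals(mod)
mf <- model.frame(mod)  # the rows actually used in the fit

op <- par(mfrow = c(2, 2))
# Residuals vs fitted: curvature or a funnel suggests a violated assumption
plot(fitted(mod), res, pch = 20, xlab = "Fitted MVH", ylab = "Residual")
abline(h = 0, col = "red")
# Histogram: check for skewness
hist(res, breaks = 50, main = "Residuals", xlab = "Residual")
# Normal Q-Q: points near the line suggest roughly normal residuals
qqnorm(res)
qqline(res, col = "red")
# Residuals vs the predictor
plot(mf$median_household_income, res, pch = 20,
     xlab = "MHI", ylab = "Residual")
abline(h = 0, col = "red")
par(op)
```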

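For (i) and (j), a sketch that captures the error instead of stopping. The likely cause is a county whose MVH is recorded as 0 (the question points to Kalawao County, Hawaii): log(0) is -Inf, and lm() rejects non-finite responses with an error such as "NA/NaN/Inf in 'y'". This also assumes countyComplete has a name column holding the county name:

```r
library(openintro)

# (i) log-transform MVH and regress it on MHI
LMVH <- log(countyComplete$median_val_owner_occupied)
MHI <- countyComplete$median_household_income

# (j) lm() errors on -Inf responses; tryCatch() returns the message instead
fit_attempt <- tryCatch(lm(LMVH ~ MHI),
                        error = function(e) conditionMessage(e))
fit_attempt

# Which counties produce a non-finite log value?
countyComplete[!is.finite(LMVH),
               c("state", "name", "median_val_owner_occupied")]
```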

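For (k) to (m), one way to drop the problem county is to keep only rows with MVH > 0. This is a swapped-in filter rather than matching on the county name, and it removes exactly Kalawao County only under the assumption that it is the only county with MVH = 0. Once every response is finite, lm() fits normally:

```r
library(openintro)

# (k) keep counties with MVH > 0 (assumed to drop only Kalawao County, HI)
cc <- subset(countyComplete, median_val_owner_occupied > 0)
cc$LMVH <- log(cc$median_val_owner_occupied)
mod_log <- lm(LMVH ~ median_household_income, data = cc)
summary(mod_log)

# (l) scatter plot with the fitted line
plot(LMVH ~ median_household_income, data = cc, pch = 20, col = "grey40",
     xlab = "MHI", ylab = "log(MVH)")
abline(mod_log, col = "red", lwd = 2)

# (m) residual diagnostics on the log scale
res_log <- residuals(mod_log)
op <- par(mfrow = c(1, 2))
plot(fitted(mod_log), res_log, pch = 20,
     xlab = "Fitted log(MVH)", ylab = "Residual")
abline(h = 0, col = "red")
qqnorm(res_log)
qqline(res_log, col = "red")
par(op)
```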

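For (n) and (o): if the fitted model is log(MVH) = b0 + b1 * MHI, then MVH = exp(b0) * exp(b1 * MHI), an exponential relationship in which each extra dollar of MHI multiplies MVH by exp(b1). Strictly, exponentiating the fitted log value estimates a typical (median-like) MVH rather than its arithmetic mean. A sketch of the $10,000 calculation, assuming the same MVH > 0 filter and the question's column names:

```r
library(openintro)
cc <- subset(countyComplete, median_val_owner_occupied > 0)
mod_log <- lm(log(median_val_owner_occupied) ~ median_household_income,
              data = cc)

b1 <- unname(coef(mod_log)["median_household_income"])

# (o) a $10,000 increase in MHI multiplies MVH by exp(10000 * b1)
multiplier <- exp(10000 * b1)
pct_change <- 100 * (multiplier - 1)
c(multiplier = multiplier, percent_change = pct_change)
```

When 10000 * b1 is small, 100 * 10000 * b1 is close to the exact percentage change, but the exp() form is exact under this model.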

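A runnable version of the dplyr code in part (p). One caveat worth checking: with Kalawao County included, log(0) = -Inf makes Hawaii's LMVH mean -Inf, which would later break the regression in (r). The sketch therefore filters MVH > 0 first; that filter is an addition, not part of the question's code:

```r
library(openintro)
library(dplyr)

# Drop MVH == 0 first so no state mean of log(MVH) becomes -Inf
gg <- group_by(filter(countyComplete, median_val_owner_occupied > 0), state)
ggm <- summarise(gg,
                 MHI  = mean(median_household_income),
                 LMVH = mean(log(median_val_owner_occupied)))
nrow(ggm)  # one row per state
```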

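Part (q) as a sketch, assuming the question's column names and the MVH > 0 filter (added so that every log value is finite); plot styling is arbitrary. Compare how tightly the red state means sit around a line with the spread of the grey county cloud:

```r
library(openintro)
library(dplyr)

cc <- filter(countyComplete, median_val_owner_occupied > 0)
ggm <- summarise(group_by(cc, state),
                 MHI  = mean(median_household_income),
                 LMVH = mean(log(median_val_owner_occupied)))

# County-level cloud in grey, state means in red on top
plot(cc$median_household_income, log(cc$median_val_owner_occupied),
     pch = 20, col = "grey60", xlab = "MHI", ylab = "log(MVH)")
points(ggm$MHI, ggm$LMVH, pch = 19, col = "red")
legend("bottomright", legend = c("county", "state mean"),
       pch = c(20, 19), col = c("grey60", "red"))
```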

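For (r), a sketch comparing R^2 for the county-level and state-level fits of log(MVH) on MHI, assuming the same MVH > 0 filter and column names. Averaging over counties within each state removes most of the county-to-county variation around the line, so the state-level fit explains a larger share of the variation that remains. That higher R^2 says nothing about how well MHI predicts MVH for an individual county, which is why reading it that way is an ecological fallacy:

```r
library(openintro)
library(dplyr)

cc <- filter(countyComplete, median_val_owner_occupied > 0)
ggm <- summarise(group_by(cc, state),
                 MHI  = mean(median_household_income),
                 LMVH = mean(log(median_val_owner_occupied)))

mod_county <- lm(log(median_val_owner_occupied) ~ median_household_income,
                 data = cc)
mod_state <- lm(LMVH ~ MHI, data = ggm)

# R^2 of each fit, side by side
r2 <- c(county = summary(mod_county)$r.squared,
        state  = summary(mod_state)$r.squared)
r2
```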