Question
This problem uses the file County1.csv, which concerns unemployment in different counties in the United States. We will use the variables HS (high school graduate percentage), Bachelor (bachelor's degree percentage), and Poverty, along with the binary variables Reg1, Reg2, Reg3, and Reg4 that represent four regions of the country, to build a regression model to predict the Unemployment variable.
a. Run a multiple regression analysis to predict Unemployed as a function of HS, Bachelor, Poverty, Reg1, Reg2, and Reg3. Call this model R1. Calculate R^2 and Adjusted R^2.
b. Use R1 to predict Unemployment for a county in region 2 with a high school graduation percentage of 83, a bachelor's degree percentage of 23, and Poverty level of 10. By how much would the previous prediction change if the county were in region 4 instead of region 2?
c. Test the variables for significance. If any variables are not significant remove them, one at a time, and run a new regression on the remaining variables. Continue this process until all variables are significant. Call this model R2. Calculate R^2 and Adjusted R^2.
d. Use R2 to predict Unemployment for a county in region 2 with a high school graduation percentage of 83, a bachelor's degree percentage of 23, and Poverty level of 10.
e. Run a multiple regression analysis to predict Unemployed as a function of HS, Bachelor, Poverty, Reg2, Reg3, and Reg4. Call this model R3. Calculate R^2 and Adjusted R^2.
f. Use R3 to predict Unemployment for a county in region 2 with a high school graduation percentage of 83, a bachelor's degree percentage of 23, and Poverty level of 10. By how much would the previous prediction change if the county were in region 4 instead of region 2?
g. Run a multiple regression analysis to predict Unemployed as a function of HS, Bachelor, Poverty, Reg1, Reg2, Reg3, and Reg4. What is wrong with this model?
h. Run a multiple regression analysis to predict Unemployed as a function of HS, Bachelor, and Poverty. Call this model R4. Calculate R^2 and Adjusted R^2.
i. Use R4 to predict Unemployment for a county in region 2 with a high school graduation percentage of 83, a bachelor's degree percentage of 23, and Poverty level of 10.
j. Perform the partial F-test on R1 and R4 to determine if, collectively, the regions are significant variables.
k. Using the file County2.csv, make predictions with R1, R2, R3, and R4 for all of the observations and calculate the root mean squared error of each set of predictions. Which of these models performs best? Why?
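The quantities requested in parts a, j, and k (R^2, Adjusted R^2, the partial F statistic, and the root mean squared error) can be sketched in plain Python as follows. This is a minimal sketch, not the course's prescribed method: the function and argument names are hypothetical, k counts the predictors excluding the intercept, and the RMSE here divides by the number of observations (some texts divide by the residual degrees of freedom instead).

```python
import math

def r_squared(y, y_hat):
    """R^2 = 1 - SSE/SST for observed values y and fitted values y_hat."""
    mean_y = sum(y) / len(y)
    sse = sum((a - b) ** 2 for a, b in zip(y, y_hat))
    sst = sum((a - mean_y) ** 2 for a in y)
    return 1 - sse / sst

def adjusted_r_squared(r2, n, k):
    """Adjusted R^2 for n observations and k predictors (intercept excluded)."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

def rmse(y, y_hat):
    """Root mean squared error, dividing by the number of observations."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y, y_hat)) / len(y))

def partial_f(sse_reduced, sse_full, n, k_full, q):
    """Partial F statistic for testing q predictors dropped from the full model.

    k_full is the number of predictors in the full model (intercept excluded).
    """
    return ((sse_reduced - sse_full) / q) / (sse_full / (n - k_full - 1))

# Small made-up numbers, only to show the calls; not taken from County1.csv.
y = [1, 2, 3, 4]
y_hat = [1.1, 1.9, 3.2, 3.8]
r2 = r_squared(y, y_hat)
adj_r2 = adjusted_r_squared(r2, n=len(y), k=1)
error = rmse(y, y_hat)
f_stat = partial_f(sse_reduced=10.0, sse_full=6.0, n=20, k_full=6, q=3)
```

For part j, R1 would be the full model (k_full = 6) and R4 the reduced model, so q = 3 region variables are being tested; the resulting statistic is compared against an F distribution with q and n - k_full - 1 degrees of freedom.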
County1: https://docs.google.com/spreadsheets/d/1-JaxeP1ru-la8KUA3F9VDCjTvjUQfUNU8kYvA3GKnsE/edit?usp=sharing
County2: https://docs.google.com/spreadsheets/d/1A3jDRxdZErTUlvL5UpTukd4rgfljVrPzKsAwRWD6Gp8/edit?usp=sharing
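For part g, one thing worth checking is whether the four region indicators add up to the intercept column. The sketch below uses a handful of made-up rows (not data from County1.csv) and assumes every county belongs to exactly one region; under that assumption the indicator columns sum to 1 in every row, which is an exact linear dependence with the intercept.

```python
# Hypothetical one-hot region rows (Reg1, Reg2, Reg3, Reg4); not from County1.csv.
regions = [
    (1, 0, 0, 0),
    (0, 1, 0, 0),
    (0, 0, 1, 0),
    (0, 0, 0, 1),
    (0, 1, 0, 0),
]

# The intercept column of a regression design matrix is all ones.
intercept = [1] * len(regions)

# Row-wise sum of the four region indicators.
dummy_sum = [sum(row) for row in regions]

# If the sums reproduce the intercept column, the design matrix is rank deficient.
is_collinear = dummy_sum == intercept
print(is_collinear)  # prints True
```

When this holds, the coefficients of a model containing an intercept and all four indicators are not uniquely determined, which is why R1 and R3 each leave one region out as the baseline.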