
These are .R files. Other than copying the data from below, I'm not sure how I can give you a copy of the file so you can use the data in RStudio. Any help in your replies (other than "missing reference") would be appreciated.
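One common way to share a small data set as plain text is `dput()`, which prints R code that rebuilds an object. The sketch below assumes a tiny made-up data frame; `initech_demo` and its values are hypothetical and are not the real initech.csv.

```r
# Hypothetical stand-in for the real data; replace with your own data frame.
initech_demo <- data.frame(
  years  = c(1, 5, 10, 20),
  salary = c(41000, 56000, 79000, 170000)
)

# dput() prints R code that recreates the object; paste that output into a post.
dput(initech_demo)

# A reader can rebuild the data by evaluating the pasted text.
pasted_text <- paste(deparse(initech_demo), collapse = "\n")
rebuilt <- eval(parse(text = pasted_text))

# Check that the round trip preserved the data.
all.equal(rebuilt, initech_demo)
```

For larger data sets, writing a CSV with `write.csv()` and attaching the file is usually more practical than pasting `dput()` output.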

Class Exercise.R file:

    # install packages if needed (only has to be run once)
    install.packages("ggplot2")

    # load packages
    library(ggplot2)

    # read data
    initech <- read.csv(file.choose())  # select initech.csv

    # histograms of variables
    hist(initech$years)
    hist(initech$salary)

    # scatterplot of variables with a fitted regression line
    ggplot(initech, aes(x = years, y = salary)) +
      geom_point() +
      theme_bw() +
      geom_smooth(method = lm, color = "blue", fill = "red")

    # simple regression model: salary by years
    reg1Initech <- lm(salary ~ years, data = initech)

    # regression results
    summary(reg1Initech)

    # examine model fit (diagnostic plots)
    plot(reg1Initech)

    # add fitted values and residuals to the data frame
    initech$RegPred <- predict(reg1Initech)
    initech$RegResid <- residuals(reg1Initech)

    # residuals vs. IV (years)
    ggplot(initech, aes(x = years, y = RegResid)) +
      geom_point() +
      geom_abline(slope = 0, intercept = 0) +
      theme_bw()

    # examine prediction and confidence intervals
    # (adds columns fit, lwr, and upr; re-running this adds duplicate columns)
    temp_var <- predict(reg1Initech, interval = "prediction")
    initech <- cbind(initech, temp_var)
    ggplot(initech, aes(years, salary)) +
      geom_point() +
      theme_classic() +
      geom_line(aes(y = lwr), color = "red", linetype = "dashed") +
      geom_line(aes(y = upr), color = "red", linetype = "dashed") +
      geom_smooth(method = lm, color = "blue", fill = "purple", se = TRUE)

    # make predictions
    use2predict <- data.frame(years = 4)
    # use "confidence" for a confidence interval, "prediction" for a prediction interval
    predict(reg1Initech, use2predict, interval = "confidence")

You'll use the "Class Exercise.R" file to answer the questions below. In the script file, lines 9-18 can be used to answer the first 3 questions. These are NOT the actual test questions, but rather a guide to help answer them.

  1. What issues, if any, do you see with the distribution of the two variables?
  2. What issues, if any, do you see from the scatterplot of the two variables?
  3. For any issues identified in the two responses above, how would you correct them? (Try some different transformations and see what seems to work for you.)
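A rough way to try out candidate transformations is to compare how skewed the variable is before and after each one. The sketch below uses simulated data as a stand-in for initech.csv (the `sim` data frame, its generating formula, and the `skewness` helper are all assumptions made for illustration), so its numbers will not match the real file.

```r
# Simulated stand-in data; NOT the real initech.csv.
set.seed(42)
sim <- data.frame(years = runif(100, 1, 25))
sim$salary <- exp(10.5 + 0.08 * sim$years + rnorm(100, sd = 0.2))

# Simple moment-based skewness; values near 0 suggest a roughly symmetric shape.
skewness <- function(x) mean((x - mean(x))^3) / sd(x)^3

# Compare the raw variable with two candidate transformations.
skew <- sapply(
  list(raw = sim$salary, sqrt = sqrt(sim$salary), log = log(sim$salary)),
  skewness
)
print(round(skew, 2))

# Histograms give the same picture visually (only drawn in an interactive session).
if (interactive()) {
  par(mfrow = c(1, 3))
  hist(sim$salary)
  hist(sqrt(sim$salary))
  hist(log(sim$salary))
}
```

Skewness is only one symptom; the scatterplot and residual plots in the script are still needed to judge linearity and constant variance.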

Next, run through the rest of the code to get answers to questions 4-9. You will need to make adjustments to lines 51-52 to answer questions 8 and 9.

Model 1

  4. Run the regression of salary by years. Is the IV significant?
  5. How much of the variability of the DV is explained by the IV?
  6. What do you notice after looking at the scatterplot plus the regression line and confidence intervals? Does it look like the model performs equally well across all values of the IV?
  7. Do you notice anything amiss looking at the plot of the residuals vs the predicted values?
  8. What does your model predict regarding the average salary of all employees with 15 years of experience at the 95% confidence level?
  9. What does your model predict regarding the salary of a single employee with 6 years of experience at the 95% confidence level?

Now you're going to try some other models, per the instructions below. However, to create the scatterplot of the residuals against the predicted values, we appended the predictions, residuals, and prediction intervals to the original data. To repeat this process with transformed data, we need to start from scratch because we don't want multiple columns in our data object with the same names (i.e., each time we run the code it will store another fit, resid, lwr, and upr). We don't want that, so write down your responses to the above questions, then start over at line 8 before proceeding to answer questions 10-15 below.
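The difference between questions 8 and 9 is the `interval` argument of `predict()`: a confidence interval for the mean response versus a prediction interval for a single new observation. The sketch below uses simulated data (the `sim` data frame and its coefficients are assumptions, not the real initech.csv) and also shows one possible way to avoid duplicate columns: store each model's intervals in a separate copy of the data.

```r
# Simulated stand-in data; NOT the real initech.csv.
set.seed(42)
sim <- data.frame(years = runif(100, 1, 25))
sim$salary <- 30000 + 6000 * sim$years + rnorm(100, sd = 15000)

fit1 <- lm(salary ~ years, data = sim)

# Average salary of ALL employees with 15 years: confidence interval.
conf_int <- predict(fit1, newdata = data.frame(years = 15),
                    interval = "confidence", level = 0.95)

# Salary of ONE employee with 6 years: prediction interval (wider).
pred_int <- predict(fit1, newdata = data.frame(years = 6),
                    interval = "prediction", level = 0.95)

print(conf_int)
print(pred_int)

# Keep the original data untouched by binding the intervals to a copy.
# predict() warns that intervals computed on the fitting data refer to
# future responses; that warning is expected here.
model1_data <- cbind(sim, suppressWarnings(predict(fit1, interval = "prediction")))
names(model1_data)
```

Because `sim` itself is never modified, the same steps can be repeated for another model without re-reading the file.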

Model 2

  10. Now run the regression of log(salary) by years. You will need to adjust the necessary code lines to change occurrences of salary to log(salary). Is the IV significant?
  11. How much of the variability of the DV is explained by the IV?
  12. What do you notice after looking at the scatterplot plus the regression line and confidence intervals? Does it look like the model performs equally well across all values of the IV?
  13. Do you notice anything amiss looking at the plot of the residuals vs the predicted values?
  14. With this latest model, what does your model predict regarding the average salary of all employees with 10 years of experience at the 95% confidence level?
  15. With this latest model, what does your model predict regarding the salary of a single employee with 12 years of experience at the 95% confidence level?

Write down your responses to the above questions, then start over at line 8 before proceeding to answer questions 16-21 below.
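When the DV is log(salary), `predict()` returns values on the log scale, so they need to be exponentiated to get back to dollars. A minimal sketch with simulated data (the `sim` data frame is an assumption, not the real initech.csv):

```r
# Simulated stand-in data; NOT the real initech.csv.
set.seed(42)
sim <- data.frame(years = runif(100, 1, 25))
sim$salary <- exp(10.5 + 0.08 * sim$years + rnorm(100, sd = 0.2))

# The log transformation is written directly in the formula.
fit2 <- lm(log(salary) ~ years, data = sim)

# Intervals come back on the log scale.
log_ci <- predict(fit2, newdata = data.frame(years = 10), interval = "confidence")
log_pi <- predict(fit2, newdata = data.frame(years = 12), interval = "prediction")

# exp() converts fit, lwr, and upr back to dollars.
dollar_ci <- exp(log_ci)
dollar_pi <- exp(log_pi)
print(round(dollar_ci))
print(round(dollar_pi))
```

Exponentiating the fitted log value gives an estimate closer to the median (geometric mean) salary than to the arithmetic mean, which is one reason small differences from a listed answer can appear.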

Model 3

  16. Now run the regression of log(salary) by sqrt(years). Is the IV significant?
  17. How much of the variability of the DV is explained by the IV?
  18. What do you notice after looking at the scatterplot plus the regression line and confidence intervals? Does it look like the model performs equally well across all values of the IV?
  19. Do you notice anything amiss looking at the plot of the residuals vs the predicted values?
  20. With this latest model, what does your model predict regarding the average salary of all employees with 5 years of experience at the 95% confidence level?
  21. With this latest model, what does your model predict regarding the salary of a single employee with 21 years of experience at the 95% confidence level?

Write down your responses to the above questions, then start over at line 8 before proceeding to answer questions 22-28 below.
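If the transformation of the IV is written inside the model formula, `predict()` applies it to new data automatically, so `newdata` should contain raw years rather than sqrt(years). A short sketch with simulated data (the `sim` data frame is an assumption, not the real initech.csv):

```r
# Simulated stand-in data; NOT the real initech.csv.
set.seed(42)
sim <- data.frame(years = runif(100, 1, 25))
sim$salary <- exp(10.5 + 0.08 * sim$years + rnorm(100, sd = 0.2))

# Both transformations live in the formula.
fit3 <- lm(log(salary) ~ sqrt(years), data = sim)

# Supply RAW years; the formula takes the square root for us.
new_years <- data.frame(years = c(5, 21))
log_pred <- predict(fit3, newdata = new_years, interval = "prediction")

# Same point estimate computed by hand from the coefficients.
by_hand <- coef(fit3)[1] + coef(fit3)[2] * sqrt(new_years$years)

# Back-transform to dollars.
dollar_pred <- exp(log_pred)
print(round(dollar_pred))
```

If the transformed variable were instead stored as a new column before fitting, `newdata` would have to provide that column with the transformation already applied.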

Model 4

  22. Now run the regression of salary by sqrt(years). Is the IV significant?
  23. How much of the variability of the DV is explained by the IV?
  24. What do you notice after looking at the scatterplot plus the regression line and confidence intervals? Does it look like the model performs equally well across all values of the IV?
  25. Do you notice anything amiss looking at the plot of the residuals vs the predicted values?
  26. With this latest model, what does your model predict regarding the average salary of all employees with 19 years of experience at the 95% confidence level?
  27. With this latest model, what does your model predict regarding the salary of a single employee with 4 years of experience at the 95% confidence level?
  28. Finally, which of the 4 models that you ran would you say performs the best and why?
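For the final comparison, it can help to fit all four models side by side and collect a few summary numbers in one table. The sketch below uses simulated data (the `sim` data frame is an assumption, not the real initech.csv). Note that R-squared values are not directly comparable between models with different DVs (salary vs. log(salary)), so the dollar-scale RMSE column is offered as one possible common yardstick, not as the course's stated criterion.

```r
# Simulated stand-in data; NOT the real initech.csv.
set.seed(42)
sim <- data.frame(years = runif(100, 1, 25))
sim$salary <- exp(10.5 + 0.08 * sim$years + rnorm(100, sd = 0.2))

# The four models from the exercise.
models <- list(
  m1 = lm(salary ~ years, data = sim),
  m2 = lm(log(salary) ~ years, data = sim),
  m3 = lm(log(salary) ~ sqrt(years), data = sim),
  m4 = lm(salary ~ sqrt(years), data = sim)
)

# R-squared and the p-value of the slope for each model.
comparison <- data.frame(
  model     = names(models),
  r_squared = sapply(models, function(m) summary(m)$r.squared),
  p_value   = sapply(models, function(m) coef(summary(m))[2, "Pr(>|t|)"])
)

# Put all fitted values on the dollar scale (exp() for the log-DV models).
dollar_fitted <- lapply(models, fitted)
dollar_fitted$m2 <- exp(dollar_fitted$m2)
dollar_fitted$m3 <- exp(dollar_fitted$m3)
comparison$rmse_dollars <- sapply(
  dollar_fitted,
  function(f) sqrt(mean((sim$salary - f)^2))
)

print(comparison)
```

The residual plots still matter: a model with a slightly lower R-squared but well-behaved residuals may be the better choice.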

These are the actual test questions below:

Question 1

Do the variables seem normally distributed?

  1. Yes
  2. No

Question 2

Which of the below issues do you see with the data (if any)?

  1. Non-linearity
  2. Heteroscedasticity
  3. Multicollinearity
  4. Non-independence of observations

Question 3

For any identified in the two responses above, how would you correct them? (try some different transformations and see what seems to work best).

  1. Squaring or cubing salary
  2. Squaring or cubing years
  3. -1/salary
  4. -1/years
  5. sqrt(salary)
  6. sqrt(years)
  7. log(salary)
  8. log(years)

Question 4

In Model 1, is the IV significant?

  1. Yes
  2. No

Question 5

In reporting the results of Model 1, how much of the variance of salary would we say was accounted for by the model (report as a percentage rounded to 1 decimal place)?

______________%
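The percentage of variance accounted for is the R-squared reported by `summary()`, which can also be pulled out and rounded programmatically. A small sketch with simulated data (the `sim` data frame is an assumption, so the printed value is not the answer for initech.csv):

```r
# Simulated stand-in data; NOT the real initech.csv.
set.seed(42)
sim <- data.frame(years = runif(100, 1, 25))
sim$salary <- 30000 + 6000 * sim$years + rnorm(100, sd = 15000)

fit1 <- lm(salary ~ years, data = sim)

# Multiple R-squared as a proportion, then as a percentage to 1 decimal place.
r2 <- summary(fit1)$r.squared
r2_percent <- round(100 * r2, 1)
print(r2_percent)
```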

Question 6

What do you notice after looking at the scatterplot plus the regression line and confidence intervals? Does it look like the model performs equally well across all values of the IV?

__________________________________________________________________________________

Question 7

Do you notice anything amiss looking at the plot of the residuals vs the predicted values?

______________________________________________________________________________

Question 8

What does your model predict regarding the average salary of all employees with 15 years of experience at the 95% confidence level?

  1. 95% confident it's between $129,207.40 and $140,495.10
  2. It's exactly $134,851.30
  3. 95% confident it's between $80,273.37 and $189,429.20

Question 9

What does your model predict regarding the salary of a single employee with 6 years of experience at the 95% confidence level?

  1. 95% confident it's between $49,461.75 and $164,781.75
  2. It's exactly $57,121.75
  3. 95% confident it's between $2,298.69 and $111,944.80

Question 10

Now run the regression of log(salary) by years. Is the IV significant?

  1. Yes
  2. No

Question 11

In reporting the results of Model 2, how much of the variance of salary would we say was accounted for by the model (report as a percentage rounded to 1 decimal place)?

__________%

Question 12

What do you notice after looking at the scatterplot plus the regression line and confidence intervals? Does it look like the model performs equally well across all values of the IV?

___________________________________________________________________________________

Question 13

Do you notice anything amiss looking at the plot of the residuals vs the predicted values?

___________________________________________________________________________________

Question 14

What does your model predict regarding the average salary of all employees with 10 years of experience at the 95% confidence level? (note: order of operations and rounding could lead to slight variations in your answers. Pick the answer below that most closely matches your finding).

  1. We're 95% confident it's between $75,390 and $82,024
  2. It's exactly $78,637
  3. We're 95% confident it's between $53,233 and $116,165

Question 15

What does your model predict regarding the salary of a single employee with 12 years of experience at the 95% confidence level? (note: order of operations and rounding could lead to slight variations in your answers. Pick the answer below that most closely matches your finding).

  1. We're 95% confident it's between $88,538 and $95,754
  2. It's exactly $92,075
  3. We're 95% confident it's between $62,350 and $135,973

Question 16

Now run the regression of log(salary) by sqrt(years). Is the IV significant?

  1. Yes
  2. No

Question 17

In reporting the results of Model 3, how much of the variance of salary would we say was accounted for by the model (report as a percentage rounded to 1 decimal place)?

_______________%

Question 18

What do you notice after looking at the scatterplot plus the regression line and confidence intervals? Does it look like the model performs equally well across all values of the IV?

________________________________________________________________________________

Question 19

Do you notice anything amiss looking at the plot of the residuals vs the predicted values?

________________________________________________________________________________

Question 20

What does your model predict regarding the average salary of all employees with 5 years of experience at the 95% confidence level? (note: order of operations and rounding could lead to slight variations in your answers. Pick the answer below that most closely matches your finding).

  1. We're 95% confident that it is between $51,176 and $58,035
  2. It's exactly $54,498
  3. We're 95% confident that it is between $35,535 and $83,580

Question 21

What does your model predict regarding the salary of a single employee with 21 years of experience at the 95% confidence level? (note: order of operations and rounding could lead to slight variations in your answers. Pick the answer below that most closely matches your finding).

  1. We're 95% confident it's between $166,678 and $188,438
  2. It's exactly $177,225
  3. We're 95% confident that it's between $115,584 and $271,738

Question 22

Now run the regression of salary by sqrt(years). Is the IV significant?

  1. Yes
  2. No

Question 23

In reporting the results of Model 4, how much of the variance of salary would we say was accounted for by the model (report as a percentage rounded to 1 decimal place)?

_____________%

Question 24

What do you notice after looking at the scatterplot plus the regression line and confidence intervals? Does it look like the model performs equally well across all values of the IV?

_____________________________________________________________________________________

Question 25

Do you notice anything amiss looking at the plot of the residuals vs the predicted values?

_____________________________________________________________________________________

Question 26

What does your model predict regarding the average salary of all employees with 19 years of experience at the 95% confidence level? (note: order of operations and rounding could lead to slight variations in your answers. Pick the answer below that most closely matches your finding).

  1. We're 95% confident that it's between $157,927 and $175,108
  2. It's exactly $166,518
  3. We're 95% confident that it's between $100,357 and $232,678

Question 27

What does your model predict regarding the salary of a single employee with 4 years of experience at the 95% confidence level? (note: order of operations and rounding could lead to slight variations in your answers. Pick the answer below that most closely matches your finding).

  1. We're 95% confident that it's between $30,503 and $52,198
  2. It's exactly $41,351
  3. We're 95% confident that it's between $0 (prediction is -$25,140) and $107,842

Question 28

Which of the 4 models that you ran would you say performs the best and why?

__________________________________________________________________________________
