Question

These are .R files. Other than copying the data from below, I'm not sure how I can give you a copy of the file so you can use the data in RStudio. Any help in your replies (other than "missing reference") would be appreciated.

Class Exercise .R file:

    # install ggplot2 only if it is not already installed
    if (!requireNamespace("ggplot2", quietly = TRUE)) install.packages("ggplot2")
    # load packages
    library(ggplot2)
    # read data
    initech <- read.csv(file.choose()) # select initech.csv
    # histograms of variables
    hist(initech$years)
    hist(initech$salary)
    # scatterplot of variables with a fitted line and its confidence band
    ggplot(initech, aes(x = years, y = salary)) +
      geom_point() +
      theme_bw() +
      geom_smooth(method = lm, color = "blue", fill = "red")
    # Simple regression model: salary by years
    reg1Initech <- lm(salary ~ years, data = initech)
    # regression results
    summary(reg1Initech)
    # examine model fit (diagnostic plots)
    plot(reg1Initech)
    # Add fitted values and residuals to the data frame
    initech$RegPred <- predict(reg1Initech)
    initech$RegResid <- residuals(reg1Initech)
    # residuals vs IV, with a horizontal reference line at zero
    ggplot(initech, aes(x = years, y = RegResid)) +
      geom_point() +
      geom_abline(slope = 0, intercept = 0) +
      theme_bw()
    # examine prediction and confidence intervals
    # (adds the columns fit, lwr, and upr to the data frame)
    temp_var <- predict(reg1Initech, interval = "prediction")
    initech <- cbind(initech, temp_var)
    ggplot(initech, aes(years, salary)) +
      geom_point() +
      theme_classic() +
      geom_line(aes(y = lwr), color = "red", linetype = "dashed") +
      geom_line(aes(y = upr), color = "red", linetype = "dashed") +
      geom_smooth(method = lm, color = "blue", fill = "purple", se = TRUE)
    # make predictions
    use2predict <- data.frame(years = 4)
    # use "confidence" for a confidence interval, "prediction" for a prediction interval
    predict(reg1Initech, use2predict, interval = "confidence")

This is the Data below:

years salary
1 41504
1 32619
1 44322
2 40038
2 46147
2 38447
2 38163
3 42104
3 25597
3 39599
3 55698
4 47220
4 65929
4 55794
4 45959
5 52460
5 60308
5 61458
5 56951
6 56174
6 59363
6 57642
6 69792
7 59321
7 66379
7 64282
7 48901
8 100711
8 59324
8 54752
8 73619
9 65382
9 58823
9 65717
9 92816
9 72550
10 71365
10 88888
10 62969
10 45298
11 111292
11 91491
11 106345
11 99009
12 73981
12 72547
12 74991
12 139249
13 119948
13 128962
13 98112
13 97159
14 125246
14 89694
14 73333
14 108710
15 97567
15 90359
15 119806
15 101343
16 147406
16 153020
16 143200
16 97327
17 184807
17 146263
17 127925
17 159785
17 174822
18 177610
18 210984
18 160044
18 137044
19 182996
19 184183
19 168666
19 121350
20 193627
20 142611
20 170131
20 134140
21 129446
21 201469
21 202104
21 220556
22 166419
22 149044
22 247017
22 247730
23 252917
23 235517
23 241276
23 197229
24 175879
24 253682
24 262578
24 207715
25 221179
25 212028
25 312549
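Since the data is only available as pasted text, one possible way to get it into R without the original file is to read it from a string. The sketch below assumes the values are whitespace-separated with a `years salary` header, as in the listing; only the first five rows are included here, and the name `initech_text` and the temporary CSV path are illustrative choices, not part of the exercise.

```r
# Paste the data between the quotes. Only the first five rows from the
# question are shown here; paste all 100 rows for the real analysis.
initech_text <- "years salary
1 41504
1 32619
1 44322
2 40038
2 46147"

# read.table() splits on whitespace by default; header = TRUE uses the first line as column names
initech <- read.table(text = initech_text, header = TRUE)
str(initech)

# Optional: save a CSV so the exercise's read.csv(file.choose()) line can select it.
# The temporary path below is an assumption made for illustration.
csv_path <- file.path(tempdir(), "initech.csv")
write.csv(initech, csv_path, row.names = FALSE)
initech <- read.csv(csv_path)
```

With the full data pasted in, `initech` should have 100 rows and can be used in place of the `read.csv(file.choose())` step.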

What you'll be doing is using the "Class Exercise.R" file to answer the questions below. In the script file, lines 9-18 can be used to answer the first 3 questions below. These are NOT the actual test questions; rather, they are a guide to help answer them.

  1. What issues, if any, do you see with the distribution of the two variables?
  2. What issues, if any, do you see from the scatterplot of the two variables?
  3. For any identified in the two responses above, how would you correct them? (try some different transformations and see what seems to work for you).
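For trying out transformations, a numeric summary can complement the histograms. The following is a rough sketch, not the exercise's method: it uses a 13-row excerpt of the data (one row from every other year), a hypothetical `skewness()` helper, and base R's `shapiro.test()` to compare salary on the raw, square-root, and log scales.

```r
# Excerpt of the data from the question (one row from every other year);
# substitute the full 100-row data frame for the real analysis.
initech <- data.frame(
  years  = seq(1, 25, by = 2),
  salary = c(41504, 42104, 52460, 59321, 65382, 111292, 119948,
             97567, 184807, 182996, 129446, 252917, 221179)
)

# Hypothetical helper: simple moment-based sample skewness (0 means symmetric)
skewness <- function(x) {
  z <- (x - mean(x)) / sd(x)
  mean(z^3)
}

# Candidate transformations of the DV
candidates <- list(
  raw  = initech$salary,
  sqrt = sqrt(initech$salary),
  log  = log(initech$salary)
)

# One row per transformation: skewness and Shapiro-Wilk normality p-value
checks <- data.frame(
  transform = names(candidates),
  skewness  = sapply(candidates, skewness),
  shapiro_p = sapply(candidates, function(x) shapiro.test(x)$p.value)
)
print(checks)
```

A skewness closer to zero and a larger Shapiro-Wilk p-value suggest a more symmetric, more normal-looking variable, but the histograms and scatterplot should still be the main evidence.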

Next, run through the rest of the code to get answers to questions 4-9. You will need to make adjustments to lines 51-52 to answer questions 8 and 9.

Model 1

  4. Run the regression of salary by years. Is the IV significant?
  5. How much of the variability of the DV is explained by the IV?
  6. What do you notice after looking at the scatterplot plus the regression line and confidence intervals? Does it look like the model performs equally well across all values of the IV?
  7. Do you notice anything amiss looking at the plot of the residuals vs the predicted values?
  8. What does your model predict regarding the average salary of all employees with 15 years of experience at the 95% confidence level?
  9. What does your model predict regarding the salary of a single employee with 6 years of experience at the 95% confidence level?

Now you're going to try some other models, per the instructions below. However, to create the scatterplot of the residuals against predicted, we appended the original data with the predictions, residuals, and prediction intervals. To repeat this process with transformed data we need to start from scratch because we don't want multiple columns in our data object with the same names (i.e., each time we run the code it will store a fit, a resid, and lwr and upr bounds). We don't want that, so write down your responses to the above questions, then start over at line 8 before proceeding to answer 10-15 below.
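Questions 8 and 9 differ only in the interval type: the mean of all employees calls for a confidence interval, a single employee for a prediction interval. The sketch below shows both calls, and also one possible alternative to restarting the script: a hypothetical helper, `fit_and_augment()`, that adds the fitted values and residuals to a copy so the original data frame never accumulates duplicate columns. It runs on a 13-row excerpt of the data, so its numbers will not match the answer choices.

```r
# Excerpt of the data from the question; use the full 100 rows for real answers.
initech <- data.frame(
  years  = seq(1, 25, by = 2),
  salary = c(41504, 42104, 52460, 59321, 65382, 111292, 119948,
             97567, 184807, 182996, 129446, 252917, 221179)
)

# Hypothetical helper: fit a model and return an augmented COPY of the data,
# leaving the original data frame untouched.
fit_and_augment <- function(data, formula) {
  model <- lm(formula, data = data)
  out <- data
  out$RegPred  <- fitted(model)
  out$RegResid <- residuals(model)
  list(model = model, data = out)
}

m1 <- fit_and_augment(initech, salary ~ years)

# Question 8 style: average salary of ALL employees with 15 years -> confidence interval
ci_15 <- predict(m1$model, data.frame(years = 15),
                 interval = "confidence", level = 0.95)

# Question 9 style: salary of a SINGLE employee with 6 years -> prediction interval
pi_6 <- predict(m1$model, data.frame(years = 6),
                interval = "prediction", level = 0.95)

print(ci_15)
print(pi_6)
```

Because `initech` itself still has only `years` and `salary`, the same helper can be called again with a different formula without starting over.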

Model 2

  10. Now run the regression of log(salary) by years. You will need to adjust the necessary code lines to change occurrences of salary to log(salary). Is the IV significant?
  11. How much of the variability of the DV is explained by the IV?
  12. What do you notice after looking at the scatterplot plus the regression line and confidence intervals? Does it look like the model performs equally well across all values of the IV?
  13. Do you notice anything amiss looking at the plot of the residuals vs the predicted values?
  14. With this latest model, what does your model predict regarding the average salary of all employees with 10 years of experience at the 95% confidence level?
  15. With this latest model, what does your model predict regarding the salary of a single employee with 12 years of experience at the 95% confidence level?

Write down your responses to the above questions, then start over at line 8 before proceeding to answer 16-21 below.
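When the DV is log(salary), `predict()` returns values on the log scale, so they need to be exponentiated to get dollars. A minimal sketch of that back-transformation, assuming natural logs and using a 13-row excerpt of the data (so the numbers will not match the answer choices):

```r
# Excerpt of the data from the question; use the full 100 rows for real answers.
initech <- data.frame(
  years  = seq(1, 25, by = 2),
  salary = c(41504, 42104, 52460, 59321, 65382, 111292, 119948,
             97567, 184807, 182996, 129446, 252917, 221179)
)

# Model 2: log(salary) by years. Putting log() in the formula leaves the data untouched.
m2 <- lm(log(salary) ~ years, data = initech)

# Intervals on the log scale
ci_log <- predict(m2, data.frame(years = 10), interval = "confidence", level = 0.95)
pi_log <- predict(m2, data.frame(years = 12), interval = "prediction", level = 0.95)

# Back-transform to dollars with exp(); round only after exponentiating,
# since rounding the log-scale values first shifts the dollar amounts.
ci_dollars <- exp(ci_log)
pi_dollars <- exp(pi_log)

print(round(ci_dollars))
print(round(pi_dollars))
```

Rounding before or after `exp()` is one source of the "slight variations" the test questions mention.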

Model 3

  16. Now run the regression of log(salary) by sqrt(years). Is the IV significant?
  17. How much of the variability of the DV is explained by the IV?
  18. What do you notice after looking at the scatterplot plus the regression line and confidence intervals? Does it look like the model performs equally well across all values of the IV?
  19. Do you notice anything amiss looking at the plot of the residuals vs the predicted values?
  20. With this latest model, what does your model predict regarding the average salary of all employees with 5 years of experience at the 95% confidence level?
  21. With this latest model, what does your model predict regarding the salary of a single employee with 21 years of experience at the 95% confidence level?

Write down your responses to the above questions, then start over at line 8 before proceeding to answer 22-28 below.

Model 4

  22. Now run the regression of salary by sqrt(years). Is the IV significant?
  23. How much of the variability of the DV is explained by the IV?
  24. What do you notice after looking at the scatterplot plus the regression line and confidence intervals? Does it look like the model performs equally well across all values of the IV?
  25. Do you notice anything amiss looking at the plot of the residuals vs the predicted values?
  26. With this latest model, what does your model predict regarding the average salary of all employees with 19 years of experience at the 95% confidence level?
  27. With this latest model, what does your model predict regarding the salary of a single employee with 4 years of experience at the 95% confidence level?
  28. Finally, which of the 4 models that you ran would you say performs the best and why?
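For the final comparison, it can help to fit all four models side by side and tabulate R-squared and the slope's p-value. The sketch below is one way to do that, using a 13-row excerpt of the data (so the values are illustrative only); the names `models` and `comparison` are arbitrary.

```r
# Excerpt of the data from the question; use the full 100 rows for real answers.
initech <- data.frame(
  years  = seq(1, 25, by = 2),
  salary = c(41504, 42104, 52460, 59321, 65382, 111292, 119948,
             97567, 184807, 182996, 129446, 252917, 221179)
)

# The four models from the exercise, with transformations written in the formulas
models <- list(
  model1 = lm(salary ~ years, data = initech),
  model2 = lm(log(salary) ~ years, data = initech),
  model3 = lm(log(salary) ~ sqrt(years), data = initech),
  model4 = lm(salary ~ sqrt(years), data = initech)
)

# One row per model: R-squared (as a percentage) and the p-value of the IV's slope
comparison <- data.frame(
  model         = names(models),
  r_squared_pct = sapply(models, function(m) round(100 * summary(m)$r.squared, 1)),
  slope_p       = sapply(models, function(m) coef(summary(m))[2, "Pr(>|t|)"])
)
print(comparison)
```

Note that R-squared for models 2 and 3 describes variance in log(salary), not salary, so it is not directly comparable with models 1 and 4; the residual plots should weigh in the choice as well.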

These are the actual test questions below:

Question 1

Do the variables seem normally distributed?

  1. Yes
  2. No

Question 2

Which of the below issues do you see with the data (if any)?

  1. Non-linearity
  2. Heteroscedasticity
  3. Multicollinearity
  4. Non-independence of observations

Question 3

For any identified in the two responses above, how would you correct them? (try some different transformations and see what seems to work best).

  1. Squaring or cubing salary
  2. Squaring or cubing years
  3. -1/salary
  4. -1/years
  5. sqrt(salary)
  6. sqrt(years)
  7. log(salary)
  8. log(years)

Question 4

In Model 1, is the IV significant?

  1. Yes
  2. No

Question 5

In reporting the results of Model 1, how much of the variance of salary would we say was accounted for by the model (report as a percentage rounded to 1 decimal place)?

______________%

Question 6

What do you notice after looking at the scatterplot plus the regression line and confidence intervals? Does it look like the model performs equally well across all values of the IV?

__________________________________________________________________________________

Question 7

Do you notice anything amiss looking at the plot of the residuals vs the predicted values?

______________________________________________________________________________

Question 8

What does your model predict regarding the average salary of all employees with 15 years of experience at the 95% confidence level?

  1. 95% confident it's between $129,207.40 and $140,495.10
  2. It's exactly $134,851.30
  3. 95% confident it's between $80,273.37 and $189,429.20

Question 9

What does your model predict regarding the salary of a single employee with 6 years of experience at the 95% confidence level?

  1. 95% confident it's between $49,461.75 and $164,781.75
  2. It's exactly $57,121.75
  3. 95% confident it's between $2,298.69 and $111,944.80

Question 10

Now run the regression of log(salary) by years. Is the IV significant?

  1. Yes
  2. No

Question 11

In reporting the results of Model 2, how much of the variance of salary would we say was accounted for by the model (report as a percentage rounded to 1 decimal place)?

__________%

Question 12

What do you notice after looking at the scatterplot plus the regression line and confidence intervals? Does it look like the model performs equally well across all values of the IV?

___________________________________________________________________________________

Question 13

Do you notice anything amiss looking at the plot of the residuals vs the predicted values?

___________________________________________________________________________________

Question 14

What does your model predict regarding the average salary of all employees with 10 years of experience at the 95% confidence level? (note: order of operations and rounding could lead to slight variations in your answers. Pick the answer below that most closely matches your finding).

  1. We're 95% confident it's between $75,390 and $82,024
  2. It's exactly $78,637
  3. We're 95% confident it's between $53,233 and $116,165

Question 15

What does your model predict regarding the salary of a single employee with 12 years of experience at the 95% confidence level? (note: order of operations and rounding could lead to slight variations in your answers. Pick the answer below that most closely matches your finding).

  1. We're 95% confident it's between $88,538 and $95,754
  2. It's exactly $92,075
  3. We're 95% confident it's between $62,350 and $135,973

Question 16

Now run the regression of log(salary) by sqrt(years). Is the IV significant?

  1. Yes
  2. No

Question 17

In reporting the results of Model 3, how much of the variance of salary would we say was accounted for by the model (report as a percentage rounded to 1 decimal place)?

_______________%

Question 18

What do you notice after looking at the scatterplot plus the regression line and confidence intervals? Does it look like the model performs equally well across all values of the IV?

________________________________________________________________________________

Question 19

Do you notice anything amiss looking at the plot of the residuals vs the predicted values?

________________________________________________________________________________

Question 20

What does your model predict regarding the average salary of all employees with 5 years of experience at the 95% confidence level? (note: order of operations and rounding could lead to slight variations in your answers. Pick the answer below that most closely matches your finding).

  1. We're 95% confident that it is between $51,176 and $58,035
  2. It's exactly $54,498
  3. We're 95% confident that it is between $35,535 and $83,580

Question 21

What does your model predict regarding the salary of a single employee with 21 years of experience at the 95% confidence level? (note: order of operations and rounding could lead to slight variations in your answers. Pick the answer below that most closely matches your finding).

  1. We're 95% confident it's between $166,678 and $188,438
  2. It's exactly $177,225
  3. We're 95% confident that it's between $115,584 and $271,738

Question 22

Now run the regression of salary by sqrt(years). Is the IV significant?

  1. Yes
  2. No

Question 23

In reporting the results of Model 4, how much of the variance of salary would we say was accounted for by the model (report as a percentage rounded to 1 decimal place)?

_____________%

Question 24

What do you notice after looking at the scatterplot plus the regression line and confidence intervals? Does it look like the model performs equally well across all values of the IV?

_____________________________________________________________________________________

Question 25

Do you notice anything amiss looking at the plot of the residuals vs the predicted values?

_____________________________________________________________________________________

Question 26

What does your model predict regarding the average salary of all employees with 19 years of experience at the 95% confidence level? (note: order of operations and rounding could lead to slight variations in your answers. Pick the answer below that most closely matches your finding).

  1. We're 95% confident that it's between $157,927 and $175,108
  2. It's exactly $166,518
  3. We're 95% confident that it's between $100,357 and $232,678

Question 27

What does your model predict regarding the average salary of all employees with 4 years of experience at the 95% confidence level? (note: order of operations and rounding could lead to slight variations in your answers. Pick the answer below that most closely matches your finding).

  1. We're 95% confident that it's between $30,503 and $52,198
  2. It's exactly $41,351
  3. We're 95% confident that it's between $0 (prediction is -$25,140) and $107,842

Question 28

Which of the 4 models that you ran would you say performs the best and why?

__________________________________________________________________________________
