Question

These are .R script files, meant to be run in RStudio (I prefer Minitab). The script runs from line 1 to line 54; blank lines are numbered but empty. The data is in the next section below, copied from Excel. Do NOT post "Missing data: Reference"; nothing is missing. If you are not familiar with .R files, skip this question and let someone else answer it.

Class Exercise .R file below:

# install packages if needed
install.packages("ggplot2")

# load packages if needed
library(ggplot2)

# read data
initech <- read.csv(file.choose()) # select initech.csv

# histogram of variables
hist(initech$years)
hist(initech$salary)

# scatterplot of variables
ggplot(initech, aes(x = years, y = salary)) +
  geom_point() +
  theme_bw() +
  geom_smooth(method=lm, color="blue", fill="red")

# Simple regression model: salary by years
reg1Initech <- lm(salary ~ years, data=initech)

# regression results
summary(reg1Initech)

# examine model fit
plot(reg1Initech)

# Add fit residuals and predictions to data frame
initech$RegPred <- predict(reg1Initech)
initech$RegResid <- residuals(reg1Initech)

# residuals vs IV
ggplot(initech, aes(x = years, y = RegResid)) +
  geom_point() +
  geom_abline(slope = 0, intercept = 0) +
  theme_bw()

# examine prediction and confidence intervals
temp_var <- predict(reg1Initech, interval="prediction")
initech <- cbind(initech, temp_var)

ggplot(initech, aes(years, salary))+
  geom_point() +
  theme_classic() +
  geom_line(aes(y=lwr), color = "red", linetype = "dashed")+
  geom_line(aes(y=upr), color = "red", linetype = "dashed")+
  geom_smooth(method=lm, color="blue", fill="purple", se=TRUE)

# make predictions
use2predict <- data.frame(years=4)
predict(reg1Initech, use2predict, interval="confid")
# use "confid" for confidence interval,
# "predict" for prediction interval.

Data set below, copied over from the .csv file via Excel:

Years Salary
1 41504
1 32619
1 44322
2 40038
2 46147
2 38447
2 38163
3 42104
3 25597
3 39599
3 55698
4 47220
4 65929
4 55794
4 45959
5 52460
5 60308
5 61458
5 56951
6 56174
6 59363
6 57642
6 69792
7 59321
7 66379
7 64282
7 48901
8 100711
8 59324
8 54752
8 73619
9 65382
9 58823
9 65717
9 92816
9 72550
10 71365
10 88888
10 62969
10 45298
11 111292
11 91491
11 106345
11 99009
12 73981
12 72547
12 74991
12 139249
13 119948
13 128962
13 98112
13 97159
14 125246
14 89694
14 73333
14 108710
15 97567
15 90359
15 119806
15 101343
16 147406
16 153020
16 143200
16 97327
17 184807
17 146263
17 127925
17 159785
17 174822
18 177610
18 210984
18 160044
18 137044
19 182996
19 184183
19 168666
19 121350
20 193627
20 142611
20 170131
20 134140
21 129446
21 201469
21 202104
21 220556
22 166419
22 149044
22 247017
22 247730
23 252917
23 235517
23 241276
23 197229
24 175879
24 253682
24 262578
24 207715
25 221179
25 212028
25 312549
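
If the CSV file is not at hand, the table above can be typed into R directly. This is a minimal sketch, not part of the class script. It assumes lowercase column names `years` and `salary`, because that is what the script uses. The pasted header is capitalized and R is case-sensitive, so a CSV whose headers really are `Years` and `Salary` would need `names(initech) <- tolower(names(initech))` after `read.csv()`.

```r
# Rows per year of experience, read off the table above:
# years 1 and 25 have three rows, years 9 and 17 have five, all others four.
n_per_year <- c(3, rep(4, 7), 5, rep(4, 7), 5, rep(4, 7), 3)
years <- rep(1:25, times = n_per_year)

salary <- c(
  41504, 32619, 44322,                    # 1 year
  40038, 46147, 38447, 38163,             # 2
  42104, 25597, 39599, 55698,             # 3
  47220, 65929, 55794, 45959,             # 4
  52460, 60308, 61458, 56951,             # 5
  56174, 59363, 57642, 69792,             # 6
  59321, 66379, 64282, 48901,             # 7
  100711, 59324, 54752, 73619,            # 8
  65382, 58823, 65717, 92816, 72550,      # 9
  71365, 88888, 62969, 45298,             # 10
  111292, 91491, 106345, 99009,           # 11
  73981, 72547, 74991, 139249,            # 12
  119948, 128962, 98112, 97159,           # 13
  125246, 89694, 73333, 108710,           # 14
  97567, 90359, 119806, 101343,           # 15
  147406, 153020, 143200, 97327,          # 16
  184807, 146263, 127925, 159785, 174822, # 17
  177610, 210984, 160044, 137044,         # 18
  182996, 184183, 168666, 121350,         # 19
  193627, 142611, 170131, 134140,         # 20
  129446, 201469, 202104, 220556,         # 21
  166419, 149044, 247017, 247730,         # 22
  252917, 235517, 241276, 197229,         # 23
  175879, 253682, 262578, 207715,         # 24
  221179, 212028, 312549                  # 25
)

# Guard against a transcription slip before building the data frame
stopifnot(length(years) == length(salary))
initech <- data.frame(years = years, salary = salary)
str(initech)
```

With `initech` built this way, the rest of the class script can be run from the histogram lines onward without `read.csv(file.choose())`.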

In the script file, lines 9-18 can be used to answer the first 3 questions below:

Question 1

Do the variables seem normally distributed?

  1. Yes
  2. No
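
Besides eyeballing the histograms, a formal check can back up the answer. The sketch below is an addition, not part of the class script. It runs a Shapiro-Wilk test and a simple skewness calculation on `initech_sub`, a hypothetical 13-row subset (the first listed row for each odd-numbered year), so the numbers will differ from what the full `initech` data frame gives.

```r
# Illustrative subset of the data set above (first row of each odd year)
initech_sub <- data.frame(
  years  = c(1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25),
  salary = c(41504, 42104, 52460, 59321, 65382, 111292, 119948,
             97567, 184807, 182996, 129446, 252917, 221179)
)

# Shapiro-Wilk: a small p-value is evidence against normality
sw_years  <- shapiro.test(initech_sub$years)
sw_salary <- shapiro.test(initech_sub$salary)
print(c(years = sw_years$p.value, salary = sw_salary$p.value))

# Moment-based skewness: 0 for a symmetric sample, > 0 for a long right tail
skew <- function(x) mean((x - mean(x))^3) / mean((x - mean(x))^2)^1.5
print(skew(initech_sub$salary))
```

On the full data, replace `initech_sub` with `initech`.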

Question 2

Which of the below issues do you see with the data (if any)?

  1. Non-linearity
  2. Heteroscedasticity
  3. Multicollinearity
  4. Non-independence of observations
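
Heteroscedasticity usually shows up as a fan shape in the scatterplot, and it can also be checked numerically. The sketch below is an addition under stated assumptions. `bp_check` is a hypothetical helper that hand-rolls a studentized Breusch-Pagan-style test in base R (regress the squared residuals on the fitted values, then compare n times R-squared to a chi-squared distribution with 1 degree of freedom). `initech_sub` is an illustrative subset of the data, so its result is not the full-data answer.

```r
# Illustrative subset of the data set above (first row of each odd year)
initech_sub <- data.frame(
  years  = c(1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25),
  salary = c(41504, 42104, 52460, 59321, 65382, 111292, 119948,
             97567, 184807, 182996, 129446, 252917, 221179)
)

# Hypothetical helper: Breusch-Pagan-style check for a simple regression
bp_check <- function(model) {
  e2   <- residuals(model)^2
  aux  <- lm(e2 ~ fitted(model))               # auxiliary regression
  stat <- length(e2) * summary(aux)$r.squared  # LM statistic = n * R^2
  c(statistic = stat,
    p.value   = pchisq(stat, df = 1, lower.tail = FALSE))
}

fit <- lm(salary ~ years, data = initech_sub)
bp_result <- bp_check(fit)
print(bp_result)  # a small p-value suggests non-constant residual variance
```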

Question 3

For any identified in the two responses above, how would you correct them? (try some different transformations and see what seems to work best).

  1. Squaring or cubing salary
  2. Squaring or cubing years
  3. -1/salary
  4. -1/years
  5. sqrt(salary)
  6. sqrt(years)
  7. log(salary)
  8. log(years)
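
The transformations listed above can be written straight into the `lm()` formula. Function calls such as `log()` and `sqrt()` work as they are, but arithmetic such as squaring or `-1/salary` needs to be wrapped in `I()` so that R does not read `^` and `/` as formula operators. The sketch below is one way to try several candidates side by side. The helper `diagnose`, the choice of diagnostics, and the subset `initech_sub` are assumptions made for illustration.

```r
# Illustrative subset of the data set above (first row of each odd year)
initech_sub <- data.frame(
  years  = c(1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25),
  salary = c(41504, 42104, 52460, 59321, 65382, 111292, 119948,
             97567, 184807, 182996, 129446, 252917, 221179)
)

candidates <- list(
  "salary ~ years"       = salary ~ years,
  "salary ~ I(years^2)"  = salary ~ I(years^2),
  "I(-1/salary) ~ years" = I(-1 / salary) ~ years,
  "sqrt(salary) ~ years" = sqrt(salary) ~ years,
  "salary ~ sqrt(years)" = salary ~ sqrt(years),
  "log(salary) ~ years"  = log(salary) ~ years,
  "salary ~ log(years)"  = salary ~ log(years)
)

# Hypothetical helper: two rough residual diagnostics per candidate model
diagnose <- function(f, data) {
  m <- lm(f, data = data)
  r <- residuals(m)
  c(shapiro_p  = shapiro.test(r)$p.value,  # normality of residuals
    spread_rho = cor(abs(r), fitted(m), method = "spearman"))  # fan shape
}

diag_table <- t(sapply(candidates, diagnose, data = initech_sub))
print(round(diag_table, 3))
```

A `spread_rho` near zero means the residual spread does not grow or shrink with the fitted values. These numbers are a supplement to the residual plots, not a replacement for looking at them.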

Next, run through the rest of the code to get answers to questions 4-9. You will need to make adjustments to lines 51-52 to answer questions 8 and 9.
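
The adjustment to the last two code lines comes down to two things: the value of `years` in the new data frame, and the `interval` argument. `"confidence"` gives the interval for the average salary of all employees at that experience level, while `"prediction"` gives the wider interval for a single employee. A sketch on `initech_sub`, a hypothetical subset of the data, so the numbers will not match the answer choices:

```r
# Illustrative subset of the data set above (first row of each odd year)
initech_sub <- data.frame(
  years  = c(1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25),
  salary = c(41504, 42104, 52460, 59321, 65382, 111292, 119948,
             97567, 184807, 182996, 129446, 252917, 221179)
)

fit <- lm(salary ~ years, data = initech_sub)

# Question 8 style: average salary of ALL employees with 15 years
conf_15 <- predict(fit, data.frame(years = 15),
                   interval = "confidence", level = 0.95)

# Question 9 style: salary of ONE employee with 6 years
pred_6 <- predict(fit, data.frame(years = 6),
                  interval = "prediction", level = 0.95)

print(conf_15)  # columns: fit, lwr, upr
print(pred_6)
```

In each answer list, the narrow interval is the confidence interval and the wide one is the prediction interval around the same fitted value.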

Question 4

In Model 1, is the IV significant?

  1. Yes
  2. No

Question 5

In reporting the results of Model 1, how much of the variance of salary would we say was accounted for by the model (report as a percentage rounded to 1 decimal place)?

______________%
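
Questions 4 and 5 can both be read off the `summary()` output: the p-value in the `years` row of the coefficient table, and the "Multiple R-squared" value. A sketch of pulling them out programmatically, again on the hypothetical subset `initech_sub` rather than the full data:

```r
# Illustrative subset of the data set above (first row of each odd year)
initech_sub <- data.frame(
  years  = c(1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25),
  salary = c(41504, 42104, 52460, 59321, 65382, 111292, 119948,
             97567, 184807, 182996, 129446, 252917, 221179)
)

fit <- lm(salary ~ years, data = initech_sub)
s   <- summary(fit)

# p-value for the slope: the IV is significant at the 5% level if < 0.05
p_slope <- coef(s)["years", "Pr(>|t|)"]

# Share of variance explained, as a percentage rounded to 1 decimal place
r2_pct <- round(100 * s$r.squared, 1)

print(p_slope)
print(r2_pct)
```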

Question 6

What do you notice after looking at the scatterplot plus the regression line and confidence intervals? Does it look like the model performs equally well across all values of the IV?

__________________________________________________________________________________

Question 7

Do you notice anything amiss looking at the plot of the residuals vs the predicted values?

______________________________________________________________________________

Question 8

What does your model predict regarding the average salary of all employees with 15 years of experience at the 95% confidence level?

  1. 95% confident it's between $129,207.40 and $140,495.10
  2. It's exactly $134,851.30
  3. 95% confident it's between $80,273.37 and $189,429.20

Question 9

What does your model predict regarding the salary of a single employee with 6 years of experience at the 95% confidence level?

  1. 95% confident it's between $49,461.75 and $164,781.75
  2. It's exactly $57,121.75
  3. 95% confident it's between $2,298.69 and $111,944.80

Now run the regression of log(salary) by years. You will need to adjust the necessary code lines to change occurrences of salary to log(salary) to answer questions 10-15:
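
With a logged response, `predict()` returns the fit and the interval bounds on the log scale. To report dollars, compute the interval first and apply `exp()` to it afterwards, which is presumably the "order of operations" the later questions warn about. A sketch on the hypothetical subset `initech_sub`, so the dollar figures will differ from the answer choices:

```r
# Illustrative subset of the data set above (first row of each odd year)
initech_sub <- data.frame(
  years  = c(1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25),
  salary = c(41504, 42104, 52460, 59321, 65382, 111292, 119948,
             97567, 184807, 182996, 129446, 252917, 221179)
)

# Model 2: log(salary) by years
fit_log <- lm(log(salary) ~ years, data = initech_sub)

# Interval on the log scale first ...
log_ci <- predict(fit_log, data.frame(years = 10),
                  interval = "confidence", level = 0.95)

# ... then back-transform fit, lwr and upr to dollars
dollars_ci <- exp(log_ci)

print(log_ci)
print(round(dollars_ci))
```

The back-transformed interval is not symmetric around the fitted value, which is expected after exponentiating.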

Question 10

Now run the regression of log(salary) by years. Is the IV significant?

  1. Yes
  2. No

Question 11

In reporting the results of this model, how much of the variance of log(salary) would we say was accounted for by the model (report as a percentage rounded to 1 decimal place)?

__________%

Question 12

What do you notice after looking at the scatterplot plus the regression line and confidence intervals? Does it look like the model performs equally well across all values of the IV?

___________________________________________________________________________________

Question 13

Do you notice anything amiss looking at the plot of the residuals vs the predicted values?

___________________________________________________________________________________

Question 14

What does your model predict regarding the average salary of all employees with 10 years of experience at the 95% confidence level? (note: order of operations and rounding could lead to slight variations in your answers. Pick the answer below that most closely matches your finding).

  1. We're 95% confident it's between $75,390 and $82,024
  2. It's exactly $78,637
  3. We're 95% confident it's between $53,233 and $116,165

Question 15

What does your model predict regarding the salary of a single employee with 12 years of experience at the 95% confidence level? (note: order of operations and rounding could lead to slight variations in your answers. Pick the answer below that most closely matches your finding).

  1. We're 95% confident it's between $88,538 and $95,754
  2. It's exactly $92,075
  3. We're 95% confident it's between $62,350 and $135,973

Now run the regression of log(salary) by sqrt(years) for questions 16-21:
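
Because the square root is written inside the formula, the new data frame passed to `predict()` still holds `years` on its original scale; R applies `sqrt()` itself. The sketch below checks that against a by-hand calculation. It uses the hypothetical subset `initech_sub`, so the numbers are illustrative only.

```r
# Illustrative subset of the data set above (first row of each odd year)
initech_sub <- data.frame(
  years  = c(1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25),
  salary = c(41504, 42104, 52460, 59321, 65382, 111292, 119948,
             97567, 184807, 182996, 129446, 252917, 221179)
)

# Model 3: log(salary) by sqrt(years)
fit_ls <- lm(log(salary) ~ sqrt(years), data = initech_sub)

# Supply years = 4, not sqrt(4): the formula does the transformation
by_predict <- predict(fit_ls, data.frame(years = 4))
by_hand    <- coef(fit_ls)[1] + coef(fit_ls)[2] * sqrt(4)
print(all.equal(unname(by_predict), unname(by_hand)))

# Prediction interval for one employee with 21 years, in dollars
pred_21 <- exp(predict(fit_ls, data.frame(years = 21),
                       interval = "prediction", level = 0.95))
print(round(pred_21))
```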

Question 16

Now run the regression of log(salary) by sqrt(years). Is the IV significant?

  1. Yes
  2. No

Question 17

In reporting the results of this model, how much of the variance of log(salary) would we say was accounted for by the model (report as a percentage rounded to 1 decimal place)?

_______________%

Question 18

What do you notice after looking at the scatterplot plus the regression line and confidence intervals? Does it look like the model performs equally well across all values of the IV?

________________________________________________________________________________

Question 19

Do you notice anything amiss looking at the plot of the residuals vs the predicted values?

________________________________________________________________________________

Question 20

What does your model predict regarding the average salary of all employees with 5 years of experience at the 95% confidence level? (note: order of operations and rounding could lead to slight variations in your answers. Pick the answer below that most closely matches your finding).

  1. We're 95% confident that it is between $51,176 and $58,035
  2. It's exactly $54,498
  3. We're 95% confident that it is between $35,535 and $83,580

Question 21

What does your model predict regarding the salary of a single employee with 21 years of experience at the 95% confidence level? (note: order of operations and rounding could lead to slight variations in your answers. Pick the answer below that most closely matches your finding).

  1. We're 95% confident it's between $166,678 and $188,438
  2. It's exactly $177,225
  3. We're 95% confident that it's between $115,584 and $271,738

Now run the regression of salary by sqrt(years) for questions 22-28:

Question 22

Now run the regression of salary by sqrt(years). Is the IV significant?

  1. Yes
  2. No

Question 23

In reporting the results of this model, how much of the variance of salary would we say was accounted for by the model (report as a percentage rounded to 1 decimal place)?

_____________%

Question 24

What do you notice after looking at the scatterplot plus the regression line and confidence intervals? Does it look like the model performs equally well across all values of the IV?

_____________________________________________________________________________________

Question 25

Do you notice anything amiss looking at the plot of the residuals vs the predicted values?

_____________________________________________________________________________________

Question 26

What does your model predict regarding the average salary of all employees with 19 years of experience at the 95% confidence level? (note: order of operations and rounding could lead to slight variations in your answers. Pick the answer below that most closely matches your finding).

  1. We're 95% confident that it's between $157,927 and $175,108
  2. It's exactly $166,518
  3. We're 95% confident that it's between $100,357 and $232,678

Question 27

What does your model predict regarding the average salary of all employees with 4 years of experience at the 95% confidence level? (note: order of operations and rounding could lead to slight variations in your answers. Pick the answer below that most closely matches your finding).

  1. We're 95% confident that it's between $30,503 and $52,198
  2. It's exactly $41,351
  3. We're 95% confident that it's between $0 (prediction is -$25,140) and $107,842

Question 28

Which of the 4 models that you ran would you say performs the best and why?

__________________________________________________________________________________
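
One caution when comparing the four models: R-squared from a model of log(salary) describes variance on the log scale, so it is not directly comparable with R-squared from a model of salary. The sketch below is one possible way to put the models on equal footing, by back-transforming the fitted values and computing the root mean squared error in dollars. The approach and the subset `initech_sub` are assumptions for illustration, and the residual plots should still weigh in the final choice.

```r
# Illustrative subset of the data set above (first row of each odd year)
initech_sub <- data.frame(
  years  = c(1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25),
  salary = c(41504, 42104, 52460, 59321, 65382, 111292, 119948,
             97567, 184807, 182996, 129446, 252917, 221179)
)

models <- list(
  "salary ~ years"            = lm(salary ~ years, data = initech_sub),
  "log(salary) ~ years"       = lm(log(salary) ~ years, data = initech_sub),
  "log(salary) ~ sqrt(years)" = lm(log(salary) ~ sqrt(years), data = initech_sub),
  "salary ~ sqrt(years)"      = lm(salary ~ sqrt(years), data = initech_sub)
)
logged <- c(FALSE, TRUE, TRUE, FALSE)  # which models have a logged response

# R-squared on each model's own response scale
r2 <- sapply(models, function(m) summary(m)$r.squared)

# RMSE in dollars, back-transforming fitted values where needed
rmse <- mapply(function(m, is_log) {
  f <- fitted(m)
  if (is_log) f <- exp(f)
  sqrt(mean((initech_sub$salary - f)^2))
}, models, logged)

print(cbind(r2_own_scale = round(r2, 3), rmse_dollars = round(rmse)))
```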
