Question
These are .R files, to be used in RStudio (or Minitab, which I prefer). The script's lines are numbered 1-54; blank lines are numbered too. The data is in the section below the script, copied from Excel. DO NOT post "Missing data: Reference"; nothing is missing. If you're not familiar with .R files, skip this question and let someone else answer it.
Class Exercise .R file below:
# install packages if needed
install.packages("ggplot2")
# load packages if needed
library(ggplot2)

# read data
initech <- read.csv(file.choose()) # select initech.csv

# histogram of variables
hist(initech$years)
hist(initech$salary)

# scatterplot of variables
ggplot(initech, aes(x = years, y = salary)) +
  geom_point() +
  theme_bw() +
  geom_smooth(method=lm, color="blue", fill="red")

# Simple regression model: salary by years
reg1Initech <- lm(salary ~ years, data=initech)

# regression results
summary(reg1Initech)

# examine model fit
plot(reg1Initech)

# Add fit residuals and predictions to data frame
initech$RegPred <- predict(reg1Initech)
initech$RegResid <- residuals(reg1Initech)

# residuals vs IV
ggplot(initech, aes(x = years, y = RegResid)) +
  geom_point() +
  geom_abline(slope = 0, intercept = 0) +
  theme_bw()

# examine prediction and confidence intervals
temp_var <- predict(reg1Initech, interval="prediction")
initech <- cbind(initech, temp_var)
ggplot(initech, aes(years, salary)) +
  geom_point() +
  theme_classic() +
  geom_line(aes(y=lwr), color = "red", linetype = "dashed") +
  geom_line(aes(y=upr), color = "red", linetype = "dashed") +
  geom_smooth(method=lm, color="blue", fill="purple", se=TRUE)

# make predictions
use2predict <- data.frame(years=4)
predict(reg1Initech, use2predict, interval="confid")
# use "confid" for confidence interval, "predict" for prediction interval.
Data set below, from the .csv file (copied over from Excel):
Years | Salary |
1 | 41504 |
1 | 32619 |
1 | 44322 |
2 | 40038 |
2 | 46147 |
2 | 38447 |
2 | 38163 |
3 | 42104 |
3 | 25597 |
3 | 39599 |
3 | 55698 |
4 | 47220 |
4 | 65929 |
4 | 55794 |
4 | 45959 |
5 | 52460 |
5 | 60308 |
5 | 61458 |
5 | 56951 |
6 | 56174 |
6 | 59363 |
6 | 57642 |
6 | 69792 |
7 | 59321 |
7 | 66379 |
7 | 64282 |
7 | 48901 |
8 | 100711 |
8 | 59324 |
8 | 54752 |
8 | 73619 |
9 | 65382 |
9 | 58823 |
9 | 65717 |
9 | 92816 |
9 | 72550 |
10 | 71365 |
10 | 88888 |
10 | 62969 |
10 | 45298 |
11 | 111292 |
11 | 91491 |
11 | 106345 |
11 | 99009 |
12 | 73981 |
12 | 72547 |
12 | 74991 |
12 | 139249 |
13 | 119948 |
13 | 128962 |
13 | 98112 |
13 | 97159 |
14 | 125246 |
14 | 89694 |
14 | 73333 |
14 | 108710 |
15 | 97567 |
15 | 90359 |
15 | 119806 |
15 | 101343 |
16 | 147406 |
16 | 153020 |
16 | 143200 |
16 | 97327 |
17 | 184807 |
17 | 146263 |
17 | 127925 |
17 | 159785 |
17 | 174822 |
18 | 177610 |
18 | 210984 |
18 | 160044 |
18 | 137044 |
19 | 182996 |
19 | 184183 |
19 | 168666 |
19 | 121350 |
20 | 193627 |
20 | 142611 |
20 | 170131 |
20 | 134140 |
21 | 129446 |
21 | 201469 |
21 | 202104 |
21 | 220556 |
22 | 166419 |
22 | 149044 |
22 | 247017 |
22 | 247730 |
23 | 252917 |
23 | 235517 |
23 | 241276 |
23 | 197229 |
24 | 175879 |
24 | 253682 |
24 | 262578 |
24 | 207715 |
25 | 221179 |
25 | 212028 |
25 | 312549 |
In the script file, lines 9-18 can be used to answer the first 3 questions below:
Question 1
Do the variables seem normally distributed?
- Yes
- No
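As a supplement to the histograms in the class script, a Shapiro-Wilk test and a normal Q-Q plot give a second look at normality. This is a minimal sketch, not part of the class script: `initech_sub` is a hypothetical name for a 16-row subset copied from the data table above, so its output will differ from what the full initech.csv gives.

```r
# Hypothetical 16-row subset of the data table above (first row listed for
# each selected year); the full file will give different numbers.
initech_sub <- data.frame(
  years  = c(1, 2, 3, 4, 5, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 25),
  salary = c(41504, 40038, 42104, 47220, 52460, 56174, 100711, 71365,
             73981, 125246, 147406, 177610, 193627, 166419, 175879, 221179)
)

# Shapiro-Wilk test: a small p-value is evidence against normality
sw_raw <- shapiro.test(initech_sub$salary)
sw_log <- shapiro.test(log(initech_sub$salary))
print(c(raw = sw_raw$p.value, log = sw_log$p.value))

# Normal Q-Q plot: points far from the line suggest non-normality
qqnorm(initech_sub$salary, main = "Normal Q-Q plot: salary")
qqline(initech_sub$salary)
```

With the real file, replace `initech_sub` with the `initech` data frame read by the script.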
Question 2
Which of the below issues do you see with the data (if any)?
- Non-linearity
- Heteroscedasticity
- Multicollinearity
- Non-independence of observations
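One rough way to look for heteroscedasticity (non-constant residual spread) in base R is to fit the simple regression and compare the residual standard deviation in the lower and upper halves of `years`. The sketch below assumes a hypothetical 16-row subset, `initech_sub`, copied from the data table above; the median split is an illustrative choice, not a formal test.

```r
# Hypothetical 16-row subset of the data table above
initech_sub <- data.frame(
  years  = c(1, 2, 3, 4, 5, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 25),
  salary = c(41504, 40038, 42104, 47220, 52460, 56174, 100711, 71365,
             73981, 125246, 147406, 177610, 193627, 166419, 175879, 221179)
)

fit <- lm(salary ~ years, data = initech_sub)
res <- residuals(fit)

# Split at the median of years and compare residual spread in each half
low_half <- initech_sub$years <= median(initech_sub$years)
spread <- c(low_years = sd(res[low_half]), high_years = sd(res[!low_half]))
print(spread)

# Residuals vs fitted values: a funnel shape suggests heteroscedasticity,
# a curved pattern suggests non-linearity
plot(fitted(fit), res, xlab = "Fitted salary", ylab = "Residual")
abline(h = 0, lty = 2)
```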
Question 3
For any identified in the two responses above, how would you correct them? (try some different transformations and see what seems to work best).
- Squaring or cubing salary
- Squaring or cubing years
- -1/salary
- -1/years
- sqrt(salary)
- sqrt(years)
- log(salary)
- log(years)
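To try the transformations listed above, one option is to apply each to the variable and compare how symmetric the result is. The sketch below uses a hand-written sample skewness helper (`skew`, a hypothetical name, not a base R function) on a hypothetical 16-row subset of the data table; values near 0 indicate a more symmetric distribution.

```r
# Hypothetical 16-row subset of the data table above
initech_sub <- data.frame(
  years  = c(1, 2, 3, 4, 5, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 25),
  salary = c(41504, 40038, 42104, 47220, 52460, 56174, 100711, 71365,
             73981, 125246, 147406, 177610, 193627, 166419, 175879, 221179)
)

# Simple moment-based sample skewness (illustrative helper)
skew <- function(x) mean((x - mean(x))^3) / sd(x)^3

# Apply a few of the candidate transformations to salary
candidates <- list(
  raw  = initech_sub$salary,
  sqrt = sqrt(initech_sub$salary),
  log  = log(initech_sub$salary),
  neg_inverse = -1 / initech_sub$salary
)
skewness <- sapply(candidates, skew)
print(round(skewness, 3))

# Histograms of each transformed version, side by side
op <- par(mfrow = c(2, 2))
for (nm in names(candidates)) hist(candidates[[nm]], main = nm, xlab = nm)
par(op)
```

The same loop works for `years` by swapping the column.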
Next, run through the rest of the code to get answers to questions 4-9. You will need to make adjustments to lines 51-52 to answer questions 8 and 9.
Question 4
In Model 1, is the IV significant?
- Yes
- No
Question 5
In reporting the results of Model 1, how much of the variance of salary would we say was accounted for by the model (report as a percentage rounded to 1 decimal place)?
______________%
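The share of variance accounted for is the R-squared value reported by `summary()`. A minimal sketch of pulling it out and rounding it as a percentage, again on a hypothetical 16-row subset (`initech_sub`) of the data table, so the number will not match the full-data answer:

```r
# Hypothetical 16-row subset of the data table above
initech_sub <- data.frame(
  years  = c(1, 2, 3, 4, 5, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 25),
  salary = c(41504, 40038, 42104, 47220, 52460, 56174, 100711, 71365,
             73981, 125246, 147406, 177610, 193627, 166419, 175879, 221179)
)

fit <- lm(salary ~ years, data = initech_sub)

# R-squared as stored in the model summary
r2 <- summary(fit)$r.squared

# Same quantity by hand: 1 - (residual sum of squares / total sum of squares)
rss <- sum(residuals(fit)^2)
tss <- sum((initech_sub$salary - mean(initech_sub$salary))^2)
r2_by_hand <- 1 - rss / tss

# Report as a percentage rounded to 1 decimal place
pct <- round(100 * r2, 1)
print(pct)
```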
Question 6
What do you notice after looking at the scatterplot plus the regression line and confidence intervals? Does it look like the model performs equally well across all values of the IV?
__________________________________________________________________________________
Question 7
Do you notice anything amiss looking at the plot of the residuals vs the predicted values?
______________________________________________________________________________
Question 8
What does your model predict regarding the average salary of all employees with 15 years of experience at the 95% confidence level?
- 95% confident it's between $129,207.40 and $140,495.10
- It's exactly $134,851.30
- 95% confident it's between $80,273.37 and $189,429.20
Question 9
What does your model predict regarding the salary of a single employee with 6 years of experience at the 95% confidence level?
- 95% confident it's between $49,461.75 and $164,781.75
- It's exactly $57,121.75
- 95% confident it's between $2,298.69 and $111,944.80
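Questions 8 and 9 differ in the kind of interval: the average salary of all employees at a given experience level calls for a confidence interval, while a single employee calls for a prediction interval, which is always wider. A sketch of both calls, assuming a hypothetical 16-row subset (`initech_sub`) of the data table, so the intervals printed will not match the options above:

```r
# Hypothetical 16-row subset of the data table above
initech_sub <- data.frame(
  years  = c(1, 2, 3, 4, 5, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 25),
  salary = c(41504, 40038, 42104, 47220, 52460, 56174, 100711, 71365,
             73981, 125246, 147406, 177610, 193627, 166419, 175879, 221179)
)

fit <- lm(salary ~ years, data = initech_sub)
use2predict <- data.frame(years = 15)

# Interval for the MEAN salary at 15 years (confidence interval)
ci <- predict(fit, use2predict, interval = "confidence", level = 0.95)

# Interval for ONE new employee at 15 years (prediction interval)
pi <- predict(fit, use2predict, interval = "prediction", level = 0.95)

print(rbind(confidence = ci[1, ], prediction = pi[1, ]))
```

Both calls return the same point estimate in the `fit` column; only `lwr` and `upr` change.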
Now run the regression of log(salary) by years. You will need to adjust the necessary code lines to change occurrences of salary to log(salary) to answer questions 10-15:
Question 10
Now run the regression of log(salary) by years. Is the IV significant?
- Yes
- No
Question 11
In reporting the results of this model, how much of the variance of salary would we say was accounted for by the model (report as a percentage rounded to 1 decimal place)?
__________%
Question 12
What do you notice after looking at the scatterplot plus the regression line and confidence intervals? Does it look like the model performs equally well across all values of the IV?
___________________________________________________________________________________
Question 13
Do you notice anything amiss looking at the plot of the residuals vs the predicted values?
___________________________________________________________________________________
Question 14
What does your model predict regarding the average salary of all employees with 10 years of experience at the 95% confidence level? (note: order of operations and rounding could lead to slight variations in your answers. Pick the answer below that most closely matches your finding).
- We're 95% confident it's between $75,390 and $82,024
- It's exactly $78,637
- We're 95% confident it's between $53,233 and $116,165
Question 15
What does your model predict regarding the salary of a single employee with 12 years of experience at the 95% confidence level? (note: order of operations and rounding could lead to slight variations in your answers. Pick the answer below that most closely matches your finding).
- We're 95% confident it's between $88,538 and $95,754
- It's exactly $92,075
- We're 95% confident it's between $62,350 and $135,973
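When the response is log(salary), `predict()` returns values on the log scale, so they need `exp()` to get back to dollars; this is the "order of operations" the note refers to. A minimal sketch, assuming a hypothetical 16-row subset (`initech_sub`) of the data table, so the dollar figures will not match the options above:

```r
# Hypothetical 16-row subset of the data table above
initech_sub <- data.frame(
  years  = c(1, 2, 3, 4, 5, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 25),
  salary = c(41504, 40038, 42104, 47220, 52460, 56174, 100711, 71365,
             73981, 125246, 147406, 177610, 193627, 166419, 175879, 221179)
)

# Transform inside the formula so the data frame itself stays unchanged
fit_log <- lm(log(salary) ~ years, data = initech_sub)
use2predict <- data.frame(years = 10)

# Intervals on the log scale
ci_log <- predict(fit_log, use2predict, interval = "confidence")
pi_log <- predict(fit_log, use2predict, interval = "prediction")

# Back-transform fit, lwr and upr to dollars
ci_dollars <- exp(ci_log)
pi_dollars <- exp(pi_log)
print(round(rbind(confidence = ci_dollars[1, ], prediction = pi_dollars[1, ])))
```

Because `exp()` is increasing, the back-transformed bounds stay in order, but the interval is no longer symmetric around the point estimate.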
Now run the regression of log(salary) by sqrt(years) for questions 16-21:
Question 16
Now run the regression of log(salary) by sqrt(years). Is the IV significant?
- Yes
- No
Question 17
In reporting the results of this model, how much of the variance of salary would we say was accounted for by the model (report as a percentage rounded to 1 decimal place)?
_______________%
Question 18
What do you notice after looking at the scatterplot plus the regression line and confidence intervals? Does it look like the model performs equally well across all values of the IV?
________________________________________________________________________________
Question 19
Do you notice anything amiss looking at the plot of the residuals vs the predicted values?
________________________________________________________________________________
Question 20
What does your model predict regarding the average salary of all employees with 5 years of experience at the 95% confidence level? (note: order of operations and rounding could lead to slight variations in your answers. Pick the answer below that most closely matches your finding).
- We're 95% confident that it is between $51,176 and $58,035
- It's exactly $54,498
- We're 95% confident that it is between $35,535 and $83,580
Question 21
What does your model predict regarding the salary of a single employee with 21 years of experience at the 95% confidence level? (note: order of operations and rounding could lead to slight variations in your answers. Pick the answer below that most closely matches your finding).
- We're 95% confident it's between $166,678 and $188,438
- It's exactly $177,225
- We're 95% confident that it's between $115,584 and $271,738
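If both transformations are written inside the model formula, the new-data frame passed to `predict()` still takes plain `years`; R applies `sqrt()` itself, and only the result needs `exp()`. The sketch below checks this against a by-hand calculation from the coefficients, using a hypothetical 16-row subset (`initech_sub`) of the data table, so the dollar figure will not match the options above.

```r
# Hypothetical 16-row subset of the data table above
initech_sub <- data.frame(
  years  = c(1, 2, 3, 4, 5, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 25),
  salary = c(41504, 40038, 42104, 47220, 52460, 56174, 100711, 71365,
             73981, 125246, 147406, 177610, 193627, 166419, 175879, 221179)
)

fit_log_sqrt <- lm(log(salary) ~ sqrt(years), data = initech_sub)

# Supply years on its original scale; sqrt() is applied by the formula
pred_dollars <- exp(predict(fit_log_sqrt, data.frame(years = 21)))

# Same prediction by hand: exp(intercept + slope * sqrt(21))
b <- coef(fit_log_sqrt)
by_hand <- exp(b[1] + b[2] * sqrt(21))

print(c(predict = unname(pred_dollars), by_hand = unname(by_hand)))
```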
Now run the regression of salary by sqrt(years) for questions 22-28:
Question 22
Now run the regression of salary by sqrt(years). Is the IV significant?
- Yes
- No
Question 23
In reporting the results of this model, how much of the variance of salary would we say was accounted for by the model (report as a percentage rounded to 1 decimal place)?
_____________%
Question 24
What do you notice after looking at the scatterplot plus the regression line and confidence intervals? Does it look like the model performs equally well across all values of the IV?
_____________________________________________________________________________________
Question 25
Do you notice anything amiss looking at the plot of the residuals vs the predicted values?
_____________________________________________________________________________________
Question 26
What does your model predict regarding the average salary of all employees with 19 years of experience at the 95% confidence level? (note: order of operations and rounding could lead to slight variations in your answers. Pick the answer below that most closely matches your finding).
- We're 95% confident that it's between $157,927 and $175,108
- It's exactly $166,518
- We're 95% confident that it's between $100,357 and $232,678
Question 27
What does your model predict regarding the average salary of all employees with 4 years of experience at the 95% confidence level? (note: order of operations and rounding could lead to slight variations in your answers. Pick the answer below that most closely matches your finding).
- We're 95% confident that it's between $30,503 and $52,198
- It's exactly $41,351
- We're 95% confident that it's between $0 (prediction is -$25,140) and $107,842
Question 28
Which of the 4 models that you ran would you say performs the best and why?
__________________________________________________________________________________
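One way to line the four models up is to fit them in a named list and collect R-squared for each. This is only a sketch on a hypothetical 16-row subset (`initech_sub`) of the data table. Note that R-squared for the two log(salary) models describes variance in log(salary), so it is not strictly comparable with the two salary models; the residual plots from questions 7, 13, 19 and 25 should weigh in as well.

```r
# Hypothetical 16-row subset of the data table above
initech_sub <- data.frame(
  years  = c(1, 2, 3, 4, 5, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 25),
  salary = c(41504, 40038, 42104, 47220, 52460, 56174, 100711, 71365,
             73981, 125246, 147406, 177610, 193627, 166419, 175879, 221179)
)

# The four models from the exercise
models <- list(
  "salary ~ years"            = lm(salary ~ years, data = initech_sub),
  "log(salary) ~ years"       = lm(log(salary) ~ years, data = initech_sub),
  "log(salary) ~ sqrt(years)" = lm(log(salary) ~ sqrt(years), data = initech_sub),
  "salary ~ sqrt(years)"      = lm(salary ~ sqrt(years), data = initech_sub)
)

# Collect R-squared (as a percentage) for each model
comparison <- data.frame(
  model = names(models),
  r_squared_pct = sapply(models, function(m) round(100 * summary(m)$r.squared, 1)),
  row.names = NULL
)
print(comparison)
```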