Question
These are .R type files.... other than taking the data and copying it from below, I'm not sure how I can give you a copy of the file to use the data in R Studio. Any help in your replies (other than: missing reference) would be appreciated.
Class Exercise .R file:
- # install packages if needed
- install.packages("ggplot2")
- # load packages if needed
- library(ggplot2)
- # read data
- initech <- read.csv(file.choose()) # select initech.csv
- # histogram of variables
- hist(initech$years)
- hist(initech$salary)
- # scatterplot of variables
- ggplot(initech, aes(x = years, y = salary)) +
- geom_point() +
- theme_bw() +
- geom_smooth(method=lm, color="blue",
- fill="red")
- # Simple regression model: salary by years
- reg1Initech <- lm(salary ~ years, data=initech)
- # regression results
- summary(reg1Initech)
- # examine model fit
- plot(reg1Initech)
- # Add fit residuals and predictions to data frame
- initech$RegPred <- predict(reg1Initech)
- initech$RegResid <- residuals(reg1Initech)
- # residuals vs IV
- ggplot(initech, aes(x = years, y = RegResid)) +
- geom_point() +
- geom_abline(slope = 0, intercept = 0) +
- theme_bw()
- # examine prediction and confidence intervals
- temp_var <- predict(reg1Initech, interval="prediction")
- initech <- cbind(initech, temp_var)
- ggplot(initech, aes(years, salary))+
- geom_point() +
- theme_classic() +
- geom_line(aes(y=lwr), color = "red", linetype = "dashed")+
- geom_line(aes(y=upr), color = "red", linetype = "dashed")+
- geom_smooth(method=lm, color="blue",
- fill="purple", se=TRUE)
- # make predictions
- use2predict <- data.frame(years=4)
- predict(reg1Initech, use2predict, interval="confid") # use "confid" for confidence interval, "predict" for prediction interval.
You will use the "Class Exercise.R" file to answer the questions below. In the script file, lines 9-18 can be used to answer the first 3 questions. These are NOT the actual test questions; they are a guide to help you answer them.
1. What issues, if any, do you see with the distribution of the two variables?
2. What issues, if any, do you see from the scatterplot of the two variables?
3. For any issues identified in the two responses above, how would you correct them? (Try some different transformations and see what seems to work for you.)
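As a numerical companion to the histograms for questions 1-3, one option is to compare a simple skewness measure before and after each candidate transformation. This is only a sketch: the data below are simulated stand-ins (not the real initech.csv values), and the `skewness` helper and `skew_table` name are illustrative, not part of the class script.

```r
# Hypothetical stand-in data: simulated, NOT the real initech.csv values.
set.seed(1)
years  <- runif(200, min = 1, max = 25)
salary <- exp(10.5 + 0.08 * years + rnorm(200, sd = 0.2))
initech <- data.frame(years = years, salary = salary)

# Simple moment-based skewness; values near 0 suggest a roughly symmetric shape.
skewness <- function(x) {
  z <- (x - mean(x)) / sd(x)
  mean(z^3)
}

# Compare the raw variable with a few candidate transformations.
skew_table <- c(
  raw  = skewness(initech$salary),
  sqrt = skewness(sqrt(initech$salary)),
  log  = skewness(log(initech$salary))
)
print(round(skew_table, 2))

# The histograms can be drawn the same way as in the script, e.g.
# hist(log(initech$salary))
```

With the real data, replace the simulated block with the `read.csv(file.choose())` line from the script. A transformation whose skewness is closest to 0 is a reasonable first candidate, but the histogram and scatterplot should still be checked by eye.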
Next, run through the rest of the code to get answers to questions 4-9. You will need to make adjustments to lines 51-52 to answer questions 8 and 9.
Model 1
4. Run the regression of salary by years. Is the IV significant?
5. How much of the variability of the DV is explained by the IV?
6. What do you notice after looking at the scatterplot plus the regression line and confidence intervals? Does it look like the model performs equally well across all values of the IV?
7. Do you notice anything amiss looking at the plot of the residuals vs the predicted values?
8. What does your model predict regarding the average salary of all employees with 15 years of experience at the 95% confidence level?
9. What does your model predict regarding the salary of a single employee with 6 years of experience at the 95% confidence level?
Now you're going to try some other models, per the instructions below. However, to create the scatterplot of the residuals against the predicted values, we appended the predictions, residuals, and prediction intervals to the original data. To repeat this process with transformed data we need to start from scratch, because we don't want multiple columns in our data object with the same names (i.e., each time we run the code it will store another fit, resid, lwr, and upr). So write down your responses to the above questions, then start over at line 8 before proceeding to answer questions 10-15 below.
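Questions 8 and 9 differ only in the kind of interval requested: the average of all employees calls for a confidence interval, while a single employee calls for a prediction interval. A minimal sketch of that adjustment, assuming simulated stand-in data (not the real initech.csv values) and illustrative object names `ci_15` and `pi_6`:

```r
# Hypothetical stand-in data: simulated, NOT the real initech.csv values.
set.seed(42)
initech <- data.frame(years = runif(100, min = 1, max = 25))
initech$salary <- 40000 + 6000 * initech$years + rnorm(100, sd = 20000)

reg1Initech <- lm(salary ~ years, data = initech)

# Question 8 style: mean salary of ALL employees with 15 years
# -> confidence interval.
ci_15 <- predict(reg1Initech, data.frame(years = 15),
                 interval = "confidence", level = 0.95)

# Question 9 style: salary of ONE employee with 6 years
# -> prediction interval.
pi_6 <- predict(reg1Initech, data.frame(years = 6),
                interval = "prediction", level = 0.95)

print(ci_15)  # columns: fit, lwr, upr
print(pi_6)
```

The prediction interval is always wider than the confidence interval from the same model, which helps when matching results to the multiple-choice options. Keeping the intervals in separate objects, rather than binding them onto `initech`, also sidesteps the duplicate-column problem described above.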
Model 2
10. Now run the regression of log(salary) by years. You will need to adjust the necessary code lines to change occurrences of salary to log(salary). Is the IV significant?
11. How much of the variability of the DV is explained by the IV?
12. What do you notice after looking at the scatterplot plus the regression line and confidence intervals? Does it look like the model performs equally well across all values of the IV?
13. Do you notice anything amiss looking at the plot of the residuals vs the predicted values?
14. With this latest model, what does your model predict regarding the average salary of all employees with 10 years of experience at the 95% confidence level?
15. With this latest model, what does your model predict regarding the salary of a single employee with 12 years of experience at the 95% confidence level?
Write down your responses to the above questions, then start over at line 8 before proceeding to answer questions 16-21 below.
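Because Model 2 is fit to log(salary), `predict()` returns values on the log scale, and they need to be exponentiated to compare against dollar amounts. A rough sketch, again using simulated stand-in data (not the real initech.csv values); `reg2Initech`, `log_ci`, and `dollar_ci` are illustrative names:

```r
# Hypothetical stand-in data: simulated, NOT the real initech.csv values.
set.seed(7)
initech <- data.frame(years = runif(100, min = 1, max = 25))
initech$salary <- exp(10.5 + 0.08 * initech$years + rnorm(100, sd = 0.2))

# Put the transformation in the formula instead of creating a new column.
reg2Initech <- lm(log(salary) ~ years, data = initech)

# Predictions come back on the log scale ...
log_ci <- predict(reg2Initech, data.frame(years = 10),
                  interval = "confidence", level = 0.95)

# ... so exponentiate the fit and both bounds to return to dollars.
dollar_ci <- exp(log_ci)
print(dollar_ci)
```

Exponentiating the fitted value gives an estimate of the typical (median) salary rather than the arithmetic mean, which may be one reason the test questions warn about slight variations in the answers.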
Model 3
16. Now run the regression of log(salary) by sqrt(years). Is the IV significant?
17. How much of the variability of the DV is explained by the IV?
18. What do you notice after looking at the scatterplot plus the regression line and confidence intervals? Does it look like the model performs equally well across all values of the IV?
19. Do you notice anything amiss looking at the plot of the residuals vs the predicted values?
20. With this latest model, what does your model predict regarding the average salary of all employees with 5 years of experience at the 95% confidence level?
21. With this latest model, what does your model predict regarding the salary of a single employee with 21 years of experience at the 95% confidence level?
Write down your responses to the above questions, then start over at line 8 before proceeding to answer questions 22-28 below.
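When the transformation of the IV is written inside the model formula, `predict()` applies it to new data automatically, so the new data frame should contain raw years rather than sqrt(years). A sketch under the same assumptions as before (simulated stand-in data, illustrative names `reg3Initech`, `pi_21`, `manual_fit`):

```r
# Hypothetical stand-in data: simulated, NOT the real initech.csv values.
set.seed(11)
initech <- data.frame(years = runif(100, min = 1, max = 25))
initech$salary <- exp(10 + 0.45 * sqrt(initech$years) + rnorm(100, sd = 0.2))

reg3Initech <- lm(log(salary) ~ sqrt(years), data = initech)

# Pass raw years; the sqrt() in the formula is applied by predict().
# exp() converts the log-scale prediction interval back to dollars.
pi_21 <- exp(predict(reg3Initech, data.frame(years = 21),
                     interval = "prediction", level = 0.95))
print(pi_21)

# Manual check of the point prediction from the coefficients.
b <- coef(reg3Initech)
manual_fit <- exp(b[1] + b[2] * sqrt(21))
print(unname(manual_fit))
```

If sqrt(years) were instead stored as a separate column before fitting, the new data frame would need that transformed column, which is an easy place to make an order-of-operations mistake.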
Model 4
22. Now run the regression of salary by sqrt(years). Is the IV significant?
23. How much of the variability of the DV is explained by the IV?
24. What do you notice after looking at the scatterplot plus the regression line and confidence intervals? Does it look like the model performs equally well across all values of the IV?
25. Do you notice anything amiss looking at the plot of the residuals vs the predicted values?
26. With this latest model, what does your model predict regarding the average salary of all employees with 19 years of experience at the 95% confidence level?
27. With this latest model, what does your model predict regarding the salary of a single employee with 4 years of experience at the 95% confidence level?
28. Finally, which of the 4 models that you ran would you say performs the best and why?
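For question 28, one possible way to line the four models up side by side is to fit them all from the untouched data and tabulate a couple of summary numbers. This is a sketch, not the course's prescribed method: the data are simulated stand-ins (not the real initech.csv values), and `models`, `log_dv`, `r_squared`, and `rmse_dollars` are illustrative names.

```r
# Hypothetical stand-in data: simulated, NOT the real initech.csv values.
set.seed(3)
initech <- data.frame(years = runif(100, min = 1, max = 25))
initech$salary <- exp(10.5 + 0.08 * initech$years + rnorm(100, sd = 0.2))

# Fit all four models without appending any columns to initech.
models <- list(
  m1 = lm(salary ~ years, data = initech),
  m2 = lm(log(salary) ~ years, data = initech),
  m3 = lm(log(salary) ~ sqrt(years), data = initech),
  m4 = lm(salary ~ sqrt(years), data = initech)
)

# R-squared of each model, on the scale of that model's own DV.
r_squared <- sapply(models, function(m) summary(m)$r.squared)

# Root mean squared error in dollars, back-transforming the log models.
log_dv <- c(m1 = FALSE, m2 = TRUE, m3 = TRUE, m4 = FALSE)
rmse_dollars <- sapply(names(models), function(nm) {
  fit <- fitted(models[[nm]])
  if (log_dv[[nm]]) fit <- exp(fit)
  sqrt(mean((initech$salary - fit)^2))
})

print(round(100 * r_squared, 1))  # percentages, 1 decimal place
print(round(rmse_dollars))
```

The R-squared values for the log(salary) models describe variance in log(salary), so they are not directly comparable with those of the salary models. The dollar-scale error and the residual plots from `plot()` give a fairer basis for choosing among them.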
These are the actual test questions below:
Question 1
Do the variables seem normally distributed?
- Yes
- No
Question 2
Which of the below issues do you see with the data (if any)?
- Non-linearity
- Heteroscedasticity
- Multicollinearity
- Non-independence of observations
Question 3
For any identified in the two responses above, how would you correct them? (try some different transformations and see what seems to work best).
- Squaring or cubing salary
- Squaring or cubing years
- -1/salary
- -1/years
- sqrt(salary)
- sqrt(years)
- log(salary)
- log(years)
Question 4
In Model 1, is the IV significant?
- Yes
- No
Question 5
In reporting the results of Model 1, how much of the variance of salary would we say was accounted for by the model (report as a percentage rounded to 1 decimal place)?
______________%
Question 6
What do you notice after looking at the scatterplot plus the regression line and confidence intervals? Does it look like the model performs equally well across all values of the IV?
__________________________________________________________________________________
Question 7
Do you notice anything amiss looking at the plot of the residuals vs the predicted values?
______________________________________________________________________________
Question 8
What does your model predict regarding the average salary of all employees with 15 years of experience at the 95% confidence level?
- 95% confident it's between $129,207.40 and $140,495.10
- It's exactly $134,851.30
- 95% confident it's between $80,273.37 and $189,429.20
Question 9
What does your model predict regarding the salary of a single employee with 6 years of experience at the 95% confidence level?
- 95% confident it's between $49,461.75 and $164,781.75
- It's exactly $57,121.75
- 95% confident it's between $2,298.69 and $111,944.80
Question 10
Now run the regression of log(salary) by years. Is the IV significant?
- Yes
- No
Question 11
In reporting the results of Model 2, how much of the variance of salary would we say was accounted for by the model (report as a percentage rounded to 1 decimal place)?
__________%
Question 12
What do you notice after looking at the scatterplot plus the regression line and confidence intervals? Does it look like the model performs equally well across all values of the IV?
___________________________________________________________________________________
Question 13
Do you notice anything amiss looking at the plot of the residuals vs the predicted values?
___________________________________________________________________________________
Question 14
What does your model predict regarding the average salary of all employees with 10 years of experience at the 95% confidence level? (note: order of operations and rounding could lead to slight variations in your answers. Pick the answer below that most closely matches your finding).
- We're 95% confident it's between $75,390 and $82,024
- It's exactly $78,637
- We're 95% confident it's between $53,233 and $116,165
Question 15
What does your model predict regarding the salary of a single employee with 12 years of experience at the 95% confidence level? (note: order of operations and rounding could lead to slight variations in your answers. Pick the answer below that most closely matches your finding).
- We're confident it's between $88,538 and $95,754
- It's exactly $92,075
- We're confident it's between $62,350 and $135,973
Question 16
Now run the regression of log(salary) by sqrt(years). Is the IV significant?
- Yes
- No
Question 17
In reporting the results of Model 3, how much of the variance of salary would we say was accounted for by the model (report as a percentage rounded to 1 decimal place)?
_______________%
Question 18
What do you notice after looking at the scatterplot plus the regression line and confidence intervals? Does it look like the model performs equally well across all values of the IV?
________________________________________________________________________________
Question 19
Do you notice anything amiss looking at the plot of the residuals vs the predicted values?
________________________________________________________________________________
Question 20
What does your model predict regarding the average salary of all employees with 5 years of experience at the 95% confidence level? (note: order of operations and rounding could lead to slight variations in your answers. Pick the answer below that most closely matches your finding).
- We're 95% confident that it is between $51,176 and $58,035
- It's exactly $54,498
- We're 95% confident that it is between $35,535 and $83,580
Question 21
What does your model predict regarding the salary of a single employee with 21 years of experience at the 95% confidence level? (note: order of operations and rounding could lead to slight variations in your answers. Pick the answer below that most closely matches your finding).
- We're 95% confident it's between $166,678 and $188,438
- It's exactly $177,225
- We're 95% confident that it's between $115,584 and $271,738
Question 22
Now run the regression of salary by sqrt(years). Is the IV significant?
- Yes
- No
Question 23
In reporting the results of Model 4, how much of the variance of salary would we say was accounted for by the model (report as a percentage rounded to 1 decimal place)?
_____________%
Question 24
What do you notice after looking at the scatterplot plus the regression line and confidence intervals? Does it look like the model performs equally well across all values of the IV?
_____________________________________________________________________________________
Question 25
Do you notice anything amiss looking at the plot of the residuals vs the predicted values?
_____________________________________________________________________________________
Question 26
What does your model predict regarding the average salary of all employees with 19 years of experience at the 95% confidence level? (note: order of operations and rounding could lead to slight variations in your answers. Pick the answer below that most closely matches your finding).
- We're 95% confident that it's between $157,927 and $175,108
- It's exactly $166,518
- We're 95% confident that it's between $100,357 and $232,678
Question 27
What does your model predict regarding the average salary of all employees with 4 years of experience at the 95% confidence level? (note: order of operations and rounding could lead to slight variations in your answers. Pick the answer below that most closely matches your finding).
- We're 95% confident that it's between $30,503 and $52,198
- It's exactly $41,351
- We're 95% confident that it's between $0 (prediction is -$25,140) and $107,842
Question 28
Which of the 4 models that you ran would you say performs the best and why?
__________________________________________________________________________________