Question
Data 1 - Prestige Data
We are using a mock dataset named "Prestige" to predict the average salary of Canadian occupations. The dataset consists of the following variables:
income: average income (in $)
education: average education (in years)
women: percentage of women in the profession (%)
prestige: prestige score for occupation (numeric, continuous)
type: type of occupation (bc = blue collar, wc = white collar, prof = professional/managerial/technical)
NOTE:
You feel that the 'type' variable would be better broken down into only two types, blue collar and white collar, with 'prof' categorized as white collar. Before beginning any analysis, recode the 'type' variable to reflect this change (i.e., all 'prof' occupations are reassigned to the 'wc' category).
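The recode described in the note can be sketched as follows. This is a minimal illustration, assuming the data sits in a pandas DataFrame with a 'type' column; the rows below are invented placeholders, not the actual Prestige data.

```python
import pandas as pd

# Invented placeholder rows (NOT the real Prestige data), used only to show the recode.
df = pd.DataFrame({
    "income": [12000, 8000, 4500, 25000, 6000],
    "type": ["prof", "wc", "bc", "prof", "bc"],
})

# Reassign every 'prof' occupation to 'wc'; 'bc' and 'wc' rows are left unchanged.
df["type"] = df["type"].replace({"prof": "wc"})

# Check that only two categories remain.
print(df["type"].value_counts())
```

Checking the category counts before and after the recode is a quick way to confirm that no occupation was dropped or mislabeled.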
1. You are interested in building a model to predict income, but first, you want to examine its distribution and determine whether a transformation is necessary.
a. Compute a numeric summary, histogram and boxplot for the income data. Describe the shape of the distribution.
b. Consider the log transformation. Create a histogram and a boxplot for the log-transformed data.
c. Do you suggest using the log-transformed income as the outcome variable in your model? Why or why not?
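One way to approach parts a to c is to compare the same shape statistics before and after taking logs. The sketch below uses simulated right-skewed incomes as a stand-in (an assumption, not the Prestige values), and the helper name describe_shape is hypothetical. It computes the quantities a histogram and boxplot would display (bin counts, quartiles, points beyond the whiskers) rather than drawing them.

```python
import numpy as np
import pandas as pd

# Simulated right-skewed incomes (placeholder, NOT the real Prestige data).
rng = np.random.default_rng(0)
income = pd.Series(rng.lognormal(mean=8.7, sigma=0.8, size=500), name="income")
log_income = np.log(income)

def describe_shape(x):
    """Return the numbers a histogram and a boxplot would display, plus skewness."""
    q1, med, q3 = x.quantile([0.25, 0.5, 0.75])
    iqr = q3 - q1
    counts, _edges = np.histogram(x, bins=10)
    # Points more than 1.5 * IQR beyond the quartiles are drawn individually on a boxplot.
    beyond_whiskers = (x < q1 - 1.5 * iqr) | (x > q3 + 1.5 * iqr)
    return {
        "mean": float(x.mean()),
        "median": float(med),
        "skew": float(x.skew()),
        "n_beyond_whiskers": int(beyond_whiskers.sum()),
        "hist_counts": counts.tolist(),
    }

raw = describe_shape(income)
logged = describe_shape(log_income)
print(raw)
print(logged)
```

A mean well above the median, a large positive skewness, and many points beyond the upper whisker all indicate a right-skewed distribution. If those signs largely disappear on the log scale, that supports using log income as the outcome.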
2. Next, you are interested in whether the effect of prestige on income depends on the type of occupation. Create an appropriate graphic to check this. Based on this graphic, is an interaction between type and prestige worth including in a model to predict income?
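A common graphic for this check is a scatterplot of income against prestige with a separate fitted line for each occupation type; clearly non-parallel lines suggest an interaction. The sketch below computes the per-type slopes that such a plot would show. The data are simulated with deliberately different slopes (an assumption made for illustration, not a finding about the Prestige data).

```python
import numpy as np
import pandas as pd

# Simulated data with a different prestige slope per type (placeholder values only).
rng = np.random.default_rng(1)
n = 60
frames = []
for occ_type, intercept, slope in [("bc", 3000.0, 50.0), ("wc", 1000.0, 150.0)]:
    prestige = rng.uniform(20, 80, size=n)
    income = intercept + slope * prestige + rng.normal(0, 300, size=n)
    frames.append(pd.DataFrame({"type": occ_type, "prestige": prestige, "income": income}))
df = pd.concat(frames, ignore_index=True)

# Fit a straight line of income on prestige within each type.
slopes = {}
for occ_type, grp in df.groupby("type"):
    slope, _intercept = np.polyfit(grp["prestige"], grp["income"], deg=1)
    slopes[occ_type] = float(slope)

print(slopes)
```

If the two slopes are close relative to the scatter around the lines, the interaction is probably not worth including; if they differ markedly, as in this simulated example, it is.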
3. Run a regression to predict income (i.e., using the outcome you chose in 1c) using all of the variables and an interaction between prestige and type (if you deemed it worth including in number 2 above). Copy and paste the regression output below.
a. Write a sentence interpreting the effect of each variable on income (i.e., interpret the model coefficients).
b. What is the adjusted R² value? Interpret this value.
c. Are there any potential outliers in your model? Look at the standardized residuals and discuss.
d. Examine the correlation matrix. Do you see any potential multicollinearity issues? Why?
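The quantities asked for in question 3 can be computed directly from an ordinary least squares fit. The sketch below is one possible approach, assuming log income was chosen in 1c and the interaction was kept; the data are simulated placeholders, the column name 'wc' is a hypothetical 0/1 indicator for white collar, and the cutoff of 2 for standardized residuals is a common rule of thumb rather than a fixed standard.

```python
import numpy as np
import pandas as pd

# Simulated placeholder data (NOT the real Prestige data).
rng = np.random.default_rng(2)
n = 100
df = pd.DataFrame({
    "education": rng.uniform(6, 16, n),
    "women": rng.uniform(0, 100, n),
    "prestige": rng.uniform(20, 80, n),
    "wc": rng.integers(0, 2, n),  # hypothetical indicator: 1 = white collar, 0 = blue collar
})
df["log_income"] = (7.5 + 0.03 * df["education"] - 0.005 * df["women"]
                    + 0.01 * df["prestige"] + 0.1 * df["wc"]
                    + 0.005 * df["prestige"] * df["wc"]
                    + rng.normal(0, 0.2, n))

# Design matrix: intercept, main effects, and the prestige-by-type interaction.
X = np.column_stack([np.ones(n), df["education"], df["women"],
                     df["prestige"], df["wc"], df["prestige"] * df["wc"]])
y = df["log_income"].to_numpy()

# Ordinary least squares fit.
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta

# R-squared and adjusted R-squared (p = number of predictors, excluding the intercept).
p = X.shape[1] - 1
rss = float(resid @ resid)
tss = float(((y - y.mean()) ** 2).sum())
r2 = 1 - rss / tss
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - p - 1)

# Standardized residuals: each residual divided by its estimated standard deviation.
leverage = np.diag(X @ np.linalg.inv(X.T @ X) @ X.T)
sigma = np.sqrt(rss / (n - p - 1))
std_resid = resid / (sigma * np.sqrt(1 - leverage))
flagged = np.flatnonzero(np.abs(std_resid) > 2)  # rule-of-thumb cutoff

# Correlation matrix of the numeric predictors, for the multicollinearity check.
corr = df[["education", "women", "prestige"]].corr()

print("coefficients:", beta.round(4))
print("adjusted R^2:", round(adj_r2, 3))
print("rows with |standardized residual| > 2:", flagged)
print(corr.round(2))
```

With a log outcome, a coefficient b corresponds to a multiplicative change of exp(b) in income per one-unit increase in the predictor, which is roughly 100·b percent when b is small. In the correlation matrix, a large absolute correlation between two predictors is the usual warning sign for multicollinearity.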