
Question

Question 1

In Question 1, we use a data set, "Rent_Data.txt". …

b) Using the confint() function, obtain 95% confidence intervals for the coefficients of each model.
# R-code: confint() (run line 31 in the R-file; modify line 31 for models 2 and 3)
[Insert R-output]
[Write your answer in the table below]

Predictors | Confidence interval (Lower CI, Upper CI) | Test hypothesis: Is '0' inside the CI? | Decision
Distance.from.Airport | | |
Distance.to.Downtown | | |
Distance.to.University | | |

4. When we build a regression model to predict an outcome variable, it is very important to check the assumptions. Without verifying that the model satisfies them, the results from the model may be misleading. Discuss the assumptions we need to check after fitting the model: which assumptions to check, how to check them, what happens when they are violated, and how the violations can be resolved.
[Write your answer here]

5. Check the assumptions of the models using plot() and gvlma(). Is there evidence of outliers or high-leverage observations in the models? If so, inspect the outliers.
# R-code: plot(), influencePlot(), gvlma() (run lines 38-45 in the R-file; modify the code for models 2 and 3)
[Insert R-output]
[Write your answer in the table below]

Assumptions | Residual plot | Normal Q-Q plot

[Write additional comments here]

6. Discuss the strengths and weaknesses of the models in Question 1, and briefly discuss ways to overcome those weaknesses.
[Write your answer here]

Question 2

The dataset 'teengamb' in the 'faraway' package concerns a study of teenage gambling in Britain.
This data contains 47 observations on 5 variables (see the data frame below).

Variable | Description
sex | 0 = male, 1 = female
status | Socioeconomic status score based on parents' occupation
income | Income in pounds per week
verbal | Verbal score in words out of 12 correctly defined
gamble (response) | Expenditure on gambling in pounds per year

1. Create a pairwise correlation plot and a correlation matrix. Are there relationships between the predictors? Explain.
# R-code: … (run line 58 in the R-file) and … (run line 59 in the R-file)
[Insert R-output]
[Write your answer here]

2. Fit a multiple regression model to predict 'gamble'. Write the estimated regression model using the coefficients in the summary output, and interpret the significant coefficients.
# R-code: lm() (run line 65 in the R-file) and summary() (run line 66 in the R-file)
[Insert R-output]
[Write your answer here]

3. Plot side-by-side boxplots of 'gamble' for the two gender groups. Do they suggest that male and female students have different gambling behavior?
# R-code: … (run line 70 in the R-file)
[Insert R-output]
[Write your answer here]

4. Considering your answer to part 3, update the model in part 2 with an interaction term (sex*income). Using the model summary, the side-by-side boxplot (from part 3), and an interaction graph, test the significance of the interaction effect between the two significant predictors, sex and income, in the form $Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_1 X_2 + \varepsilon$.
# R-code: lm() (run line 73 in the R-file) and summary() (run line 74 in the R-file)
[Insert R-output]
[Interaction graph "INTERACTION (INCOME*SEX)": gamble at low vs. high income, with separate lines for male and female]
[Write your answer here]

Question 3

Wage data in the 'ISLR' package includes 3000 observations on the following 11 variables.
Variable | Description
year | Year that wage information was recorded
age | Age of worker
maritl | A factor with levels 1=Never Married, 2=Married, 3=Widowed, 4=Divorced, and 5=Separated indicating marital status
race | A factor with levels 1=White, 2=Black, 3=Asian, and 4=Other indicating race
education | A factor with levels 1= …
health | A factor with levels … =Very Good indicating health level of worker
health_ins | A factor with levels 1=Yes and 2=No indicating whether worker has health insurance
logwage | Log of worker's wage
wage | Worker's raw wage

1. Summarize the Wage dataset using …, str(), and summary().
# R-code: …, str(), and summary() (run lines 85-88)
[Insert R-output]
[Write your answer here]

2. Create a scatterplot of age and wage. Describe the scatterplot in terms of form, direction, strength, and outlier(s).
# R-code: plot() (run line 92 in the R-file)
[Insert R-output]
[Write your answer here]

3. Fit a linear regression model that explains wage in terms of age. Does age significantly explain wage? Evaluate the model using summary statistics.
# R-code: … (run lines 95-96 in the R-file)
[Insert R-output]
[Write your answer here]

4. Check the assumptions. Which assumptions are violated? Justify your answer.
# R-code: run lines 99-103
[Insert R-output]
[Write your answer here]

5. Considering your answer to part 4, discuss the two most frequently violated assumptions of the regression model: which assumptions are frequently violated, and how can we relax them?
[Write your answer here]

6. Fit polynomial models with different degrees. Using the summary tables, plots (for assumptions), global test, and ANOVA table, choose the best model to explain wage.
Model 1: $\text{Wage} = \beta_0 + \beta_1\,\text{age}$ (from part 3)
Model 2: $\text{Wage} = \beta_0 + \beta_1\,\text{age} + \beta_2\,\text{age}^2$
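Model 3: $\text{Wage} = \beta_0 + \beta_1\,\text{age} + \beta_2\,\text{age}^2 + \beta_3\,\text{age}^3$
Model 4: $\text{Wage} = \beta_0 + \beta_1\,\text{age} + \beta_2\,\text{age}^2 + \beta_3\,\text{age}^3 + \beta_4\,\text{age}^4$
Model 5: $\text{Wage} = \beta_0 + \beta_1\,\text{age} + \beta_2\,\text{age}^2 + \beta_3\,\text{age}^3 + \beta_4\,\text{age}^4 + \beta_5\,\text{age}^5$
# R-code: run lines 105-126
[Insert R-output]
[Summarize the model fitting]
[Check assumptions]

Assumptions | Model 1 | Model 2 | Model 3 | Model 4 | Model 5
Linearity | | | | |
Normality | | | | |
Independence | | | | |
Constant variance | | | | |

Additional comments:
[Interpret the results of the ANOVA test]
[Write your answer here: What is the best model? Why?]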
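As a hedged sketch for part 6 of Question 3, the helper below fits the nested polynomial models of degree 1 through 5 and compares them with a sequential ANOVA, where each row's F-test asks whether the added power of age improves on the previous model. `compare_poly_fits` is a hypothetical name, not something from the assignment's R-file; it assumes a data frame with numeric `wage` and `age` columns, such as `Wage` from the ISLR package. It uses orthogonal polynomials from `poly()`, which give the same fitted values and ANOVA table as the raw powers in Models 1 to 5, although the individual coefficients differ.

```r
# Hypothetical helper (not from the assignment's R-file): fit wage ~ age
# polynomials of degree 1..max_degree and compare the nested models.
compare_poly_fits <- function(df, max_degree = 5) {
  fits <- lapply(seq_len(max_degree), function(d) {
    # Build the formula text explicitly so each model shows its own degree.
    # poly() is orthogonal by default; raw = TRUE would reproduce the
    # beta_k * age^k coefficients but gives the same fits and ANOVA table.
    lm(as.formula(sprintf("wage ~ poly(age, %d)", d)), data = df)
  })
  names(fits) <- paste("Model", seq_len(max_degree))
  # anova() on several nested lm fits runs an F-test of each model
  # against the one before it (unname() so the fits are passed positionally).
  list(fits = fits, anova = do.call(anova, unname(fits)))
}
```

With ISLR installed, `library(ISLR)` followed by `res <- compare_poly_fits(Wage)` and `print(res$anova)` produces the comparison table; a small p-value in row k suggests that the degree-k model improves on the degree-(k-1) model, while `lapply(res$fits, summary)` gives the per-model summary tables.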

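For the confidence-interval table in Question 1(b), the decision column follows a standard rule: for a model fitted with lm(), the 95% interval from confint() excludes 0 exactly when the two-sided t-test of $H_0: \beta_j = 0$ reported by summary() rejects at the 5% level, because both use the same estimate, standard error, and t quantile on the residual degrees of freedom. The sketch below builds that table from any fitted lm object. `ci_decision_table` is a hypothetical helper name, and since Rent_Data.txt is not available here, the model object passed to it is assumed to be one of the fitted models from the assignment's R-file.

```r
# Hypothetical helper: turn confint() output for an lm fit into the
# "Is 0 inside the CI?" / decision table used in Question 1(b).
ci_decision_table <- function(fit, level = 0.95) {
  ci <- confint(fit, level = level)           # matrix: one row per coefficient
  zero_inside <- ci[, 1] <= 0 & ci[, 2] >= 0  # does the interval cover 0?
  data.frame(
    predictor   = rownames(ci),
    lower       = ci[, 1],
    upper       = ci[, 2],
    zero_inside = zero_inside,
    # 0 inside the CI <=> fail to reject H0: beta_j = 0 at alpha = 1 - level
    decision    = ifelse(zero_inside, "Fail to reject H0", "Reject H0"),
    row.names   = NULL,
    stringsAsFactors = FALSE
  )
}
```

For example, calling `ci_decision_table()` on a fitted model (a hypothetical `model1` standing in for whichever object the R-file creates) fills in the Lower CI, Upper CI, and Decision columns for the intercept and each predictor.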