Question

1 Approved Answer

Posted on Jul 04, 2024


Question 1

In Question 1, we use a data set, "Rent_Data.txt" [...]

b) Using the confint() function, obtain 95% confidence intervals for the coefficients of each model.
# R-code: confint() (run line 31 on the R-file; modify line 31 for models 2 and 3)
[Insert R-output]
[Write your answer in the table below]

Coefficient               Confidence interval      Test hypothesis
Predictors                (Lower CI, Upper CI)     Is '0' inside of CI?     Decision
Distance.from.Airport
Distance.to.Downtown
Distance.to.University

4. When we build a regression model to predict an outcome variable, it is very important to check the assumptions. Without verifying that the model satisfies the assumptions, the results from the model may be misleading. Discuss the assumptions we need to consider after fitting the model in order to verify it (which assumptions we need to check, how to check them, when they are violated, and how we can resolve the violations).
[Write your answer here]

5. Check the assumptions of the models using plot() and gvlma(). Is there evidence of outliers or high-leverage observations in the models? If so, please inspect the outliers.
# R-code: plot(), influencePlot(), gvlma() (run lines 38-45 on the R-file; modify the code for models 2 and 3)
[Insert R-output]
[Write your answer in the table below]

Assumptions               Residual plot            Normal Q-Q plot

[Write additional comments here]

6. Discuss the strengths and weaknesses of the models in Question 1, and briefly discuss the way(s) to overcome the weaknesses of the models.
[Write your answer here]

Question 2

The dataset 'teengamb' in the 'faraway' package concerns a study of teenage gambling in Britain.
The data contain 47 observations on 5 variables (see the data frame below).

Variable             Description
sex                  0=male, 1=female
status               Socioeconomic status score based on parents' occupation
income               Income in pounds per week
verbal               Verbal score in words out of 12 correctly defined
gamble (response)    Expenditure on gambling in pounds per year

1. Create a pairwise correlation plot and a correlation matrix. Are there relationships between the predictors? Explain.
# R-code: run line 58 and line 59 on the R-file
[Insert R-output]
[Write your answer here]

2. Fit a multiple regression model to predict 'gamble'. Write the estimated regression model using the coefficients in the summary output and interpret the significant coefficients.
# R-code: lm() (run line 65 on the R-file) and summary() (run line 66 on the R-file)
[Insert R-output]
[Write your answer here]

3. Plot side-by-side boxplots of 'gamble' for the different gender groups. Does the plot suggest that male and female students have different gambling behavior?
# R-code: run line 70 on the R-file
[Insert R-output]
[Write your answer here]

4. Considering your answer to part 3, update the model in part 2 with an interaction term (sex*income). Using the model summary, the side-by-side boxplot (from part 3), and an interaction graph, test the significance of the interaction effect between the two significant predictors, sex and income, in the form $Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_1 X_2 + \varepsilon$.
# R-code: run line 73 on the R-file, and summary() (run line 74 on the R-file)
[Insert R-output]

INTERACTION (INCOME*SEX)
Gamble      Low income      High income
Male
Female

[Write your answer here]

Question 3

The Wage data in the 'ISLR' package includes 3000 observations on the following 11 variables.
Variable      Description
year          Year that wage information was recorded
age           Age of worker
maritl        A factor with levels 1=Never Married, 2=Married, 3=Widowed, 4=Divorced, and 5=Separated indicating marital status
race          A factor with levels 1=White, 2=Black, 3=Asian, and 4=Other indicating race
education     A factor with levels 1=[...]
[...]         [...]=Very Good indicating health level of worker
health_ins    A factor with levels 1=Yes and 2=No indicating whether worker has health insurance
logwage       Log of worker's wage
wage          Worker's raw wage

1. Summarize the Wage dataset using [...], str(), and summary().
# R-code: [...], str(), and summary() (run lines 85-88)
[Insert R-output]
[Write your answer here]

2. Create a scatterplot using age and wage. Describe the scatterplot in terms of form, direction, strength, and outlier(s).
# R-code: plot() (run line 92 on the R-file)
[Insert R-output]
[Write your answer here]

3. Fit a linear regression model that explains wage according to age. Does age significantly explain wage? Evaluate the model using summary statistics.
# R-code: lm() (run lines 95-96 on the R-file)
[Insert R-output]
[Write your answer here]

4. Check the assumptions. Which assumptions are violated? Justify your answer.
# R-code: run lines 99-103
[Insert R-output]
[Write your answer here]

5. Considering your answer to part 4, discuss the two most commonly violated assumptions of the regression model: which assumptions are frequently violated? How can we relax those assumptions?
[Write your answer here]

6. Fit polynomial models with different degrees of power. Using the summary tables, plots (for assumptions), global test, and ANOVA table, choose the best model to explain wage.

Model 1: $\text{Wage} = \beta_0 + \beta_1\,\text{age}$ (from part 3)
Model 2: $\text{Wage} = \beta_0 + \beta_1\,\text{age} + \beta_2\,\text{age}^2$
Model 3: $\text{Wage} = \beta_0 + \beta_1\,\text{age} + \beta_2\,\text{age}^2 + \beta_3\,\text{age}^3$
Model 4: $\text{Wage} = \beta_0 + \beta_1\,\text{age} + \beta_2\,\text{age}^2 + \beta_3\,\text{age}^3 + \beta_4\,\text{age}^4$
Model 5: $\text{Wage} = \beta_0 + \beta_1\,\text{age} + \beta_2\,\text{age}^2 + \beta_3\,\text{age}^3 + \beta_4\,\text{age}^4 + \beta_5\,\text{age}^5$

# R-code: run lines 105-126
[Insert R-output]
[Summarize the model fitting]
[Check assumptions]

Assumptions          Model 1    Model 2    Model 3    Model 4    Model 5
Linearity
Normality
Independence
Constant variance

Additional comments:
[Interpret the results of the ANOVA test]
[Write your answer here: What is the best model? Why?]
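
For readers without the course R-file, the following is a minimal sketch of how the comparison in part 6 could be run. It is not the code on lines 105-126, which is not shown here. The names wage_df, m1 to m5, anova_table, and adj_r2 are illustrative. The sketch assumes the ISLR package is installed; if it is not, it falls back to simulated stand-in data that only mimics the shape of the problem and must not be used to answer the question. It uses orthogonal polynomials via poly(), which give the same fitted values and the same nested F-tests as the raw powers written in Models 1 to 5.

```r
# Sketch: compare polynomial fits of wage on age with nested F-tests.
# Assumption: ISLR is installed; otherwise simulated stand-in data is used.
if (requireNamespace("ISLR", quietly = TRUE)) {
  wage_df <- ISLR::Wage[, c("age", "wage")]
} else {
  set.seed(1)
  sim_age <- round(runif(500, min = 18, max = 80))
  wage_df <- data.frame(
    age  = sim_age,
    wage = 60 + 3 * sim_age - 0.03 * sim_age^2 + rnorm(500, sd = 20)
  )
}

# Models 1 to 5: polynomial degree 1 through 5 in age.
m1 <- lm(wage ~ poly(age, 1), data = wage_df)
m2 <- lm(wage ~ poly(age, 2), data = wage_df)
m3 <- lm(wage ~ poly(age, 3), data = wage_df)
m4 <- lm(wage ~ poly(age, 4), data = wage_df)
m5 <- lm(wage ~ poly(age, 5), data = wage_df)

# Sequential F-tests: each row after the first tests whether adding the
# next power of age significantly reduces the residual sum of squares.
anova_table <- anova(m1, m2, m3, m4, m5)
print(anova_table)

# Adjusted R-squared for each model, as a complementary summary statistic.
adj_r2 <- sapply(list(m1, m2, m3, m4, m5),
                 function(fit) summary(fit)$adj.r.squared)
names(adj_r2) <- paste("Model", 1:5)
print(round(adj_r2, 4))
```

In the ANOVA table, a small p-value in a given row suggests that the higher-degree model in that row fits better than the one above it. A common choice is the lowest degree after which further rows are no longer significant, checked against the assumption plots for that model.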