Question

1 Approved Answer

Posted on Oct 13, 2024


Applied Regression Analysis: Best Subsets Regression

Best Subsets Regression: y versus x1, x2, x3, x4, x5

Response is y

Vars   R-Sq   R-Sq(adj)   Mallows C-p   S
1      79.7   79.0        -0.6          0.21270
1      16.7   13.8        81.2          0.43062
2      80.6   79.2         0.2          0.21142
2      79.8   78.3         1.3          0.21590
3      80.7   78.5         2.1          0.21488
3      80.7   78.5         2.1          0.21508
4      80.7   77.8         4.0          0.21879
4      80.7   77.7         4.1          0.21889
5      80.8   76.9         6.0          0.22293

In the original output, columns x1 to x5 mark the regressors in each subset with an X (one mark per included variable, 25 marks in total). The column positions of the marks are not preserved.

Applied Regression Analysis

Please go to documents for "Appendix: Best Subsets Regression". You will need this appendix to work some of the questions on this exam.

QUESTION 1
A multiple regression model can incorporate polynomial terms derived from regressor variables, with each term treated as a new regressor.
  True
  False

QUESTION 2
In regression, to model a qualitative (categorical) variable, an allocated code such as 1, 2, and 3 is appropriate.
  True
  False

QUESTION 3
Forward selection, backward elimination, and stepwise regression all lead to the same choice of final model.
  True
  False

QUESTION 4
Ridge regression is a regression method that is most useful in situations where there is multicollinearity among regressors.
  True
  False

QUESTION 5
An intrinsically linear model can be transformed to an equivalent linear form.
  True
  False

QUESTION 6
Adjusted R^2 does not necessarily increase as additional regressors are introduced into the model.
  True
  False

QUESTION 7
The ridge regression estimator is a linear transformation of the least-squares estimator.
  True
  False

QUESTION 8
The breakdown point of ordinary least-squares estimators for a sample of size n is 1/n.
  True
  False

QUESTION 9
One way of choosing the proper value for k in ridge regression is to use the ridge trace.
  True
  False

QUESTION 10
If the p-term regression model has negligible bias, the expected value of Mallows' Cp statistic is p.
  True
  False

QUESTION 11
A leverage point is a data point that
  has an unusual response
  has a zero residual
  is remote in x-space
  is not normally distributed

QUESTION 12
A point with a large Cook's distance
  is usually near the center of the data in x-space
  has influence on the estimated regression coefficients
  has a zero residual
  all of the above

QUESTION 13
In the forward selection procedure, a simpler model is usually created by
  increasing F-to-enter
  decreasing F-to-enter
  increasing α-to-enter
  none of the above

QUESTION 14
Consider a model for output viscosity, y, based on reaction temperature x1, reaction pressure x2, and concentration x3. Which one is the full quadratic model that includes interaction terms?
  y = B0 + B1x1 + B2x2 + B3x3 + B4x1x2 + B5x1x3 + B6x2x3 + B7x1^2 + B8x2^2 + B9x3^2
  y = B0 + B1x1 + B2x2 + B3x3 + B4x1x2 + B5x1x3 + B6x2x3
  y = B0 + B1x1 + B2x2 + B3x3 + B4x1^2 + B5x2^2 + B6x3^2
  None of the above

QUESTION 15
Which of the following are true about Mallows' Cp statistic?
  Generally large values of Cp are desirable
  Cp is used as a criterion for model selection
  The result is not sensitive to the estimate of σ^2
  All of the above

QUESTION 16
If a 95% confidence interval of an estimator of a regression coefficient contains 0, which of the following statements is true?
  The true value of β is 0
  The estimated value of β is 0
  The null hypothesis β = 0 cannot be rejected at α = 0.05
  None of the above

QUESTION 17
For the following spline model, what are the knots?
  1.8
  4
  2 and 4
  None of the above

QUESTION 18
To make the following spline model continuous, what action should be taken?
  No action needed

QUESTION 19
The eigenvalues of X'X are 120, 60, and 3. The condition number of X'X is
  120
  60
  40
  3

QUESTION 20
If a categorical regressor has four levels, how many indicator variables need to be created?
  1
  2
  3
  4

QUESTION 21
Consider a model for output viscosity, y, based on reaction temperature x1 and supplier x2, where temperature is a continuous variable and supplier is a categorical variable with two levels.
Assume there will be differences in the intercepts and slopes of the regression lines for each supplier separately. What will be a suitable regression model using an indicator variable for x2?
  y = B0 + B1x1 + B2x2 + B3x1x2
  y = B0 + B1x1 + B2x2
  y = B0 + B1x1
  None of the above

QUESTION 22
The output in the Appendix shows the results of Best Subsets Regression of y on regressors x1, x2, x3, x4 and x5. Among the following models, which is the best choice?
  x1, x3 and x5
  x1, x2, x3 and x4
  x1, x3 and x4
  x1, x2, x3, x4 and x5

QUESTION 23
Weighted least squares regression is a modification of ordinary least squares that adjusts the estimation of coefficients
  for the sample size
  for the ridge trace
  for nonzero mean
  for nonconstant variance

QUESTION 24
In robust regression, Huber's t function with t = 2 is used as the criterion. For a point with z = 3, what is the value of the influence function?
  0
  1
  2
  3

QUESTION 25
Which of the following can be used to validate a regression model?
  Analysis of the model coefficients and predicted values, including comparisons with prior experience, physical theory, and other analytical models or simulation results.
  Collection of new data to investigate the model's predictive performance.
  Data splitting, that is, setting aside some of the original data and using these observations to investigate the model's predictive performance.
  All of the above

Applied Regression Analysis Quiz: Minitab Output

Results for: GASOLINE_MILEAGE_DATA_WITH_INDICATOR_VARIABLES.MTW

y: miles per gallon
x1: Displacement (cubic inches)
x2: Horsepower (ft-lb)
x3: Torque (ft-lb)
Etc.
Regression Analysis: y versus x1, x2, x3, x4, x5, x6, x7, x8, x9, x10

The regression equation is
y = 19.4 - 0.0761 x1 - 0.0743 x2 + 0.121 x3 + 1.32 x4 + 5.98 x5 + 0.29 x6 - 3.40 x7 + 0.185 x8 - 0.409 x9 - 0.00518 x10

30 cases used, 2 cases contain missing values

Predictor   Coef        SE Coef    T       P
Constant    19.35       28.99       0.67   0.512
x1          -0.07613    0.05623    -1.35   0.192
x2          -0.07425    0.08663    -0.86   0.402
x3           0.12098    0.08907     1.36   0.190
x4           1.317      3.021       0.44   0.668
x5           5.976      3.076       1.94   0.067
x6           0.289      1.251       0.23   0.820
x7          -3.398      2.859      -1.19   0.249
x8           0.1853     0.1259      1.47   0.158
x9          -0.4094     0.3124     -1.31   0.206
x10         -0.005184   0.005742   -0.90   0.378

S = 3.14565   R-Sq = 83.5%   R-Sq(adj) = 74.8%

Analysis of Variance

Source           DF   SS         MS       F      P
Regression       10    951.098   95.110   9.61   0.000
Residual Error   19    188.007    9.895
Total            29   1139.105

Source   DF   Seq SS
x1       1    866.227
x2       1      5.436
x3       1      4.441
x4       1     14.489
x5       1      2.876
x6       1      9.074
x7       1      6.644
x8       1      0.347
x9       1     33.502
x10      1      8.063

Unusual Observations

Obs   x1    y        Fit      SE Fit   Residual   St Resid
22    360   21.470   16.170   2.212     5.300      2.37R
31    360   13.770   19.049   2.080    -5.279     -2.24R

R denotes an observation with a large standardized residual
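
The summary statistics in the Minitab output can be recomputed from the Analysis of Variance table, which is a useful check when reading output like this. The following is a minimal Python sketch, not part of the quiz: the variable names are illustrative assumptions, and the numbers are copied from the output above.

```python
import math

# Values copied from the Analysis of Variance table in the Minitab output.
ss_regression = 951.098
ss_residual = 188.007
ss_total = 1139.105
df_regression = 10
df_residual = 19
df_total = 29

# Sequential sums of squares for x1 through x10, in the order listed.
seq_ss = [866.227, 5.436, 4.441, 14.489, 2.876,
          9.074, 6.644, 0.347, 33.502, 8.063]

# Mean squares are sums of squares divided by their degrees of freedom.
ms_regression = ss_regression / df_regression
ms_residual = ss_residual / df_residual

# Overall F statistic, residual standard error S, R-Sq and R-Sq(adj).
f_statistic = ms_regression / ms_residual
s = math.sqrt(ms_residual)
r_sq = ss_regression / ss_total
r_sq_adj = 1 - ms_residual / (ss_total / df_total)

# The sequential sums of squares should add up to the regression SS
# (up to rounding in the printed table).
seq_ss_total = sum(seq_ss)

print(f"MS Regression = {ms_regression:.3f}")
print(f"MS Residual   = {ms_residual:.3f}")
print(f"F             = {f_statistic:.2f}")
print(f"S             = {s:.5f}")
print(f"R-Sq          = {100 * r_sq:.1f}%")
print(f"R-Sq(adj)     = {100 * r_sq_adj:.1f}%")
print(f"Sum of Seq SS = {seq_ss_total:.3f}")
```

The recomputed values agree with the printed F = 9.61, S = 3.14565, R-Sq = 83.5% and R-Sq(adj) = 74.8%, and the sequential sums of squares add up to the regression sum of squares within rounding.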