Question

1 Approved Answer

Posted on Oct 14, 2024

M3 Group Assignment Instructions Complete the Assignment, name it as GroupXX_Assign3.xls (where XX is your Group Name), and upload and submit to the instructor through

M3 Group Assignment Instructions Complete the Assignment, name it as GroupXX_Assign3.xls (where XX is your Group Name), and upload and submit to the instructor through Dropbox. Do not enter anything in the spreadsheet cells that are black, labeled \"Grader\". You must complete this assignment without the assistance of persons other than the members of your Group. You may use any other resources you deem necessary. Answer the questions below by placing the appropriate graph and/or answers in the designated cells of the spreadsheet. DO NOT CHANGE THE APPEARANCE OR FUNCTIONALITY OF THE SPREADSHEET UNLESS INSTRUCTED TO DO SO. Question 1 (20 points) A mental health agency measured the self-esteem score for randomly selected individuals with disabilities who were involved in some work activity within the past year. The spreadsheet named Self Esteem provides the data including each individuals self-esteem measure (y), years of education (YrsEdu), age, months worked in the last 6 years (MonWork), marital status dummy variables (MS2, MS3, MS4) indicating if the individual is single, married, separated, or divorced, and a support level (SL) dummy variable indicating if the level of job support (counseling, etc) was provided directly (1) or indirectly (0). Regarding marital status, if single all MS indicators are 0, while MS2 = 1 indicates married, MS3 = 1 indicates separated, and MS4 = 1 indicates divorced. a. 3 points: In cell N4, use Excel's \"Correlation\" Data Analysis tool to construct a correlation matrix for all the variables. Note that the categories in columns I and J should not be included since the data are already represented as dummy variables in columns E through H. b. 3 points: Considering the correlation between self-esteem and each x variable identify the three variables that, based on correlation with y alone, should be considered as best candidates for inclusion in the model. Shade the appropriate cells containing the correlation values in yellow. Ignore any multicollinearity concerns for this part. c. 3 points: With cell N19 as the upper left hand corner of the output, fit the full regression model. (Do not include a residual plot) d. 4 points: Considering the regression output from part c, shade (in yellow) the name of any x variable that appears significant and should remain in the model. Also shade the t stat and p-value. Consider the p-value small if it is less than 0.05. e. 3 points: Partial Regression Model: With cell N51 the upper left hand corner of the output, fit the model including only the x variable(s) that were found to be significant in part d. (Do not include a residual plot) f. 4 points: Comment on the impact of each independent variable in the partial regression model from part e on Self Esteem. Provide your comments in Cells N66:T78. 1 Question 2 (15 points) A bank must prepare for a discrimination suit filed on behalf of female employees that claim females are paid less than male employees. The bank manager sampled employee files to see if he could build a useful model for predicting salary as a function of gender and other characteristics. For each employee, the data includes salary (y, in thousands of dollars), years experience (YrsExp), years prior experience (YrsPrior), and Gender. The data is in the spreadsheet named Bank. a. 3 points: Since Gender is a categorical variable, construct the appropriate dummy variable in column E to indicate gender as female = 1 and male = 0. You must use an \"IF\" statement in the appropriate cell(s) to indicate the correct dummy value based on gender. b. 4 points: With cell H7 the upper left hand corner of the output, fit the full model. (Do not include a residual plot). c. 3 points: Based on the regression output from part b, shade (in yellow) the name of any x variable that appears significant and should remain in the model. Also shade the t stat and p-value. d. 5 points: Based on your analysis, what is your assessment of the lawsuit? Does it look promising for the bank? What additional variables need to be included in the model to strengthen the validity of the model? Place your comments in Cells H33:P45. Question 3 (15 points) The trend in home building in recent years has been to emphasize open spaces and great rooms, rather than smaller living rooms and family rooms. A home builder has been building such homes, but his homes had been taking many months to sell and selling for substantially less than the asking price. In order to determine what types of homes would attract residents; the builder contacted your team. The spreadsheet named Builder contains the sales price y, square footage X1, number of rooms X2, number of bedrooms X3, and age X4 for each of 63 single-family residences recently sold. Perform a regression analysis of the data. a. b. 5 points: With cell I5 as the upper left hand corner of the output, fit the full regression model. Identify the significant coefficients. 10 points: Make recommendations to the builder in Cells I30:Q43 on how to increase sales price by adjusting house layout. For example, will it be more profitable to build a house with smaller living room given a fixed square footage? How about smaller family room or a larger great room? Will it be wise to increase the number of bedrooms? Question 4 (20 points) A company sells products in several sales territories, each of which is assigned to a single sales rep. The spreadsheet named Sales contains 25 observations on eight independent variables. A regression analysis needs to be conducted to determine whether a variety of predictor variables could explain sales in each territory. To compute all possible regression models, we could develop 8 one-variable equations, 28 two variable regression equations, and so on. As a matter of fact, a total of 255 different estimated regression equations can be modeled. Next try three advanced regression techniques in StatTools. Enter 0.05 in the p-value to Enter or Leave box. Variable Sales Time Poten AdvExp Share Change Accounts Definition Total sales credited to the sales rep Length of time employed in month Market potential; total industry sales in units for the sales territory Advertising expenditure in the sales territory Market share; weighted average for the past four years Change in the market share over the previous four years Number of accounts assigned to the sale rep 2 Work Rating a. b. c. d. Workload; a weighted index based on annual purchases and concentrations of accounts Sales rep overall rating on eight performance dimension; an aggregate rating on a 1-7 scale 5 points: With Cell K4 as the upper left hand corner of the output, fit the full regression model (report Regression Table only) using the Stepwise Regression function in StatTools. 5 points: With Cell K15 as the upper left hand corner of the output, fit the full regression model (report Regression Table only) using the Forward Selection function in StatTools. 5 points: With Cell K26 as the upper left hand corner of the output, fit the full regression model (report Regression Table only) using the Backward Elimination function in StatTools. 5 points: Did the three different methods reach the same model? If not, apply your managerial judgment to choose one model that you think is the best. State your reasons. Place your answers in Cells K37:T48. Question 5 (30 points) To measure value, Consumer Reports developed a statistic referred as a value score. The value score is based upon five-year owner costs, overall road-test scores, and predicted-reliability ratings. Five-year owner costs are based upon the expenses incurred in the first five years of ownership, including depreciation, fuel, maintenance and repairs, and so on. Using a national average of 12,000 miles per year, an average cost per mile driven is used as the measure of five-year owner costs. Road-test scores are the results of more than 50 tests and evaluations and are based on a 100-point scale, with higher score indicating better performance, comfort, convenience, and fuel economy. The highest road-test score obtained in the tests conducted by Consumer Reports was a 99 for a Lexus LS 460L. Predicted-reliability ratings (1=Poor, 2=Fair, 3=Good, 4=Very Good, and 5=Excellent) are based upon data from Consumer Reports' Annual Auto Survey. A car with a value score of 1.0 is considered to be an \"average-value\" car. A car with a value score of 2.0 is considered to be twice as good a value as a car with a value score of 1.0; a car with a value score of 0.5 is considered half as good as average; and so on. The data for three sizes of cars (13 small sedans, 20 family sedans, and 21 upscale sedans), including price ($) of each car tested, are provided in spreadsheet Car (Consumer Reports, April, 2012). a. 5 points: To incorporate the effect of size of a car, create two dummy variables in Columns H and I using the following coding. Family-Sedan: 1 if the car is a family sedan and 0 otherwise. Upscale-Sedan: 1 if the car is an upscale sedan and 0 otherwise. b. 7.5 points: First treating Cost/Mile as the dependent variable, develop an estimated regression with Family-Sedan and Upscale-Sedan as the independent variables. With Cell K3 as the upper left hand corner of the output, fit the full regression model (report Regression Table only). Summarize your model in Cells K12:S20. Are the coefficients significant? Interpret the coefficients. c. 7.5 points: Next treating Value Score as the dependent variable, develop an estimated regression equation using all other variables as the independent variables. Choose Stepwise as the Regression Type. With Cell K26 as the upper left hand corner of the output, fit the full regression model (report Regression Table only). Does your model support the claim that \"smaller car provide better values than larger cars?\" The Small Sedans represent the smallest type of car and the Upscale Sedans represent the largest type of car. State your reasons in Cells K39:S47. d. 5 points: Using the model developed in part c, calculated the predicted Value Score in Column J. Which car has the smallest Value Score residual and which car has the largest Value Score residual? e. 5 points: Place the Residual Plot for the model developed in part c in Cells K62:S78. Are the assumptions being satisfied? Is the variance of the error terms constant? Are residuals scattered randomly around zero? Place your assessments in Cells K80:S88. e. in in the model. Also shade the t stat and p-value. 3