
Question


Business Statistics, ISOM2500 (L1) Practice Final

1. In a simple linear regression, the least squares regression line is
(a) the line which makes the sample correlation as close to +1 or -1 as possible.
(b) the line which best splits the data in half, with half of the data points lying above the regression line and half of the data points lying below the regression line.
(c) the line which minimizes the sum of squared residuals.
(d) the line which minimizes the number of points that do not pass through the line.

2. A least squares regression line is determined from a sample of values for variables x and y, where x is the size of a listed home (in square feet), and y is the selling price of the home. Which of the following statements is true concerning the fitted line y = b0 + b1 x?
(a) If there is a positive correlation r between x and y, then the slope b1 must also be positive
(b) The units on the intercept b0 and the slope b1 will be the same as the units on the variable y
(c) If r² = 0.85, then it is appropriate to conclude that a change in x will cause a change in y
(d) None of the above is true

3. The residual plot below consists of 104 observations. Based on the plot one can conclude that RMSE is around
(a) 0
(b) 25
(c) 40
(d) 80

4-5. An insurance agent has selected a sample of drivers that she insures whose ages are in the range from 16 to 42 years. For each driver, she records the age of the driver (x) and the dollar amount of claims (y) that the driver filed in the previous 12 months. A scatterplot showing the dollar amount of claims as the response and the age as the predictor shows a linear trend. The least squares regression line is determined to be: y = 3715 - 75.4x. A plot of the residuals versus age of the drivers showed no pattern, and the following were reported: r² = 0.822, standard deviation of the residuals Se = 312.1.

4. Which of the following is correct?
(a) If the age of a driver increases from 20 to 21, the dollar amount of claims is predicted to decrease by $75.4
(b) If the age of a driver increases by one year, the dollar amount of claims is predicted to increase by $3715
(c) One can use the least squares regression line to obtain a reliable prediction of the dollar amount of claims for a driver whose age is 55 years
(d) The dollar amount of claims for a driver of 10 years old is expected to be $2961.

5. Which of the following is false?
(a) 82.2% of the variation in the dollar amounts of claims is explained by the age of the driver.
(b) The correlation r between the response and the predictor is 0.907
(c) If the histogram of the residuals is symmetric around zero and bell-shaped, then about 68% of the dollar amounts of claims are within 312.1 dollars of the regression line.
(d) A driver in the data set whose age is 25 years had a residual of $150 using the fitted line above; this means his dollar amount of claims is $1680.

6-7. An LS linear regression is fitted to a data set, and the residual plot is shown below.

6. Which of the following is correct?
(a) A linear model is okay because the association between the two variables is fairly strong.
(b) The linear model is not good because the correlation between the response and the predictor is near 0.
(c) The linear model is not good because some residuals are large.
(d) The linear model is not good because of the curve in the residuals.

7. If one uses the LS linear regression to make predictions, which of the following statements is true?
(a) The predictions tend to be too high for large x's.
(b) The predictions tend to be too high for intermediate x's.
(c) The predictions tend to be too high for small x's.
(d) None of the above is correct.

8. In a study of the association between the car mileage (miles per gallon, mpg) and the car weight, it is found that the association is curved.
To make the association linear, one decides to change the response to 100 times the reciprocal of the mileage. The scatterplot of the new response vs the car weight (in thousands of pounds) is shown below. An LS linear regression is fitted to the transformed variables, and yields the following equation:
Estimated new response = 0.95 + 1.25 × Weight (000 lbs)
Based on the equation, what's the predicted mileage (measured in mpg) for a car of weight 5,000 pounds?
(a) 6251
(b) 0.016
(c) 7.2
(d) 13.89

9-10. Each worker at an assembly plant that produces clock radios is responsible for the entire assembly of each unit they work on. The plant manager has collected data from a sample of workers: the number of years (YRS) of experience at the plant, and the number of hours per unit (TIME) required for assembly. The scatterplot of TIME versus YRS is shown below.

9. Which of the following is an appropriate reason why a regression line should not be used to make predictions based on this data?
(a) The magnitude of the slope of the line is too large
(b) The intercept of the fitted line has no practical interpretation in this context
(c) The linear condition for simple regression does not appear to be met
(d) The association between TIME and YRS is negative

10. The manager has decided to transform the response variable from TIME (hours/unit) to 1/TIME (units/hour). The scatterplot of 1/TIME versus YRS is shown below. Which of the following is an appropriate interpretation of these results?
(a) The unit on Se is hours per unit
(b) More experienced workers are predicted to produce more units per hour on average than less experienced workers
(c) Because the transformed model has a higher r², it is better.
(d) The slope b1 measures the elasticity between 1/TIME and YRS

11-15. The scatterplot of sales in thousands of cartons (y) of half-gallon orange juice versus the price (x) is given below. We apply a log transformation to both y and x to fit the nonlinear pattern.
Assume the transformed x and y agree with the SRM.

Transformed Fit Log to Log
Log(Sales) = 4.811646 - 1.7523832*Log(Price)

Summary of Fit
RSquare                  0.755335
Root Mean Square Error   0.385788
Mean of Response         3.136468

Parameter Estimates
Term         Estimate    Std Error   t Ratio   Prob>|t|
Intercept    4.811646    0.148033    32.50     <.0001*
Log(Price)   -1.752383   0.143954    -12.17    <.0001*

11. Which of the following interpretations of the fitted equation is true?
(a) As the price increases by 1%, the sales decrease by 1.75% on average
(b) As the price increases by $1, the sales decrease by 1.75 units on average.
(c) As the price increases by 1%, the sales decrease by 1.75 units
(d) As the price increases by $1, the sales decrease by 1.75% on average.

12. Based on the fitted equation, what's the predicted sales (in thousands of cartons) for a price of $2.3?
(a) 3.35
(b) 28.56
(c) 4.18
(d) 65.22

13. Suppose the cost of a half-gallon juice is $1.5, then the optimal price is about
(a) $1.9
(b) $3.0
(c) $3.5
(d) $4.1

14. The statistics of the slope show that
(a) The elasticity is positive with at least 95% confidence
(b) The elasticity is bigger than 1 with at least 95% confidence
(c) The elasticity is smaller than 1 with at least 95% confidence
(d) None of the above is correct.

15. About the estimated intercept 4.811646, which of the following is the appropriate interpretation?
(a) It estimates the sales in thousands of cartons when the price equals $0.
(b) It estimates the sales in thousands of cartons when the price equals $1.
(c) It estimates the logarithm of the sales in thousands of cartons when the price equals $1.
(d) None of the above is correct.

16. The normal quantile plot of residuals from a regression equation in the plot below suggests that
(a) The fitted equation is linear.
(b) The R-squared statistic is about 0.9 or more.
(c) The model errors are normally distributed.
(d) The data in the sample are dependent.

17-19.
An LS linear regression is fitted to the 2011 daily returns on HSBC (HSBC Rtn) vs those on the Hang Seng index (HS Rtn). The following are some plots and summaries one gets in the fitting procedure.

17. Based on the plots above, which of the following assumptions about the SRM seems to be violated?
(a) Linear association
(b) Normality of errors
(c) Equal variance of errors
(d) Independence of errors

18. If the return on the Hang Seng index increases by 1%, at the 95% confidence level, which of the following statements about the return on HSBC is true?
(a) It will increase by at least 0.93%.
(b) It will increase by less than 1.08%, on average.
(c) It will be at least 0.93%, on average.
(d) It will increase by 1.002947%.

19. Which of the following statements is false?
(a) We do not reject the hypothesis that β0 = 0.
(b) We do not reject the hypothesis that returns on HSBC move on average by the same amount with returns on the Hang Seng index.
(c) We do not reject the hypothesis that β1 = 0.
(d) Returns on HSBC are correlated with the returns on the market

20-24. A large national bank charges local companies for using their services. A bank official reported the results of a regression analysis designed to predict the bank's charges (Y), measured in dollars per month, for services rendered to local companies. One explanatory variable used to predict service charge to a company is the company's sales revenue (X), measured in millions of dollars. Data for 21 companies who use the bank's services were used to fit the model. The results of the simple linear regression are provided below. Assume the conditions of the SRM are satisfied.
y = 2,700 + 20x, RMSE = 65, p-value for testing β1 = 0 is 0.034.

20. Interpret the estimate of β0, the intercept of the line.
(a) All companies will be charged at least $2,700 by the bank.
(b) There is no practical interpretation since a sales revenue of $0 is a nonsensical value.
(c) About 95% of the observed service charges fall within $2,700 of the least squares line.
(d) For every $1 million increase in sales revenue, we expect a service charge to decrease $2,700.

21. Interpret the estimate of σ, the standard deviation of the error term in the model.
(a) About 95% of the observed service charges fall within $65 of the least squares line.
(b) About 95% of the observed service charges equal their corresponding predicted values.
(c) About 95% of the observed service charges fall within $130 of the least squares line.
(d) For every $1 million increase in sales revenue, we expect a service charge to increase $65.

22. Interpret the p-value for testing the hypothesis that β1 = 0.
(a) There is sufficient evidence (at α = 0.05) to conclude that sales revenue (X) is a useful linear predictor of service charge (Y).
(b) There is insufficient evidence (at α = 0.05) to conclude that sales revenue (X) is a useful linear predictor of service charge (Y).
(c) Sales revenue (X) is a poor predictor of service charge (Y).
(d) For every $1 million increase in sales revenue, we expect a service charge to increase $0.034.

23. A 95% confidence interval for β1 is [15, 30]. Interpret the interval.
(a) We are 95% confident that the mean service charge will fall between $15 and $30 per month.
(b) We are 95% confident that the sales revenue (X) will increase between $15 and $30 million for every $1 increase in service charge (Y).
(c) We are 95% confident that on average the service charge (Y) will increase between $15 and $30 for every $1 million increase in sales revenue (X).
(d) At the α = 0.05 level, there is not enough evidence of a linear relationship between service charge (Y) and sales revenue (X).

24. To obtain a narrower confidence interval for the estimated slope in this model, we should advise the bank official to
(a) concentrate on companies which spent less on using the bank's services.
(b) concentrate on companies which spent more on using the bank's services.
(c) concentrate on companies whose sales revenues are either relatively low or relatively high.
(d) obtain additional data for companies of widely varying sales revenues.

25-26. It is believed that the average number of hours spent studying per day (HOURS) during undergraduate education should have a positive linear relationship with the starting salary (SALARY, measured in thousands of dollars per month) after graduation. Given below is the output from regressing SALARY on HOURS for a sample of 51 students.

R Square         0.7845
Standard Error   1.3704
Observations     51

            Coefficients   Standard Error   t Stat    P-value
Intercept   -1.8940        0.4018           -4.7134   2.051E-05
Hours       0.9795         0.0733           13.3561   5.944E-18

25. What's the value of the t-test statistic to test whether HOURS is a useful linear predictor of SALARY?
(a) 4.7134
(b) 1.8940
(c) 0.9795
(d) 13.3561

26. The 90% confidence interval for the average change in SALARY (in thousands of dollars) associated with one extra hour of studying per day is
(a) wider than [-2.70, -1.09]
(b) narrower than [-2.70, -1.09]
(c) wider than [0.83, 1.13]
(d) narrower than [0.83, 1.13]

27-31. A construction contractor is involved in a wide variety of construction projects. The operations manager wants to investigate how the Total Hours of labor (design, engineering, modeling, simulation, construction, software support, etc.) required for a project is related to the Total Cost of completing the project. Data collected over many projects were used to determine a predicting equation for the simple regression model:
Total Cost = F + M × Total Hours + ε,
where F and M are the fixed and marginal costs respectively. After determining the predicting equation, a scatterplot of residuals vs. Total Hours was produced, as given below:

27. Which of the following statements is an appropriate interpretation of these results?
(a) The similar variances condition for simple regression does not appear to be satisfied by the data
(b) Prediction intervals for small values of the Total Hours would tend to be too narrow
(c) Confidence intervals for the slope of the line should still be considered reliable
(d) None of the above

28-31. In an attempt to improve the model, the manager decides to use 1/Total Hours as the explanatory variable, and Cost/Hour ($/Hour) as the response. The model becomes:
Cost/Hour = M + F × (1/Hours) + ε
The regression output and the scatterplot of the data are given below:

Summary of Fit
RSquare                  0.12
Root Mean Square Error   27.2

28. Which of the following statements is correct?
(a) The total cost of a project is predicted to decrease as the number of hours required increases.
(b) The total cost of a project is expected to increase by $118.41 per additional hour of labor required for the project.
(c) The fixed cost of a project is predicted to be approximately $118.41.
(d) None of the above.

29. Using the revised model, what is the average cost per hour for a project that will require 300 total hours of labor to complete?
(a) 466,401
(b) 113.2
(c) 0.0088
(d) 118.4

30. Using the revised model, what is the approximate 95% prediction interval for the total cost of a project that will require 600 total hours of labor to complete?
(a) (61, 170)
(b) (53,200, 85,800)
(c) (36,900, 102,100)
(d) (69,400, 69,500)

31. The information given in the parameter estimate table about the intercept implies that
(a) Fixed costs are significantly different from zero.
(b) Marginal costs are significantly different from zero.
(c) Marginal costs decrease as the number of square feet increases
(d) Marginal costs cannot be estimated from the model.

32-33. A simple regression model is fitted to a data set, with the scatterplot and the least squares line shown below. It is clear that the observation represented by a solid circle in the upper-right corner is an outlier.

32.
If the outlier is removed, how will the intercept and the slope of the least squares line change?
(a) The intercept will be smaller, and the slope will be smaller too.
(b) The intercept will be smaller, and the slope will be bigger.
(c) The intercept will be bigger, and the slope will be bigger too.
(d) The intercept will be bigger, and the slope will be smaller.

33. If the outlier is removed, how will the standard deviation of the residuals change?
(a) increase
(b) decrease
(c) stay the same
(d) cannot tell

34-35. Weekly commodity prices for heating oils (in cents) were obtained and regressed against time. The residual plot is shown below.

34. Which assumption of the SRM appears to be violated?
(a) Linear association
(b) Normality of errors
(c) Equal variance of errors
(d) Independence of errors

35. If one uses the obtained regression equation to make a prediction about the commodity prices for heating oils in the next week, then compared with the actual price, the prediction is likely to be ___.
(a) higher
(b) lower
(c) on target
(d) cannot tell based on the information given.
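
Several of the questions above (4, 5, 8, 25 and 26) rest on short arithmetic with a fitted line. The following is a minimal sketch of those calculations, using only numbers quoted in the question text. The variable and function names are hypothetical, and the interval uses the common "estimate ± 2 standard errors" rule of thumb as an assumption, not an exact t interval.

```python
import math

# Questions 4-5: fitted line for claims (dollars) versus driver age (years).
B0, B1 = 3715.0, -75.4      # intercept and slope quoted in the question
R_SQUARED = 0.822

def predicted_claims(age):
    """Fitted value from y = 3715 - 75.4 * age."""
    return B0 + B1 * age

# The slope is the predicted change in claims for one extra year of age.
change_20_to_21 = predicted_claims(21) - predicted_claims(20)

# In simple regression the correlation has the same sign as the slope,
# and its square equals r-squared.
r = math.copysign(math.sqrt(R_SQUARED), B1)

# Residual = observed - fitted, so observed = fitted + residual.
observed_age_25 = predicted_claims(25) + 150.0

# Question 8: the response is 100 / mpg, so predict it and then invert.
new_response = 0.95 + 1.25 * 5.0    # weight entered in thousands of pounds
predicted_mpg = 100.0 / new_response

# Questions 25-26: t statistic = estimate / standard error. With the rounded
# table values this differs slightly from the 13.3561 printed in the output.
slope, slope_se = 0.9795, 0.0733
t_stat = slope / slope_se

# Approximate 95% interval (assumed +/- 2 SE rule); a 90% interval built
# from the same estimate and standard error is narrower.
approx_ci_95 = (slope - 2 * slope_se, slope + 2 * slope_se)

print(change_20_to_21, r, observed_age_25, predicted_mpg, t_stat, approx_ci_95)
```

Note that the fitted line for Questions 4-5 was estimated from drivers aged 16 to 42, so evaluating `predicted_claims` outside that range is an extrapolation.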

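
For the log-log fit in Questions 11-15, the same kind of check can be sketched as below. Two assumptions are not stated in the source: that Log means the natural logarithm, and that the "optimal price" is the profit-maximizing price under constant-elasticity demand, which is cost × elasticity / (1 + elasticity) when the elasticity is below -1. Names are hypothetical.

```python
import math

# Fitted log-log equation quoted in the output:
# Log(Sales) = 4.811646 - 1.7523832 * Log(Price)
B0, B1 = 4.811646, -1.7523832   # in a log-log model the slope is the elasticity
SE_B1 = 0.143954                # standard error of the slope from the table

def predicted_sales(price):
    """Back-transform the fitted log value (natural log assumed)."""
    return math.exp(B0 + B1 * math.log(price))

# Question 12: predicted sales (thousands of cartons) at a price of $2.30.
sales_at_2_30 = predicted_sales(2.30)

def optimal_price(cost, elasticity):
    """Price maximizing (price - cost) * sales when sales = A * price**elasticity.

    Assumed textbook result; only meaningful for elasticity < -1.
    """
    return cost * elasticity / (1.0 + elasticity)

# Question 13: unit cost of $1.50.
best_price = optimal_price(1.5, B1)

# Question 14: approximate 95% interval for the elasticity (+/- 2 SE rule).
elasticity_ci = (B1 - 2 * SE_B1, B1 + 2 * SE_B1)

# Question 15: Log(1) = 0, so the intercept is the fitted Log(Sales) at $1.
log_sales_at_1 = B0 + B1 * math.log(1.0)

print(sales_at_2_30, best_price, elasticity_ci, log_sales_at_1)
```

Under the natural-log assumption the back-transformed prediction agrees with one of the listed options for Question 12, which is some evidence that the output uses natural logs.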