STAT 501 - Homework 1 - Fall 2017 - Due Aug 27

Instructions: Use Word to type your answers within this document. Then save and upload your document to Canvas by the due date. The point distribution is located next to each question. Unauthorized distribution and/or uploading of this document is strictly prohibited.

1. (6x2 = 12 points) State which of the following statements are TRUE and which are FALSE. For the statements that are false, explain why they are false.
(a) β0 = b0 because they are two different symbols for the same thing.
False. β0 is the population intercept, while b0, the sample intercept, is an estimate of β0.
(b) β1 is an estimate of the unknown value of b1.
False. The opposite is true: we use the sample slope b1 to estimate the population slope β1.
(c) For a given x value, ŷ depends on all sample data.
True. ŷ = b0 + b1x is the predicted response, and both b0 and b1 are computed from all of the sample data.
(d) E(y) is unknown and is estimated by ŷ.
True. E(y) = β0 + β1x involves the unknown population parameters, and the predicted response ŷ = b0 + b1x estimates it.
(e) For a given sample of n observations, the n random errors (εi, i = 1, ..., n) sum to zero.
False. The residuals ei = yi - ŷi from a least squares fit sum to zero, but the unobservable random errors εi need not.
(f) When x = x̄, the corresponding value of y on the simple linear regression line is y = ȳ.
True.

2. (10 points) The scatterplot below shows sample data for y = selling price of a house and x = square foot area of the house. Discuss important features of the plot in the context of simple linear regression. Consider the "average" pattern and features of variation in sale prices at specific home sizes (square feet area). Your answer should be about 3 or 4 sentences.
There is a positive relationship between the square footage of a home and its selling price: on average, the price increases linearly as the area increases, but the relationship is not perfect. The larger the house, the greater the variation in price around the line. The square footage "predicts" the price response.
It is a positive correlation: the slope is positive (as the predictor x increases, the response y tends to increase), which yields a positive correlation coefficient r. The correlation coefficient measures the strength and direction of the linear relationship between two variables on a scatterplot.

3. (10x4 = 40 points) Infant weights in pounds have an upward linear trend with age in months. Data from a sample of 5 babies in a local community, including one newborn and four others who are 1 month, 2 months, 3 months and 5 months old, were used to obtain an estimated regression equation based on the least squares criterion with a slope of 0.2635 pounds per month. Some of the information is given in the following table.

Age (xi)    Weight (yi)    Predicted weight (ŷi)    Residual error (ei)
0                                                   0.0797
1           8.3
2           8.5
3           9.1            8.9108                   0.1892
5

DO NOT USE MINITAB TO ANSWER THIS QUESTION. Rather, use properties and formulas for simple linear regression (you may use Minitab to check your answers).
(a) What is the equation of the population regression line in this setting? [Hint: There should be no numbers in this equation, just β's.]
E(Y) = β0 + β1x.
(b) What is the estimated regression equation? [Hint: There should be numbers in this equation. Use the information in the question and in the table, particularly the 3-month old baby.]
ŷi = b0 + b1xi. For the 3-month-old baby, 8.9108 = b0 + 0.2635(3), so b0 = 8.1203. Answer: ŷi = 8.1203 + 0.2635xi. (The estimated equation has no error term.)
(c) Based on the estimated regression equation, what is the predicted birth weight of a newborn in this community?
A newborn has x = 0, so ŷ = 8.1203 + 0.2635(0) = 8.1203 pounds.
(d) What is the actual (observed) birth weight of the newborn in the sample?
(e) Complete the remaining entries in the table above.
(f) Comment on the validity of using the estimated regression equation to predict the weight for a one-year-old.
(g) Calculate SSE, the sum of residual error squares.
(h) Calculate the sample estimate of the variance, σ², for the regression model.
(i) Calculate the value that would be given in Minitab for "S=".
Write a sentence that interprets this value.
(j) Calculate the value of R². To start, you will have to calculate the value of SSTO. Write a sentence that interprets the value of R².

4. (2x2 = 4 points) Match the simple linear regression coefficients to their correct interpretations:
(a) Intercept, b0. Predicted y when x = 0, if meaningful.
(b) Slope, b1. Expected change in y for a 1-unit increase in x.
Select one interpretation for each coefficient from the following list:
Expected change in x for a 1-unit increase in y.
Expected change in y for a 1-unit increase in x.
Predicted x when y = 0, if meaningful.
Predicted y when x = 0, if meaningful.

5. (3+2+2+3 = 10 points) Suppose a simple linear regression model fit to a sample of size 10 resulted in a sum of squared errors of 597.4 and total sum of squares of 5799.6.
SSE = Σ (yi - ŷi)² = Σ ei² = 597.4 (sums over i = 1, ..., n)
SSTO = Σ (yi - ȳ)² = 5799.6
(a) Calculate the mean square error.
MSE = SSE/(n - 2) = 597.4/(10 - 2) = 74.675
(b) Calculate the square root of your answer to part (a).
√MSE = √74.675 ≈ 8.6415
(c) What is the correct interpretation of the answer to part (b)? [Select one]
(i) The estimated standard deviation of the response variable.
(ii) The estimated variance of the response variable.
(iii) The estimated standard deviation of the residual errors.
(iv) The estimated variance of the residual errors.
(d) Calculate and interpret the coefficient of determination, r².
r² = 1 - SSE/SSTO = 1 - 597.4/5799.6 = 0.897. Approximately 89.7% of the variation in y is accounted for by the predictor x.

6. (4x2 = 8 points) Match each situation described to the most appropriate caution about the coefficient of determination, r², for simple linear regression (SLR):
(a) An SLR model fit to data with a curved relationship has r² = 88%. A large r² value does not imply that the estimated regression line fits the data well.
(b) The r² for an SLR model changes from 25% to 8% when a data point is removed. One data point can greatly affect the r² value.
(c) An SLR model for variables y and x has an r² of 70%, but in reality y and x have little to do with one another. Association does not imply causation.
(d) The r² for an SLR model fit to a large dataset is statistically significant but the estimated slope is not meaningfully different from 0. Statistical significance does not imply practical significance.
Select one caution for each situation from the following list:
Association does not imply causation.
One data point can greatly affect the r² value.
A large r² value does not imply that the estimated regression line fits the data well.
Statistical significance does not imply practical significance.

7. (4x4 = 16 points) The fitted line plot below gives results for a straight-line regression between y = lung cancer mortality index (100 = average) and x = smoking index (100 = average) from n = 25 occupational groups. The mortality index is the ratio of the rate of deaths from lung cancer among men in the particular occupational group to the rate of deaths from lung cancer among all men. The smoking index is the ratio of the average number of cigarettes smoked per day by men in the particular occupational group to the average number of cigarettes smoked per day by all men.
(a) Write a sentence that gives the value of the slope and interprets it in the context of this situation.
The slope is 1.088, so the trend in the plot is positive: as the smoking index increases, the mortality index tends to increase. On average, the lung cancer mortality index increases by 1.088 points for each 1-point increase in the smoking index.
(b) Describe briefly the value of R² and interpret it in the context of this situation.
The "coefficient of determination" or "r-squared value" is the regression sum of squares divided by the total sum of squares. Equivalently, r² equals one minus the ratio of the error sum of squares to the total sum of squares. Here the predictor x (smoking index) accounts for 51.3% of the variation in y (lung cancer mortality index).
(c) The Minitab output above also includes the information that "S = 18.6154." What does this statistic measure?
S = 18.6154 is the estimated standard deviation of the residuals, the differences between the actual and predicted mortality index values. It measures how far, in index points, the observed values typically fall from the fitted regression line.
(d) Use the fitted regression equation to predict the lung cancer mortality index for an occupational group with smoking index 10% higher than average.
Because 100 is the average smoking index, a group 10% above average has x = 110, so ŷ = -2.89 + 1.088(110) = 116.79.
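
The Question 3 quantities can be cross-checked numerically using only the properties the question points to (the fitted line and the fact that least squares residuals sum to zero). The following is a minimal Python sketch, not the assigned hand calculation. It assumes that the residual 0.0797 in the table belongs to the newborn (age 0) row, and all variable names are illustrative.

```python
import math

# Given in Question 3: ages in months, the slope, and the fitted value at age 3.
ages = [0, 1, 2, 3, 5]
b1 = 0.2635
fitted_at_3 = 8.9108

# Intercept from the fitted value at x = 3, since y_hat = b0 + b1 * x.
b0 = fitted_at_3 - b1 * 3

# Predicted weight for every age in the sample.
fitted = {x: b0 + b1 * x for x in ages}

# Known table entries. Assumption: the residual 0.0797 is the age-0 row.
weights = {1: 8.3, 2: 8.5, 3: 9.1}
residuals = {0: 0.0797}
for x, y in weights.items():
    residuals[x] = y - fitted[x]

# Least squares residuals sum to zero, which pins down the age-5 residual.
residuals[5] = -sum(residuals.values())

# Observed weight = fitted value + residual for the two missing rows.
weights[0] = fitted[0] + residuals[0]
weights[5] = fitted[5] + residuals[5]

n = len(ages)
sse = sum(e ** 2 for e in residuals.values())   # sum of squared residuals
mse = sse / (n - 2)                             # sample estimate of sigma^2
s = math.sqrt(mse)                              # the value reported as "S"
y_bar = sum(weights.values()) / n
ssto = sum((y - y_bar) ** 2 for y in weights.values())
r_squared = 1 - sse / ssto

# Print the completed table: age, weight, predicted weight, residual.
for x in ages:
    print(f"{x:>3} {weights[x]:8.4f} {fitted[x]:8.4f} {residuals[x]:8.4f}")
print(f"b0={b0:.4f} SSE={sse:.4f} MSE={mse:.4f} S={s:.4f} R^2={r_squared:.4f}")
```

Under these assumptions the sketch gives observed weights of roughly 8.2 and 9.4 pounds for the newborn and the 5-month-old. Small differences from hand calculations are expected because the slope and the fitted value are given to only four decimal places.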