One way to see whether this procedure will be successful is to split the original data...
Question:
One way to see whether this procedure will be successful is to split the original data set into two subsets: one subset for estimation and one subset for validation. A regression equation is estimated from the first subset. Then the values of explanatory variables from the second subset are substituted into this equation to obtain predicted values for the dependent variable. Finally, these predicted values are compared to the known values of the dependent variable in the second subset. If the agreement is good, there is reason to believe that the regression equation will predict well for new data. This procedure is called validating the fit.

This validation procedure is fairly simple to perform in Excel. We illustrate it for the Bendrix manufacturing data in Example 10.2. (See the file Overhead Costs Validation.xlsx.) There we used 36 monthly observations to regress Overhead on Machine Hours and Production Runs. For convenience, the regression output is repeated in Figure 10.46. In particular, it shows an R² value of 86.6% and an s value of $4109. Now suppose that this data set is from one of Bendrix's two plants. The company would like to predict overhead costs for the other plant by using data on machine hours and production runs at the other plant.

The first step is to see how well the regression from Figure 10.46 fits data from the other plant. This validation on the 36 months of data is shown in Figure 10.47. To obtain the results in this figure, proceed as follows.

Procedure for Validating Regression Results
1. Copy old results. Copy the results from the original regression to the ranges A5:C5 and B9:B10.
2. Calculate fitted values and residuals. The fitted values are now the predicted values of overhead for the other plant, based on the original regression equation. Find these by substituting the new values of Machine Hours and Production Runs into the original equation. Specifically, enter the formula =$A$5+SUMPRODUCT($B$5:$C$5,B13:C13)

FUNDAMENTAL INSIGHT: Training and Validation Sets
This practice of partitioning a data set into a set for estimation and a set for validation is becoming much more common as larger data sets become available. It allows you to see how a given procedure such as regression works on a data set where you know the Ys. If it works well, you have more confidence that it will work well on a new data set where you do not know the Ys. This partitioning is a routine part of data mining, the exploration of large data sets. In data mining, the first data set is usually called the training set, and the second data set is called the validation or testing set.

Figure 10.46 StatTools Regression Output for Bendrix Example

Multiple Regression for Overhead

Summary
Multiple R              0.9308
R-Square                0.8664
Adjusted R-square       0.8583
Std. Err. of Estimate   4108.99309
Rows Ignored            0

ANOVA Table    Degrees of Freedom   Sum of Squares   Mean of Squares   F             p-Value
Explained      2                    3614020661       1807010330        107.0261279   <0.0001
Unexplained    33                   557166199.1      16883824.22

Regression Table   Coefficient   Standard Error   t-Value       p-Value   95% Lower      95% Upper
Constant           3996.678209   6603.650932      0.605222512   0.5492    -9438.550632   17431.90705
Machine Hours      43.53639812   3.5894837        12.12887472   <0.0001   36.23353862    50.83925761
Production Runs    883.6179252   82.25140753      10.74289124   <0.0001   716.2761784    1050.959672

1. In terms of the estimated data listed on the printout, explain what the R-square does.
2. What does the Constant under the Coefficient (3996.68) tell the analyst?
3. The relationship in the printout measures production costs based on machine hours and production runs. Focusing only on machine hours, provide an interpretation of the machine hours coefficient (43.54).
4. What does the Standard Error of machine hours tell us about this estimate?
5. What do the t-Value and the p-Value tell us about the machine hours coefficient?
6. Can we use the F Statistic to evaluate this model? Why or why not?
7. Should we use this model? Why or why not?
8. Using the machine hours estimate, compose a regression equation from the results on the printout.
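The fitted-value step of the validation procedure, and the arithmetic behind several entries on the printout, can be sketched in Python. This is a minimal illustration, not the textbook's Excel workbook: the coefficients are copied from Figure 10.46, but the function names and the three validation rows are hypothetical and invented purely for demonstration. The root-mean-square error used to summarize the residuals is one reasonable choice, not necessarily the summary measure shown in Figure 10.47.

```python
import math

# Coefficients copied from the regression table in Figure 10.46.
CONSTANT = 3996.678209
B_MACHINE_HOURS = 43.53639812
B_PRODUCTION_RUNS = 883.6179252


def predict_overhead(machine_hours, production_runs):
    """Fitted overhead from the original regression equation."""
    return (CONSTANT
            + B_MACHINE_HOURS * machine_hours
            + B_PRODUCTION_RUNS * production_runs)


def validate(rows):
    """Return (residuals, rmse) for rows of
    (machine_hours, production_runs, actual_overhead)."""
    residuals = [actual - predict_overhead(mh, pr) for mh, pr, actual in rows]
    rmse = math.sqrt(sum(r * r for r in residuals) / len(residuals))
    return residuals, rmse


# HYPOTHETICAL validation rows, invented for illustration only;
# they are NOT the Bendrix data from Overhead Costs Validation.xlsx.
validation_rows = [
    (1500, 30, 97000.0),
    (1300, 40, 95000.0),
    (1700, 25, 101000.0),
]

residuals, rmse = validate(validation_rows)
for (mh, pr, actual), resid in zip(validation_rows, residuals):
    print(f"MH={mh} runs={pr} actual={actual:.0f} "
          f"fitted={actual - resid:.2f} residual={resid:.2f}")
print(f"RMSE on validation rows: {rmse:.2f}")

# Consistency checks using only numbers that appear on the printout.
t_machine_hours = B_MACHINE_HOURS / 3.5894837        # coefficient / standard error
f_ratio = 1807010330 / 16883824.22                   # MS explained / MS unexplained
r_square = 3614020661 / (3614020661 + 557166199.1)   # SS explained / SS total
std_err_estimate = math.sqrt(16883824.22)            # square root of MS unexplained

print(f"t (Machine Hours) = {t_machine_hours:.4f}")
print(f"F = {f_ratio:.4f}")
print(f"R-square = {r_square:.4f}")
print(f"Std. Err. of Estimate = {std_err_estimate:.2f}")
```

The last four values should agree with the printout up to rounding, which shows how the t-value, F statistic, R-square, and standard error of estimate in questions 1 and 4 through 6 are derived from the other entries in the table.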
Expert Answer:
Let's address each question. 1. The R-square, or coefficient of determination, measures the proportion of the variance in the dependent variable, Overhead, that is explained by the independent variables Machi...