Question: One way to see whether this procedure will be successful is to split the original data set into two subsets: one subset for estimation

One way to see whether this procedure will be successful is to split the original data set into two subsets: one subset for estimation and one subset for validation. A regression equation is estimated from the first subset. Then the values of the explanatory variables from the second subset are substituted into this equation to obtain predicted values for the dependent variable. Finally, these predicted values are compared to the known values of the dependent variable in the second subset. If the agreement is good, there is reason to believe that the regression equation will predict well for new data. This procedure is called validating the fit.

This validation procedure is fairly simple to perform in Excel. We illustrate it for the Bendrix manufacturing data in Example 10.2. (See the file Overhead Costs Validation.xlsx.)

FUNDAMENTAL INSIGHT: Training and Validation Sets
This practice of partitioning a data set into a set for estimation and a set for validation is becoming much more common as larger data sets become available. It allows you to see how a given procedure such as regression works on a data set where you know the Ys. If it works well, you have more confidence that it will work well on a new data set where you do not know the Ys. This partitioning is a routine part of data mining, the exploration of large data sets. In data mining, the first data set is usually called the training set, and the second data set is called the validation or testing set.

In Example 10.2, we used 36 monthly observations to regress Overhead on Machine Hours and Production Runs. For convenience, the regression output is repeated in Figure 10.46. In particular, it shows an R-square value of 86.6% and a standard error of estimate of $4109. Now suppose that this data set is from one of Bendrix's two plants. The company would like to predict overhead costs for the other plant by using data on machine hours and production runs at the other plant.
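The split, fit, predict, and compare steps described above can be sketched in Python. This is only an illustration under stated assumptions: the data are synthetic (they are not the Bendrix data), the helper names `fit_ols`, `predict`, and `validation_summary` are hypothetical, and the two summary measures (squared correlation between actual and predicted values, and root mean squared error) are one reasonable way to judge agreement, not necessarily the measures used in the textbook's spreadsheet.

```python
import numpy as np

def fit_ols(X, y):
    """Least-squares fit with an intercept; returns [intercept, b1, b2, ...]."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def predict(coef, X):
    """Apply an already-estimated equation to new explanatory values."""
    return coef[0] + X @ coef[1:]

def validation_summary(y_true, y_pred):
    """Squared correlation of actual vs. predicted, and root mean squared error."""
    r2 = np.corrcoef(y_true, y_pred)[0, 1] ** 2
    rmse = np.sqrt(np.mean((y_true - y_pred) ** 2))
    return r2, rmse

# Synthetic stand-in data (assumption): 72 months, two explanatory variables.
rng = np.random.default_rng(0)
n = 72
X = np.column_stack([rng.uniform(1000, 2000, n), rng.uniform(20, 60, n)])
y = 3000 + 40 * X[:, 0] + 900 * X[:, 1] + rng.normal(0, 4000, n)

# Partition into an estimation (training) set and a validation set.
idx = rng.permutation(n)
train, valid = idx[:36], idx[36:]

# Estimate on the first subset only, then predict the second subset.
coef = fit_ols(X[train], y[train])
r2_valid, rmse_valid = validation_summary(y[valid], predict(coef, X[valid]))
print(f"validation R-square: {r2_valid:.3f}, validation RMSE: {rmse_valid:.0f}")
```

If the validation measures are close to those from the estimation set, that supports using the equation on new data; a large drop suggests the equation does not carry over.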
The first step is to see how well the regression from Figure 10.46 fits data from the other plant. This validation on the 36 months of data is shown in Figure 10.47. To obtain the results in this figure, proceed as follows.

Procedure for Validating Regression Results

1. Copy old results. Copy the results from the original regression to the ranges A5:C5 and B9:B10.
2. Calculate fitted values and residuals. The fitted values are now the predicted values of overhead for the other plant, based on the original regression equation. Find these by substituting the new values of Machine Hours and Production Runs into the original equation. Specifically, enter the formula =$A$5+SUMPRODUCT($B$5:$C$5,B13:C13)

Figure 10.46 StatTools Regression Output for Bendrix Example

Multiple Regression for Overhead

Summary
  Multiple R               0.9308
  R-Square                 0.8664
  Adjusted R-Square        0.8583
  Std. Err. of Estimate    4108.99309
  Rows Ignored             0

ANOVA Table    Degrees of Freedom    Sum of Squares    Mean of Squares    F              p-Value
  Explained    2                     3614020661        1807010330         107.0261279
  Unexplained  33                    557166199.1       16883824.22
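As a cross-check on the regression output, the summary measures can be recomputed from the ANOVA entries, and the fitted-value step (intercept plus a SUMPRODUCT of coefficients and explanatory values) has a direct Python counterpart. The sketch below uses only the sums of squares and degrees of freedom printed in Figure 10.46; the function name `fitted_value` and the numbers passed to it are made up for illustration, since the regression coefficients are not shown in this excerpt.

```python
import math

# Entries as printed in the ANOVA table of Figure 10.46.
df_explained, df_unexplained = 2, 33
ss_explained = 3614020661.0
ss_unexplained = 557166199.1

ss_total = ss_explained + ss_unexplained
n = df_explained + df_unexplained + 1          # 36 monthly observations

r_square = ss_explained / ss_total
multiple_r = math.sqrt(r_square)
adj_r_square = 1 - (1 - r_square) * (n - 1) / df_unexplained
ms_unexplained = ss_unexplained / df_unexplained
std_err = math.sqrt(ms_unexplained)            # standard error of estimate
f_ratio = (ss_explained / df_explained) / ms_unexplained

print(f"Multiple R = {multiple_r:.4f}, R-square = {r_square:.4f}")
print(f"Adjusted R-square = {adj_r_square:.4f}, s_e = {std_err:.2f}, F = {f_ratio:.4f}")

def fitted_value(intercept, coefs, xs):
    """Intercept plus the sum of coefficient * explanatory value (like SUMPRODUCT)."""
    return intercept + sum(b * x for b, x in zip(coefs, xs))

# Hypothetical numbers only, to show the mechanics of the calculation.
print(fitted_value(10.0, [2.0, 3.0], [4.0, 5.0]))
```

The recomputed values agree with the Summary block of the figure to the precision shown there, which confirms that the flattened table entries were read in the right order.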

Step by Step Solution


Let's address each question.

1. The R-square, or coefficient of determination, measures the proportion of the variance in the dependent variable Overhead that is explained by the independent variables Machi...

