Question
This example demonstrates a common problem when historical data are used. Suppose that X is the number of workers employed on a production shift and Y is the number of units produced on that shift. Most of the time the factory operates with a relatively stable workforce, and output depends in large part on the amount of raw materials available and the sales requirements. The operation adjusts up or down over a narrow range in response to demands and to the available workforce, X. Thus, in most cases the scatter plot covers a narrow range for the X variable.

But occasionally there is a very large or small workforce, or the number of workers is recorded incorrectly. On those days the production might be unusually high or low, or might be recorded incorrectly. As a result, we have extreme points that can have a major influence on the regression model. These few days determine the slope of the regression equation. Without the extreme points the regression would indicate little or no relationship. If these extreme points represent extensions of the relationship, then the estimated model is useful. But if these points result from unusual conditions or recording errors, the estimated model is misleading.

In a particular application we may find that these extreme points are correct and should be used to determine the regression line. But the analyst needs to make that decision knowing that all the other data points do not support a significant relationship. To evaluate the available data, you need to understand the system and process that generated them.

Outlier points are defined as those that deviate substantially in the y direction from the predicted value. Typically, these points are identified by computing the standardized residual as follows:

$$e_{si} = \frac{e_i}{s_e\sqrt{1 - h_i}} \qquad (11.31)$$

That is, the standardized residual (Equation 11.31) is the residual, $e_i$, divided by the standard error of the residual, $s_e\sqrt{1 - h_i}$, where $h_i$ is the leverage of the point.
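The earlier point, that a few extreme X values can determine the slope, can be illustrated with a small sketch. All numbers below are hypothetical and chosen only for illustration; `fit_line` is a helper written for this example, not a function from any statistical package. The sketch fits the same least-squares line twice, once on the narrow-range shifts alone and once with two extreme days added.

```python
def fit_line(x, y):
    """Return (intercept, slope) of the least-squares line for simple regression."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


# Hypothetical shifts: the workforce stays in a narrow range (48 to 52 workers)
# and output shows no clear pattern within that range.
workers = [48, 49, 50, 51, 52, 48, 49, 50, 51, 52]
units = [205, 198, 202, 196, 204, 199, 203, 197, 201, 195]

# Two hypothetical extreme days: a very small and a very large workforce.
extreme_workers = [20, 80]
extreme_units = [90, 310]

_, slope_core = fit_line(workers, units)
_, slope_all = fit_line(workers + extreme_workers, units + extreme_units)

print(f"slope without extreme points: {slope_core:.2f}")
print(f"slope with extreme points:    {slope_all:.2f}")
```

In this made-up data set the two extreme days account for almost all of the variation in X, so they set the slope almost by themselves. Whether that slope is meaningful depends on whether those days are valid observations, which the numbers alone cannot settle.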
Note that in the previous equation, points with high leverage (large $h_i$) will have a smaller standard error of the residual. This occurs because points with high leverage are likely to influence the location of the estimated regression line, and, hence, the observed and expected values of Y will be closer. Minitab will mark observations that have an absolute value of the standardized residual greater than 2.0 with an R to indicate that they are outliers. This capability is also available in most good statistical packages, but not in Excel. Using this capability, outlier points can be identified, as shown in Example 11.7.

Example 11.7 The Effect of Outliers in the Y Variable (Scatter Plot Analysis)

In this example we consider the effect of outliers in the y, or vertical, direction. Recall that the regression analysis model assumes that all the variation is in the Y direction. Thus, we know that outliers in the y direction will have large residuals, and these will result in a higher estimate of the model error. In this example we see that the effects can be even more extreme
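The flagging rule described above (absolute standardized residual greater than 2.0) can be sketched in a few lines of Python. This is a minimal illustration, not Minitab's implementation. It assumes simple regression, the usual leverage formula $h_i = 1/n + (x_i - \bar{x})^2 / \sum_j (x_j - \bar{x})^2$, and a standard error of the estimate computed with n - 2 degrees of freedom. The data and the helper `standardized_residuals` are hypothetical.

```python
import math


def standardized_residuals(x, y):
    """Return (standardized residuals, s_e) for a simple least-squares fit."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    intercept = y_bar - slope * x_bar
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    # Standard error of the estimate (assumed: n - 2 degrees of freedom).
    s_e = math.sqrt(sum(e ** 2 for e in residuals) / (n - 2))
    # Leverage of each point in simple regression (assumed formula).
    leverage = [1 / n + (xi - x_bar) ** 2 / sxx for xi in x]
    # Residual divided by the standard error of the residual.
    std_res = [e / (s_e * math.sqrt(1 - h)) for e, h in zip(residuals, leverage)]
    return std_res, s_e


# Hypothetical data: output is roughly 4 units per worker plus small noise.
workers = list(range(46, 56))
noise = [1, -1, 2, -2, 0, 1, -1, 2, -1, 0]
units = [4 * w + d for w, d in zip(workers, noise)]
units[5] += 30  # hypothetical outlier in the y direction (e.g., a recording error)

std_res, s_e = standardized_residuals(workers, units)
flagged = [i for i, r in enumerate(std_res) if abs(r) > 2.0]

# Refit without the flagged points to see how much they inflate the model error.
clean_x = [w for i, w in enumerate(workers) if i not in flagged]
clean_y = [u for i, u in enumerate(units) if i not in flagged]
_, s_e_clean = standardized_residuals(clean_x, clean_y)

print("flagged observations (0-based index):", flagged)
print(f"s_e with outlier:    {s_e:.2f}")
print(f"s_e without outlier: {s_e_clean:.2f}")
```

Comparing the two values of `s_e` shows the effect the example describes: a single outlier in the y direction raises the estimate of the model error for the whole fit. Dropping a flagged point is done here only to show that effect. Whether a flagged observation should be removed in practice depends on what generated it.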