Question:
The data set named BadCancr.dat (at www.uvm.edu/~dhowell/methods7/DataFiles/BadCancr.dat) has been deliberately corrupted by entering errors into a perfectly good data set (named Cancer.dat). The purpose of this corruption was to give you experience in detecting and correcting the kinds of errors that appear almost every time we attempt to use a newly entered data set. Every error in here is one that I and almost everyone I know have come across countless times. Some of them are so extreme that most statistical packages will not run until they are corrected. Others are logical errors that will allow the program to run, producing meaningless results. (No college student is likely to be 10 years old or receive a score of 15 on a 10-point quiz.) The variables in this set are described in the Appendix: Computer Data Sets for the file Cancer.dat. That description tells where each variable should be found and the range of its legitimate values. You can use any statistical package available to read the data. Standard error messages will identify some of the problems, visual inspection will identify others, and computing descriptive statistics or plotting the data will help identify the rest. In some cases, the appropriate correction will be obvious. In other cases, you will just have to delete the offending values. When you have cleaned the data, use your program to compute a final set of descriptive statistics on each of the variables.
This problem will take a fair amount of time. I have found that it is best to have students work in pairs.
Step by Step Answer:
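The checks the question describes (catching values a program cannot read, flagging values outside the legitimate range, then computing descriptive statistics on what remains) could be scripted along the following lines. This is only a minimal Python sketch, not the textbook's solution. The column names (`id`, `age`, `quiz`), their ranges, and the sample rows are invented for illustration, borrowing the question's own examples of a 10-year-old student and a score of 15 on a 10-point quiz. The real variable positions and legal ranges for Cancer.dat must be taken from the Appendix description.

```python
import statistics

# HYPOTHETICAL layout and legal ranges, NOT the real Cancer.dat specification.
# Replace these with the variables and ranges listed in the Appendix.
RANGES = {"id": (1, 999), "age": (17, 80), "quiz": (0, 10)}


def check_rows(lines, ranges=RANGES):
    """Return (rows, problems) for whitespace-delimited text lines.

    Offending values are replaced by None (treated as missing);
    lines with the wrong number of fields are reported and skipped.
    """
    names = list(ranges)
    rows, problems = [], []
    for lineno, line in enumerate(lines, start=1):
        fields = line.split()
        if not fields:
            continue  # ignore blank lines
        if len(fields) != len(names):
            problems.append(
                (lineno, f"expected {len(names)} fields, got {len(fields)}"))
            continue
        row = {}
        for name, token in zip(names, fields):
            try:
                value = float(token)
            except ValueError:
                # e.g. a letter O typed in place of a zero
                problems.append((lineno, f"{name}: non-numeric value {token!r}"))
                value = None
            else:
                low, high = ranges[name]
                if not low <= value <= high:
                    problems.append(
                        (lineno, f"{name}: {value} outside {low}-{high}"))
                    value = None
            row[name] = value
        rows.append(row)
    return rows, problems


def describe(rows, name):
    """Descriptive statistics for one variable, skipping missing values."""
    values = [r[name] for r in rows if r[name] is not None]
    return {"n": len(values), "mean": statistics.mean(values),
            "min": min(values), "max": max(values)}


# Invented sample rows containing the kinds of errors the question mentions.
SAMPLE = ["1 19 8", "2 10 7", "3 21 15", "4 2O 9", "5 22", "6 20 6"]

rows, problems = check_rows(SAMPLE)
for lineno, message in problems:
    print(f"line {lineno}: {message}")
for name in ("age", "quiz"):
    print(name, describe(rows, name))
```

To try it on the real file, the lines could be read with `open("BadCancr.dat").read().splitlines()`, assuming the file has been downloaded locally and is whitespace-delimited. Setting bad values to missing corresponds to the question's "delete the offending values". Where the correct value is obvious from context, it is better to fix it by hand in the data before rerunning the check.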