Question:

Pathfinder College is a small liberal arts college that wants to improve its admissions process. In particular, too many of its incoming freshmen have failed to graduate for a variety of reasons, including dropping out, transferring to another college, or failing to satisfy the graduation standards. Therefore, the admissions office wants a better method of predicting whether an applicant will succeed in graduating.

The college primarily uses two predictor variables for evaluating an applicant, namely, the high school grade point average (GPA) and the SAT score. If either is sufficiently high (a GPA ≥ 3.30 or an SAT score ≥ 1,200), the applicant is immediately accepted. Such applicants who accept admission usually do well and go on to graduate. However, the admissions office needs to go deeper into its applicant pool to fill its freshman class. The question is how to predict which of these other applicants are the better bets for successfully graduating. The admissions office wishes to consider an applicant for admission only if the prediction is that the applicant is more likely than not to graduate. (Qualitative information about the applicant will also be considered before making the final admission decision, but the focus of this problem is on how to use the GPA and SAT scores.)

The admissions office has randomly selected 800 students who were previously admitted and enrolled but did not meet the criteria for immediate acceptance. All have now had sufficient time to graduate. The data for these historical records, including graduation status, are provided in the spreadsheet file titled Pathfinder College Data, available at www.mhhe.com/Hillier7e. The original data (on the Original Data worksheet tab) need to be cleaned. Perform the following data cleaning tasks.

a. Search for missing data. Identify the students that have missing data by specifying the student number and the data that are missing. 

b. Search for mis-entered data by sorting each column to look for outliers. List any suspicious data, indicating the student number, what is suspicious, and if possible, conjecture on the likely true value. 

c. Some analysis requires numerical rather than text data. Add another column to the dataset labeled “Graduated 0/1”, and convert the Yes and No entries in the “Graduated” column to 1s and 0s in the new column, with 1 representing Yes and 0 representing No.
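
The three cleaning tasks can also be scripted rather than done by hand in the spreadsheet. The following is only a minimal sketch in plain Python, not the textbook's solution. The column names ("Student", "GPA", "SAT", "Graduated"), the sample rows, and the plausible-value ranges are all assumptions made for illustration; the real worksheet may be laid out differently. For task b, the sketch sorts each column as the problem suggests and, as an added convenience, flags values outside an assumed range. The upper bounds follow from the problem statement (these students did not qualify for immediate acceptance), while the lower bounds are a guess.

```python
# Hypothetical sample rows; the real worksheet has 800 students and its
# column names may differ from the ones assumed here.
records = [
    {"Student": 1, "GPA": 2.95, "SAT": 1050, "Graduated": "Yes"},
    {"Student": 2, "GPA": None, "SAT": 1120, "Graduated": "No"},
    {"Student": 3, "GPA": 31.0, "SAT": 980, "Graduated": "Yes"},
    {"Student": 4, "GPA": 3.10, "SAT": 110, "Graduated": "No"},
    {"Student": 5, "GPA": 2.70, "SAT": 1010, "Graduated": ""},
]

# Assumed plausible ranges as (inclusive lower bound, exclusive upper bound).
# Upper bounds come from the immediate-acceptance cutoffs; lower bounds
# are an assumption.
VALID_RANGES = {"GPA": (0.0, 3.30), "SAT": (400, 1200)}


def find_missing(rows):
    """Task a: list (student number, column) for every empty cell."""
    return [
        (row["Student"], col)
        for row in rows
        for col, value in row.items()
        if value is None or value == ""
    ]


def find_suspicious(rows, ranges=VALID_RANGES):
    """Task b: sort each numeric column and flag out-of-range values."""
    flagged = []
    for col, (low, high) in ranges.items():
        present = [r for r in rows if isinstance(r.get(col), (int, float))]
        # Sorting puts outliers at the ends of the list, as in the spreadsheet.
        for row in sorted(present, key=lambda r: r[col]):
            if not (low <= row[col] < high):
                flagged.append((row["Student"], col, row[col]))
    return flagged


def add_graduated_flag(rows):
    """Task c: add a "Graduated 0/1" column (1 = Yes, 0 = No)."""
    mapping = {"yes": 1, "no": 0}
    result = []
    for row in rows:
        text = (row.get("Graduated") or "").strip().lower()
        # Missing or unrecognized text stays None instead of being guessed.
        result.append({**row, "Graduated 0/1": mapping.get(text)})
    return result
```

Calling find_missing(records) and find_suspicious(records) returns the student numbers to list for parts a and b. Conjecturing the likely true value of a flagged entry (for example, a misplaced decimal point or a dropped digit) is still a judgment call for the analyst.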
