Money lending has been around since the advent of money; it is perhaps the world’s second-oldest profession.

Question:

Money lending has been around since the advent of money; it is perhaps the world’s second-oldest profession. The systematic evaluation of credit risk, though, is a relatively recent arrival; before it, lending was largely based on reputation and very incomplete data. Thomas Jefferson, the third President of the United States, was in debt throughout his life and unreliable in his debt payments, yet people continued to lend him money. It wasn’t until the beginning of the 20th century that the Retail Credit Company was founded to share information about credit. That company is now Equifax, one of the big three credit scoring agencies (the other two are TransUnion and Experian).
Individual and local human judgment are now largely irrelevant to the credit reporting process. Credit agencies and other big financial institutions extending credit at the retail level collect huge amounts of customer and transaction data to predict whether defaults or other adverse events will occur.
Data: This case deals with an early stage of the historical transition to predictive modeling, in which humans were employed to label records as either good or poor credit. The German Credit dataset has 30 attributes and 1000 records, each record being a prior applicant for credit. Each applicant was rated as “good credit” (700 cases) or “bad credit” (300 cases). Table 23.2 shows the values of these attributes for the first four records. All the attributes are explained in Table 23.3. New applicants for credit can also be evaluated on these 30 predictor attributes and classified as a good or a bad credit risk based on the predictor values.
The consequences of misclassification have been assessed as follows: the costs of a false positive (incorrectly saying that an applicant is a good credit risk) outweigh the benefits of a true positive (correctly saying that an applicant is a good credit risk) by a factor of 5. This is summarized in Table 23.4. The opportunity cost table was derived from the average net profit per loan as shown in Table 23.5. Because decision makers are used to thinking of their decision in terms of net profits, we use these tables in assessing the performance of the various models.
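The 5-to-1 cost structure can be made concrete with a small calculation. The sketch below is illustrative only: Tables 23.4 and 23.5 are not reproduced here, so the payoffs of 100 for an accepted good applicant and -500 for an accepted bad applicant are assumed from the hint in step 4, and a rejected applicant is assumed to contribute nothing. The function name `net_profit` is hypothetical.

```python
# Assumed per-loan payoffs (from the hint in step 4; Table 23.5 is not shown here).
PROFIT_GOOD_ACCEPTED = 100    # predicted good, actually good (true positive)
PROFIT_BAD_ACCEPTED = -500    # predicted good, actually bad (false positive)
PROFIT_REJECTED = 0           # assumption: no loan is made, so no gain or loss


def net_profit(true_pos: int, false_pos: int, rejected: int = 0) -> int:
    """Net profit implied by confusion-matrix counts under the assumed payoffs."""
    return (
        true_pos * PROFIT_GOOD_ACCEPTED
        + false_pos * PROFIT_BAD_ACCEPTED
        + rejected * PROFIT_REJECTED
    )


# Baseline: extend credit to all 1000 applicants (700 good, 300 bad).
accept_all = net_profit(true_pos=700, false_pos=300)
print(accept_all)  # 700 * 100 - 300 * 500 = -80000
```

Under these assumed payoffs, accepting everyone loses money, and a model breaks even only when at least 5 of every 6 accepted applicants are good (100p - 500(1 - p) = 0 gives p = 5/6).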

1. Review the predictor attributes, and guess what their role in a credit decision might be. Are there any surprises in the data?
2. Divide the data into training and holdout partitions, and develop classification models using the following machine learning techniques: logistic regression, classification trees, and neural networks. (Note: With the Remap Binominals operator, consider mapping the binominal target attribute RESPONSE such that the positive values are mapped to “good” and the negative values are mapped to “bad,” allowing the results to be meaningfully interpreted.)
3. Choose one model from each technique, and report the confusion matrix and the cost/gain matrix for the holdout data. Which technique has the highest net profit?
4. Let us try to improve our performance. Rather than accepting the default classification of all applicants’ credit status, use the estimated probabilities (propensities) from the logistic regression (where success means good) as a basis for selecting the best credit risks first, followed by poor-risk applicants. Create an attribute containing the net profit for each record in the holdout set. Use this attribute to create a lift chart (cumulative gains chart) for the holdout set that incorporates the net profit. (Hint: First, generate the PROFIT attribute with the expression: if([RESPONSE] == "good", 100, -500). Then, sort the records in descending order of confidence(good). Use the Integrate operator from the Series Extension to compute the cumulative profit values that can be used to generate the lift chart.)

a. How far into the holdout data should you go to get maximum net profit? (Often, this is specified as a percentile or rounded to deciles.)

b. If this logistic regression model is used to score future applicants, what “probability of success” threshold should be used in extending credit?
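The procedure in step 4 (assign a profit to each record, sort by propensity, accumulate, and find the peak) can be sketched outside RapidMiner as follows. This is a plain-Python stand-in for the Sort and Integrate operators, not the book's workflow. The function names are hypothetical, and the eight records are made-up toy values, not the German Credit holdout set.

```python
from itertools import accumulate


def cumulative_profit(labels, propensities, gain_good=100, loss_bad=-500):
    """Rank records by propensity (descending) and accumulate per-record profit."""
    ranked = sorted(zip(propensities, labels), key=lambda pair: pair[0], reverse=True)
    # Same rule as the hint: 100 for an actual "good", -500 otherwise.
    profits = [gain_good if label == "good" else loss_bad for _, label in ranked]
    return ranked, list(accumulate(profits))


def best_cutoff(labels, propensities):
    """Locate the depth into the ranked list where cumulative profit peaks."""
    ranked, cum = cumulative_profit(labels, propensities)
    best = max(range(len(cum)), key=cum.__getitem__)  # first index of the maximum
    return {
        "n_accepted": best + 1,
        "fraction": (best + 1) / len(cum),   # how far into the holdout data (part a)
        "threshold": ranked[best][0],        # lowest propensity still accepted (part b)
        "max_profit": cum[best],
    }


# Toy data for illustration only (hypothetical labels and propensities).
toy_labels = ["good", "good", "bad", "good", "good", "bad", "good", "bad"]
toy_propensities = [0.95, 0.90, 0.85, 0.80, 0.70, 0.60, 0.55, 0.30]

result = best_cutoff(toy_labels, toy_propensities)
print(result)
```

On the toy data the cumulative profit peaks after the top two records, so the cutoff is the top 25% and the threshold is the propensity of the last accepted record. On real holdout data, plotting the cumulative values against the number of records accepted gives the profit-based lift chart; if the curve never rises above zero, extending no credit at all is the better choice.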


Related Book:

Machine Learning For Business Analytics

ISBN: 9781119828792

1st Edition

Authors: Galit Shmueli, Peter C. Bruce, Amit V. Deokar, Nitin R. Patel
