Answered step by step
Verified Expert Solution
Link Copied!

Question

1 Approved Answer

The dataset underpinning the analysis here is that used in the lab sessions during lectures. It has been uploaded as a spreadsheet named 'German' together

image text in transcribed

The dataset underpinning the analysis here is that used in the lab sessions during lectures. It has been uploaded as a spreadsheet named 'German' together with the data dictionary 'German aata dictionary' describing each attribute. You will recall that the dataset consists of data for 1000 applicants along with a variable that says whether they were subsequently Good or Bad from a credit perspective. 1. Split the dataset into two subsets as follows: Subset 1: the applicants with Checking =1 or Checking =2 Subset 2: the applicants where Checking =3 or Checking =4 Clean the subsets if necessary. 2. For each subset, establish a training set and validation set. Explain: a. what principle you have used to decide on these; b. why both training and validation sets are needed; c. any issues encountered during the splitting exercise. 3. For each training set choose four variables which are suitable for building a scorecard. For each training set the variables must have (i) at least one continuous variable before binning; (ii) at least one categorical variable with more than two categories, so you can see whether categories can be combined. Explain the rationale behind your choice of variables (using supporting statistics eg chisquare). Should you be unable to choose variables satisfying the above criteria, explain the problem you have encountered and the solution you have chosen to compromise the variable selection. 4. Using the binary variables obtained from the coarse classification in the above exercise to build two scorecards for each training set, one using linear regression, the other using logistic regression. Note this means you should have four scorecards in total: (i) using linear regression for Checking =1 or 2 ; (ii) using logistic regression for Checking =1 or 2 ; (iii) using linear regression for Checking =3 or 4 ; (iv) using logistic regression for Checking =3 or 4 ; Note that the file you submit should include, in the Appendix, a table that gives the binary variables you used, together with the coefficients for those variables calculated in each regression. 5. Derive ROC curves for all scorecards using the validation set applicable to each, showing in detail how sensitivity and specificity have been calculated. Estimate the Gini coefficient and KS values for each. Explain and comment on your results. Each question depends on the one before

Step by Step Solution

There are 3 Steps involved in it

Step: 1

blur-text-image

Get Instant Access to Expert-Tailored Solutions

See step-by-step solutions with expert insights and AI powered tools for academic success

Step: 2

blur-text-image

Step: 3

blur-text-image

Ace Your Homework with AI

Get the answers you need in no time with our AI-driven, step-by-step assistance

Get Started

Recommended Textbook for

Spatial Databases With Application To GIS

Authors: Philippe Rigaux, Michel Scholl, Agnès Voisard

1st Edition

1558605886, 978-1558605886

More Books

Students also viewed these Databases questions

Question

What is the purpose of a code of conduct?

Answered: 1 week ago