Answered step by step
Verified Expert Solution
Question
1 Approved Answer
The dataset underpinning the analysis here is that used in the lab sessions during lectures. It has been uploaded as a spreadsheet named 'German' together
The dataset underpinning the analysis here is that used in the lab sessions during lectures. It has been uploaded as a spreadsheet named 'German' together with the data dictionary 'German aata dictionary' describing each attribute. You will recall that the dataset consists of data for 1000 applicants along with a variable that says whether they were subsequently Good or Bad from a credit perspective. 1. Split the dataset into two subsets as follows: Subset 1: the applicants with Checking =1 or Checking =2 Subset 2: the applicants where Checking =3 or Checking =4 Clean the subsets if necessary. 2. For each subset, establish a training set and validation set. Explain: a. what principle you have used to decide on these; b. why both training and validation sets are needed; c. any issues encountered during the splitting exercise. 3. For each training set choose four variables which are suitable for building a scorecard. For each training set the variables must have (i) at least one continuous variable before binning; (ii) at least one categorical variable with more than two categories, so you can see whether categories can be combined. Explain the rationale behind your choice of variables (using supporting statistics eg chisquare). Should you be unable to choose variables satisfying the above criteria, explain the problem you have encountered and the solution you have chosen to compromise the variable selection. 4. Using the binary variables obtained from the coarse classification in the above exercise to build two scorecards for each training set, one using linear regression, the other using logistic regression. Note this means you should have four scorecards in total: (i) using linear regression for Checking =1 or 2 ; (ii) using logistic regression for Checking =1 or 2 ; (iii) using linear regression for Checking =3 or 4 ; (iv) using logistic regression for Checking =3 or 4 ; Note that the file you submit should include, in the Appendix, a table that gives the binary variables you used, together with the coefficients for those variables calculated in each regression. 5. Derive ROC curves for all scorecards using the validation set applicable to each, showing in detail how sensitivity and specificity have been calculated. Estimate the Gini coefficient and KS values for each. Explain and comment on your results. Each question depends on the one before
Step by Step Solution
There are 3 Steps involved in it
Step: 1
Get Instant Access to Expert-Tailored Solutions
See step-by-step solutions with expert insights and AI powered tools for academic success
Step: 2
Step: 3
Ace Your Homework with AI
Get the answers you need in no time with our AI-driven, step-by-step assistance
Get Started