
Question:

Consider the credit approval data set crx.data from UCI's credit approval website. The data set concerns credit card applications. The last column indicates whether the application was approved (+) or not (-). To preserve data privacy, all 15 explanatory variables have been anonymized. Note that some explanatory variables are continuous and some are categorical.

(a) Load and prepare the data for analysis with sklearn. First, eliminate data rows with missing values. Next, encode the categorical explanatory variables using a OneHotEncoder object from sklearn.preprocessing to create a model matrix \(\mathbf{X}\) with indicator variables for the categorical variables, as described in Section 5.3.5.
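A minimal sketch of part (a), assuming missing values in crx.data are coded as `?` and that the categorical attributes are A1, A4 to A7, A9, A10, A12 and A13 (0-based columns 0, 3, 4, 5, 6, 8, 9, 11, 12, per the UCI crx.names description; check this against your copy). The names `load_crx`, `CAT_COLS`, `NUM_COLS` and `LABEL_COL` are hypothetical, and placing the indicator columns before the continuous ones is an arbitrary choice. The three demo rows are made up purely to exercise the code; they are not taken from crx.data.

```python
import io

import numpy as np
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

# Assumed 0-based indices of the categorical attributes (A1, A4-A7, A9, A10, A12, A13).
CAT_COLS = [0, 3, 4, 5, 6, 8, 9, 11, 12]
# Assumed continuous attributes (A2, A3, A8, A11, A14, A15).
NUM_COLS = [1, 2, 7, 10, 13, 14]
LABEL_COL = 15  # A16: '+' (approved) or '-' (rejected)


def load_crx(source):
    """Read crx.data (path or file-like), drop rows with '?', build X and y."""
    # Treat '?' as missing, then drop every row that has a missing value.
    df = pd.read_csv(source, header=None, na_values="?")
    df = df.dropna().reset_index(drop=True)

    # One indicator column per level of each categorical variable.
    enc = OneHotEncoder()
    x_cat = enc.fit_transform(df[CAT_COLS].astype(str).to_numpy()).toarray()
    x_num = df[NUM_COLS].to_numpy(dtype=float)
    X = np.hstack([x_cat, x_num])

    # Response coded 1 = approve ('+'), 0 = reject ('-').
    y = (df[LABEL_COL] == "+").astype(int).to_numpy()
    return X, y


# Made-up rows (NOT from crx.data); the third has a '?' and should be dropped.
demo = io.StringIO(
    "a,20.0,1.5,u,g,c,v,0.5,t,f,0,f,g,100,0,+\n"
    "b,35.5,2.0,y,p,d,h,1.0,f,t,3,t,s,200,50,-\n"
    "?,40.0,0.0,u,g,c,v,2.0,t,t,1,f,g,0,10,+\n"
)
X_demo, y_demo = load_crx(demo)
print(X_demo.shape, y_demo)  # (2, 24) [1 0]
```

On the real file, `X, y = load_crx("crx.data")` should give the 653 × 46 matrix stated in part (b) if the column assumptions above hold.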

(b) The model matrix should contain 653 rows and 46 columns. The response variable should be a 0/1 variable (reject/approve). We will consider several classification algorithms and test their performance (using a zero-one loss) via ten-fold cross validation.
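For reference, one standard way to write the ten-fold cross-validation estimate under the zero-one loss is shown below (the notation is ours, not necessarily the book's). Here the \(n = 653\) row indices are partitioned into folds \(I_1, \dots, I_{10}\), \(\mathbf{x}_i\) is the \(i\)-th row of \(\mathbf{X}\), and \(g_{-k}\) is the classifier trained on all rows not in \(I_k\):

\[
\widehat{\ell}_{\mathrm{CV}}
= \frac{1}{n} \sum_{k=1}^{10} \sum_{i \in I_k}
\mathbb{1}\bigl\{\, y_i \neq g_{-k}(\mathbf{x}_i) \,\bigr\}
\]

Because the folds partition the data, this is simply the fraction of all \(n\) points that are misclassified when each point is predicted by a model that never saw it during training.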


(i) Write a function that takes three parameters, \(\mathbf{X}\), \(\boldsymbol{y}\), and a model, and returns the ten-fold cross-validation estimate of the expected generalization risk.
(ii) Consider the following sklearn classifiers: KNeighborsClassifier \((k=5)\), LogisticRegression, and MLPClassifier (multilayer perceptron). Use the function from (i) to identify the best-performing classifier.
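A minimal sketch of parts (i) and (ii). `cv_risk` pools the misclassifications over ten shuffled folds and divides by \(n\); `compare_classifiers` reuses the same folds for all three models, so the comparison is paired. The function names, the `seed`/`n_folds` parameters, shuffling, and the raised `max_iter` values are our assumptions, not part of the exercise. A synthetic data set from `make_classification` stands in for the \(\mathbf{X}, \boldsymbol{y}\) of part (a), so the printed numbers say nothing about which classifier wins on crx.data.

```python
import warnings

import numpy as np
from sklearn.base import clone
from sklearn.datasets import make_classification
from sklearn.exceptions import ConvergenceWarning
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import zero_one_loss
from sklearn.model_selection import KFold
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier


# (i) Ten-fold CV estimate of the expected generalization risk (zero-one loss).
def cv_risk(X, y, model, n_folds=10, seed=0):
    X, y = np.asarray(X), np.asarray(y)
    kf = KFold(n_splits=n_folds, shuffle=True, random_state=seed)
    n_errors = 0
    for train_idx, test_idx in kf.split(X):
        # Fit a fresh, unfitted copy of the model on the training folds.
        fitted = clone(model).fit(X[train_idx], y[train_idx])
        # normalize=False gives the number of misclassified test points.
        n_errors += zero_one_loss(y[test_idx], fitted.predict(X[test_idx]), normalize=False)
    # Pool errors over all folds and divide by the total number of points.
    return n_errors / len(y)


# (ii) CV risk of each classifier named in the exercise, plus the best one.
def compare_classifiers(X, y, seed=0):
    models = {
        "KNN (k=5)": KNeighborsClassifier(n_neighbors=5),
        # max_iter raised (our choice) to reduce convergence warnings.
        "LogisticRegression": LogisticRegression(max_iter=1000),
        "MLPClassifier": MLPClassifier(max_iter=500, random_state=seed),
    }
    # Same seed -> identical folds for every model.
    risks = {name: cv_risk(X, y, m, seed=seed) for name, m in models.items()}
    best = min(risks, key=risks.get)
    return risks, best


# Synthetic stand-in data; replace with X, y from part (a).
X_syn, y_syn = make_classification(n_samples=200, n_features=10, random_state=0)
with warnings.catch_warnings():
    warnings.simplefilter("ignore", ConvergenceWarning)
    risks, best = compare_classifiers(X_syn, y_syn)
for name, r in sorted(risks.items(), key=lambda kv: kv[1]):
    print(f"{name:20s} {r:.3f}")
print("best:", best)
```

k-nearest neighbors and the multilayer perceptron are sensitive to feature scale, and the continuous columns of \(\mathbf{X}\) span far larger ranges than the 0/1 indicators, so wrapping each model in a `StandardScaler` pipeline is a common variant, although the exercise does not ask for it.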



Related Book:

Data Science and Machine Learning: Mathematical and Statistical Methods

ISBN: 9781118710852

1st Edition

Authors: Dirk P. Kroese, Thomas Taimre, Radislav Vaisman, Zdravko Botev
