Task 1: Import the raw data (CC_Default.csv) into your Jupyter notebook.
1.1 Check that the data loaded correctly by printing a few observations. Check the total number of observations and variables.
1.2 Provide the descriptive statistics and manipulate the data:
a. Check for missing values, if any.
b. Plot the univariate distributions.
c. Convert the relevant variables, such as the payment variables (PAY_0 to PAY_6) and the customer-related variables, to categorical variables as appropriate.
1.3 Using a few plots, find the variables that are correlated and the variables that might help identify next month's defaulters. The plots should provide insights into the following:
a. The independent variables that should help identify the customers who will default on next month's credit card payment.
b. The relationship between the dependent and independent variables.
c. The correlations among the variables, etc.
1.4 Provide your insights into the variables and their relationships, based on your analysis in Task 1.3, in a markdown cell in your Jupyter notebook.

Task 2: Import Train.csv into your Jupyter notebook.
2.1 Check the total number of observations and print a few records. Please note that the variable conversion applied to the raw data in Task 1.2 should also be applied here.
Hint: Convert the relevant variables, such as the payment variables (PAY_0 to PAY_6) and the customer-related (demographic) variables, to categorical variables as appropriate.
2.2 Fit a logistic regression after making the dataset balanced.
Hint: Use the class_weight parameter.
2.3 Remove the variable(s) that would cause multicollinearity. Explicitly state the variable(s) you are dropping in a markdown cell in your Jupyter notebook.
Hint: To remove a variable, use the drop function.
Import Test.csv into your Jupyter notebook.
2.4 Test the model on the test dataset. Please note that the variable conversion applied to the raw data in Task 1.2 should also be applied here.
Hint: Convert the relevant variables, such as the payment variables (PAY_0 to PAY_6) and the customer-related (demographic) variables, to categorical variables as appropriate.
2.5 Plot the confusion matrix.
2.6 Provide your insights on accuracy, precision and F1 score in a markdown cell in your Jupyter notebook.

Task 3:
3.1 Fit a random forest model on Train.csv with a random state of 1, 500 trees (estimators), a maximum depth of 3 and a maximum of 3 features (max_features).
3.2 Evaluate the confusion matrix, F1 scores and accuracy. Compare the random forest model with the logistic regression from Task 2. State your observations in a markdown cell in your Jupyter notebook.

Task 4:
4.1 Fit a support vector machine (SVM) on Train.csv with the following parameters: gamma = 0.025, C = 3.
4.2 Provide the confusion matrix, F1 scores and accuracy in a markdown cell in your Jupyter notebook.

Task 5:
5.1 Fit a sequential ANN model with 16 input neurons and add two hidden layers with 8 neurons each.
5.2 Use ReLU activation and the Adam optimiser. Use the normal kernel initialiser. Run it for 100 epochs on Train.csv with a batch size of 15.
5.3 Provide the confusion matrix, F1 scores and accuracy on the test dataset in a markdown cell in your Jupyter notebook.

Task 6: Based on the evaluation metrics on the test dataset, explain which of the models from Tasks 2 to 5 (logistic regression, random forest, SVM and ANN) you would use, and why. Put your answer in a markdown cell in your Jupyter notebook.

About the data
The CC_Default.csv, Train.csv and Test.csv datasets contain a total of 25 variables:
ID: A numerical value assigned to each credit card customer.
LIMIT_BAL: The remaining credit a customer can use (LIMIT_BAL = credit limit minus the amount used up).
SEX: 1 = male, 2 = female.
EDUCATION: A customer's educational attainment: 1 = graduate school, 2 = university, 3 = high school, 4 = others, 5 = unknown, 6 = unknown.
MARRIAGE: A customer's marital status: 0 = unknown, 1 = married, 2 = single, 3 = others.
AGE: A customer's age in years.
PAY_0: Repayment status in September 2005 (0 or less = paid duly; 1 or greater = the number of months the payment was delayed).
PAY_2: Repayment status in August 2005 (coded as for PAY_0).
PAY_3: Repayment status in July 2005 (coded as for PAY_0).
PAY_4: Repayment status in June 2005 (coded as for PAY_0).
PAY_5: Repayment status in May 2005 (coded as for PAY_0).
PAY_6: Repayment status in April 2005 (coded as for PAY_0).
BILL_AMT1, BILL_AMT2, BILL_AMT3, BILL_AMT4, BILL_AMT5, BILL_AMT6
PAY_AMT1, PAY_AMT2, PAY_AMT3, PAY_AMT4, PAY_AMT5, PAY_AMT6
default payment next month: Whether the customer defaulted on the following month's payment (1 = yes, 0 = no).
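A minimal pandas sketch of the loading and conversion steps (Tasks 1.1, 1.2 and 2.1), assuming the CSV headers match the variable names in the data list above (PAY_0, PAY_2 to PAY_6, SEX, EDUCATION, MARRIAGE). The helper names convert_categoricals and load_and_inspect are hypothetical, chosen for illustration; reusing one conversion function for CC_Default.csv, Train.csv and Test.csv keeps the three files encoded the same way.

```python
import pandas as pd

# Assumption: column names follow the "About the data" list.
PAY_STATUS_COLS = ["PAY_0", "PAY_2", "PAY_3", "PAY_4", "PAY_5", "PAY_6"]
DEMOGRAPHIC_COLS = ["SEX", "EDUCATION", "MARRIAGE"]


def convert_categoricals(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy in which repayment-status and demographic codes use the 'category' dtype."""
    out = df.copy()
    for col in PAY_STATUS_COLS + DEMOGRAPHIC_COLS:
        if col in out.columns:  # skip columns a given file does not have
            out[col] = out[col].astype("category")
    return out


def load_and_inspect(path) -> pd.DataFrame:
    """Load a CSV (path or file-like object), print basic checks, then apply the conversion."""
    df = pd.read_csv(path)
    print(df.head())                                       # a few observations
    print("rows, columns:", df.shape)                      # observations and variables
    print("missing values:", int(df.isna().sum().sum()))   # Task 1.2(a)
    print(df.describe())                                   # descriptive statistics
    return convert_categoricals(df)
```

For the univariate plots in Task 1.2(b), DataFrame.hist() on the numeric columns and Series.value_counts().plot.bar() on the categorical ones are quick options; both need matplotlib installed.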

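For Tasks 1.3 and 2.3, one way to back the correlation plots with numbers is to list strongly correlated pairs, rank features by their correlation with the target, and compute variance inflation factors (VIFs). This sketch uses the identity that the VIF of predictor j equals the j-th diagonal entry of the inverse of the predictors' correlation matrix. The 0.8 threshold and the function names are illustrative assumptions, not values from the assignment. Only numeric columns are used, so columns already converted to the category dtype drop out.

```python
import numpy as np
import pandas as pd


def high_correlation_pairs(df: pd.DataFrame, threshold: float = 0.8) -> list:
    """Return (col_a, col_b, r) for numeric pairs with |Pearson r| >= threshold, strongest first."""
    corr = df.select_dtypes("number").corr()
    cols = list(corr.columns)
    pairs = [
        (cols[i], cols[j], float(corr.iloc[i, j]))
        for i in range(len(cols))
        for j in range(i + 1, len(cols))
        if abs(corr.iloc[i, j]) >= threshold
    ]
    return sorted(pairs, key=lambda p: abs(p[2]), reverse=True)


def target_correlations(df: pd.DataFrame, target: str) -> pd.Series:
    """Correlation of every other numeric column with the 0/1 target, largest |r| first."""
    corr = df.select_dtypes("number").corr()[target].drop(target)
    return corr.sort_values(key=np.abs, ascending=False)


def variance_inflation_factors(df: pd.DataFrame) -> pd.Series:
    """VIF_j = j-th diagonal entry of the inverse correlation matrix of the numeric predictors.

    np.linalg.inv may raise LinAlgError if a predictor is an exact linear
    combination of the others (a perfectly singular correlation matrix).
    """
    numeric = df.select_dtypes("number")
    inv_corr = np.linalg.inv(numeric.corr().to_numpy())
    return pd.Series(np.diag(inv_corr), index=numeric.columns).sort_values(ascending=False)
```

Call variance_inflation_factors on the predictors only (exclude ID and the target). A common rule of thumb treats VIFs above about 10 as a sign of problematic multicollinearity; the variables you finally drop in Task 2.3 remain your judgment call.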
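A sketch of Tasks 2.2 to 2.6 with scikit-learn, assuming Train.csv and Test.csv use the column names listed in the data description, including the target column "default payment next month". LogisticRegression(class_weight="balanced") reweights each class by n_samples / (n_classes * class_count) instead of resampling the rows, which is presumably what the class-weight hint refers to. make_xy, balanced_logistic_regression and fit_and_evaluate are hypothetical helper names; pass the variables you decide to drop in Task 2.3 as drop_cols.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix, f1_score, precision_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Assumptions: target and column names as listed in "About the data".
TARGET = "default payment next month"
CATEGORICAL = ["SEX", "EDUCATION", "MARRIAGE", "PAY_0", "PAY_2", "PAY_3", "PAY_4", "PAY_5", "PAY_6"]


def make_xy(df, drop_cols=(), train_columns=None):
    """One-hot encode the categorical codes and split off the target.

    When encoding Test.csv, pass the training columns so both matrices match:
    dummies for categories unseen in training are dropped, missing ones are filled with 0.
    """
    y = df[TARGET]
    X = df.drop(columns=[TARGET, "ID", *drop_cols], errors="ignore")
    cats = [c for c in CATEGORICAL if c in X.columns]
    X = pd.get_dummies(X, columns=cats, dtype=int)
    if train_columns is not None:
        X = X.reindex(columns=train_columns, fill_value=0)
    return X, y


def balanced_logistic_regression():
    # class_weight="balanced": weight per class = n_samples / (n_classes * class_count).
    return make_pipeline(StandardScaler(), LogisticRegression(class_weight="balanced", max_iter=1000))


def fit_and_evaluate(model, train_df, test_df, drop_cols=()):
    """Fit on the training frame and report test-set metrics."""
    X_train, y_train = make_xy(train_df, drop_cols)
    X_test, y_test = make_xy(test_df, drop_cols, train_columns=X_train.columns)
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    return {
        "confusion_matrix": confusion_matrix(y_test, pred, labels=[0, 1]),  # rows: actual, cols: predicted
        "accuracy": accuracy_score(y_test, pred),
        "precision": precision_score(y_test, pred, zero_division=0),
        "f1": f1_score(y_test, pred, zero_division=0),
    }
```

For Task 2.5, sklearn.metrics.ConfusionMatrixDisplay(confusion_matrix=cm).plot() draws the returned matrix (matplotlib required). Standardising the features is a design choice that helps the default lbfgs solver converge; the assignment does not ask for it.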

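For Tasks 3 and 4, the sketch below assumes you already have numeric feature matrices and 0/1 targets for Train.csv and Test.csv (for example, after the categorical encoding used for Task 2). It reads the "500" setting in Task 3.1 as n_estimators=500, i.e. the number of trees, since random forests are not trained in epochs, and max_features=3 as the number of features considered at each split. Only gamma and C are given for the SVM, so the RBF kernel (SVC's default) is an assumption, as is standardising the features first. task_models and compare are hypothetical names.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, confusion_matrix, f1_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def task_models():
    """Models configured with the Task 3.1 and 4.1 settings (interpretations noted below)."""
    return {
        # Task 3.1: random_state=1, 500 trees (read as n_estimators), max_depth=3, max_features=3.
        "random_forest": RandomForestClassifier(
            n_estimators=500, max_depth=3, max_features=3, random_state=1
        ),
        # Task 4.1: gamma=0.025, C=3. RBF kernel and feature scaling are assumptions.
        "svm": make_pipeline(StandardScaler(), SVC(kernel="rbf", gamma=0.025, C=3)),
    }


def compare(models, X_train, y_train, X_test, y_test):
    """Fit each model; return a metrics table and the confusion matrices keyed by model name."""
    rows, matrices = [], {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        pred = model.predict(X_test)
        matrices[name] = confusion_matrix(y_test, pred, labels=[0, 1])
        rows.append({
            "model": name,
            "accuracy": accuracy_score(y_test, pred),
            "f1": f1_score(y_test, pred, zero_division=0),
        })
    return pd.DataFrame(rows).set_index("model"), matrices
```

Adding the Task 2 logistic regression to the models dictionary gives a single comparison table for Task 3.2 and Task 6. Because that logistic regression uses class_weight="balanced", passing class_weight="balanced" to RandomForestClassifier and SVC as well (both accept it) is one way to keep the comparison like-for-like; the assignment does not require it.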