Task 1: Import the raw data (CC_Default.csv) into your Jupyter notebook.

1.1 Check if the data is loaded correctly by printing a few observations. Check the total number of observations and variables.
1.2 Provide the descriptive statistics and manipulate data:
a. Check for missing values, if any.
b. Plot the univariate distribution.
c. Convert the relevant variables, such as payment variables (Pay_0 to Pay_6) and customer-related variables, to categorical variables as appropriate.
1.3 Find the variables that are correlated and the variables that might help in finding the defaulters next month using a few plots. The plots should provide insights on the following:
a. The independent variable that should help identify those who will default on next month's credit card payment.
b. The relation between dependent and independent variables.
c. The correlations among the variables, etc.
1.4 Provide your insights into the variables and their relationship based on your analysis in Task 1.3 in a markdown cell in your Jupyter notebook.

Task 2: Import Train.csv into your Jupyter notebook.

2.1 Check the total number of observations and print a few records. Please note that the variable conversion in the raw data, similar to Task 1.2, should be applied. Hint: Convert the relevant variables, such as payment variables (Pay_0 to Pay_6) and customer-related (demographic) variables, to categorical variables as appropriate.
2.2 Fit a logistic regression after making the dataset balanced. Hint: Use the class weight parameter.
2.3 Remove the variable(s) that would cause multicollinearity. Explicitly state the variable(s) that you are dropping in a markdown cell in your Jupyter notebook. Hint: To remove a variable, use the drop function.

Import Test.csv into your Jupyter notebook.

2.4 Test the model on the test dataset. Please note that the variable conversion in the raw data, similar to Task 1.2, should be applied. Hint: Convert the relevant variables, such as payment variables (Pay_0 to Pay_6) and customer-related (demographic) variables, to categorical variables as appropriate.
2.5 Plot the confusion matrix.
2.6 Provide your insights on accuracy, precision and F1 score in a markdown cell in your Jupyter notebook.

Task 3:

3.1 Fit a random forest model on Train.csv with a random state of 1, 500 epochs, a maximum depth of 3 and a maximum feature of 3.
3.2 Evaluate the confusion matrix, F1 scores and accuracy. Compare the random forest model with the logistic regression from Task 2. State your observations in a markdown cell in your Jupyter notebook.

Task 4:

4.1 Fit support vector machine (SVM) algorithms on Train.csv with the following parameters: gamma = 0.025, C = 3.
4.2 Provide the confusion matrix, F1 scores and accuracy in a markdown cell in your Jupyter notebook.

Task 5:

5.1 Fit an ANN model (sequential) with 16 input neurons and add two hidden layers with 8 neurons each.
5.2 Use relu activation and the adam optimiser. Use the normal kernel initialiser. Run it for 100 epochs on Train.csv with a batch size of 15.
5.3 Provide the confusion matrix, F1 scores and accuracy on the test dataset in a markdown cell in your Jupyter notebook.

Task 6: Explain which model you will use based on the evaluation metrics on the test dataset among all the models from Task 2 to Task 5 (logistic regression, random forest, SVM and ANN) and explain why. Put your answer in a markdown cell in your Jupyter notebook.
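
The model-fitting and evaluation steps in Tasks 2 to 4 can be sketched roughly as follows. This is only an illustration under stated assumptions, not the expected solution. The CSV files are not available here, so a synthetic imbalanced dataset from scikit-learn's `make_classification` stands in for Train.csv and Test.csv. The "500 epochs" of the random forest is read as `n_estimators=500`, which is an interpretation rather than something the question states. The feature scaling step and the variable names are also assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score,
                             precision_score)
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Assumption: synthetic stand-in for Train.csv / Test.csv with a minority
# "default" class; replace with pd.read_csv(...) on the real files.
X, y = make_classification(n_samples=1000, n_features=10, n_informative=5,
                           weights=[0.78, 0.22], random_state=1)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=1)

# Assumption: scale features (fit on train only); helps logistic regression
# and the SVM, and does not affect the random forest.
scaler = StandardScaler().fit(X_train)
X_train_s = scaler.transform(X_train)
X_test_s = scaler.transform(X_test)

models = {
    # Task 2.2: class_weight="balanced" reweights the minority class.
    "logistic regression": LogisticRegression(class_weight="balanced",
                                              max_iter=1000),
    # Task 3.1: "500 epochs" interpreted as 500 trees (assumption).
    "random forest": RandomForestClassifier(n_estimators=500, max_depth=3,
                                            max_features=3, random_state=1),
    # Task 4.1: gamma = 0.025, C = 3 (default RBF kernel).
    "SVM": SVC(gamma=0.025, C=3),
}

results = {}
for name, model in models.items():
    model.fit(X_train_s, y_train)
    pred = model.predict(X_test_s)
    results[name] = {
        "confusion_matrix": confusion_matrix(y_test, pred),
        "accuracy": accuracy_score(y_test, pred),
        "precision": precision_score(y_test, pred, zero_division=0),
        "f1": f1_score(y_test, pred, zero_division=0),
    }
    print(name, results[name])
```

Collecting the metrics in one dictionary makes the comparisons asked for in Tasks 3.2 and 6 easier to write up. On imbalanced data, accuracy alone can look good for a model that rarely predicts a default, so precision and F1 for the default class are worth reading alongside it.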



