Question
Task 1:
Import the raw data (CC_Default.csv) into your Jupyter notebook.
Check that the data is loaded correctly by printing a few observations. Check the total number of observations and variables.
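A minimal sketch of the import and sanity check, assuming pandas is available. The three-row CSV text is an invented stand-in so the snippet runs anywhere; in the notebook, the path of the real file would be passed to `pd.read_csv` instead.

```python
import io

import pandas as pd

# Invented stand-in for the raw file; in the notebook, replace this with
# pd.read_csv("<path to the raw CSV file>").
sample_csv = io.StringIO(
    "ID,AGE,default.payment.next.month\n"
    "1,24,1\n"
    "2,26,0\n"
    "3,34,0\n"
)
df = pd.read_csv(sample_csv)

# Print a few observations to confirm the file parsed as expected.
print(df.head())

# shape is (number of observations, number of variables).
n_obs, n_vars = df.shape
print(f"{n_obs} observations, {n_vars} variables")
```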
Provide descriptive statistics and prepare the data:
a) Check for missing values, if any.
b) Plot the univariate distributions.
c) Convert the relevant variables, such as the payment variables (the PAY repayment-status variables) and the customer-related variables, to categorical variables as appropriate.
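The steps above could look roughly like this in pandas. The toy frame and its column list are assumptions made for illustration; the real column names should be taken from the loaded file.

```python
import numpy as np
import pandas as pd

# Invented toy frame; the columns are illustrative, not the real schema.
df = pd.DataFrame(
    {
        "AGE": [24, 26, np.nan, 37],
        "SEX": [1, 2, 2, 1],
        "EDUCATION": [2, 2, 1, 3],
    }
)

# Descriptive statistics for the numeric columns.
print(df.describe())

# a) Count missing values per column.
missing = df.isna().sum()
print(missing)

# c) Convert coded variables to the pandas categorical dtype.
categorical_cols = ["SEX", "EDUCATION"]  # assumed list; extend as needed
df = df.astype({col: "category" for col in categorical_cols})
print(df.dtypes)
```

For step b), one option is a histogram per numeric column, for instance `df.hist()` with matplotlib installed.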
Using a few plots, find the variables that are correlated and the variables that might help in identifying next month's defaulters. The plots should provide insights on the following:
a) The independent variable that should help identify those who will default on next month's credit card payment
b) The relation between the dependent and independent variables
c) The correlations among the variables, etc.
Provide your insights into the variables and their relationships, based on your analysis in Task 1, in a markdown cell in your Jupyter notebook.
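Before plotting, the numbers behind such plots can be inspected directly. The sketch below uses an invented eight-row frame with hypothetical column names (`late_months`, `bill_a`, `bill_b`, `default`); it ranks predictors by their correlation with the target and compares group means.

```python
import pandas as pd

# Invented data: late_months tracks the target closely, while
# bill_a and bill_b are near-duplicates of each other.
df = pd.DataFrame(
    {
        "late_months": [0, 0, 1, 2, 0, 3, 0, 2],
        "bill_a": [10, 20, 30, 40, 50, 60, 70, 80],
        "bill_b": [11, 19, 31, 42, 49, 61, 72, 79],
        "default": [0, 0, 1, 1, 0, 1, 0, 1],
    }
)

# c) Pairwise Pearson correlations among all variables.
corr = df.corr()
print(corr.round(2))

# a) Rank the predictors by their correlation with the target.
target_corr = corr["default"].drop("default").sort_values(ascending=False)
print(target_corr)

# b) Compare the average of each predictor between the two outcome groups.
group_means = df.groupby("default").mean()
print(group_means)
```

The `corr` matrix is what a heatmap would display, and `group_means` is the tabular counterpart of a per-class box plot or bar chart.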
Task 2:
Import Train.csv into your Jupyter notebook.
Check the total number of observations and print a few records. Please note that the variable conversion applied to the raw data in Task 1 should be applied here as well.
Hint: Convert the relevant variables, such as the payment variables (the PAY repayment-status variables) and the customer-related (demographic) variables, to categorical variables as appropriate.
Fit a logistic regression after making the dataset balanced.
Hint: Use the class weight parameter.
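One way to read the hint, assuming scikit-learn is the library in use, is the `class_weight="balanced"` option of `LogisticRegression`, which reweights observations inversely to class frequency. The data below is synthetic and only illustrates the effect on an imbalanced sample.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Synthetic, imbalanced two-class data: 180 negatives and 20 positives.
rng = np.random.default_rng(0)
n_neg, n_pos = 180, 20
X = np.vstack(
    [
        rng.normal(loc=0.0, scale=1.0, size=(n_neg, 2)),
        rng.normal(loc=2.0, scale=1.0, size=(n_pos, 2)),
    ]
)
y = np.array([0] * n_neg + [1] * n_pos)

# Unweighted fit versus a fit that balances the classes by weight.
plain = LogisticRegression(max_iter=1000).fit(X, y)
balanced = LogisticRegression(class_weight="balanced", max_iter=1000).fit(X, y)

plain_pos = int(plain.predict(X).sum())
balanced_pos = int(balanced.predict(X).sum())
print("positives predicted without weights:", plain_pos)
print("positives predicted with balanced weights:", balanced_pos)
```

The balanced fit typically flags more observations as the minority class, trading some precision for recall.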
Remove the variables that would cause multicollinearity. Explicitly state the variables that you are dropping in a markdown cell in your Jupyter notebook.
Hint: To remove a variable, use the drop function.
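A simple screen for multicollinearity is to drop one variable from every highly correlated pair. The sketch below assumes pandas, a hypothetical cut-off of 0.9, and invented column names; a variance inflation factor check would be an alternative.

```python
import numpy as np
import pandas as pd

# Invented predictors: bill_b is almost a copy of bill_a.
X = pd.DataFrame(
    {
        "bill_a": [10, 20, 30, 40, 50, 60],
        "bill_b": [11, 19, 31, 42, 49, 61],
        "age": [30, 25, 41, 28, 35, 25],
    }
)

threshold = 0.9  # assumed cut-off, not taken from the assignment

# Absolute correlations, keeping only the upper triangle so that each
# pair of variables is examined once.
corr = X.corr().abs()
upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))

# Drop the later column of every pair whose correlation exceeds the cut-off.
to_drop = [col for col in upper.columns if (upper[col] > threshold).any()]
X_reduced = X.drop(columns=to_drop)

print("dropped:", to_drop)
print("kept:", list(X_reduced.columns))
```

The contents of `to_drop` are what the markdown cell would need to state explicitly.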
Import Test.csv into your Jupyter notebook.
Test the model on the test dataset. Please note that the variable conversion applied to the raw data in Task 1 should be applied here as well.
Hint: Convert the relevant variables, such as the payment variables (the PAY repayment-status variables) and the customer-related (demographic) variables, to categorical variables as appropriate.
Plot the confusion matrix.
Provide your insights on accuracy, precision and F1 score in a markdown cell in your Jupyter notebook.
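To make the three metrics concrete, here is a small worked example using scikit-learn's metric functions on ten hand-made labels (invented for illustration, not model output).

```python
from sklearn.metrics import (
    accuracy_score,
    confusion_matrix,
    f1_score,
    precision_score,
)

# Hand-made labels: 6 true negatives/positives mixed with 3 errors.
y_true = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
y_pred = [0, 0, 0, 0, 1, 1, 1, 1, 1, 0]

# Rows are true classes, columns are predicted classes.
cm = confusion_matrix(y_true, y_pred)
tn, fp, fn, tp = cm.ravel()
print(cm)  # [[4 2]
           #  [1 3]]

accuracy = accuracy_score(y_true, y_pred)    # (tp + tn) / total = 7/10
precision = precision_score(y_true, y_pred)  # tp / (tp + fp) = 3/5
f1 = f1_score(y_true, y_pred)                # harmonic mean of precision
                                             # (0.6) and recall (0.75) = 2/3
print(accuracy, precision, f1)
```

With matplotlib installed, `sklearn.metrics.ConfusionMatrixDisplay(cm).plot()` is one way to draw the matrix.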
Task 3:
Fit a random forest model on Train.csv with a random state of epochs, a maximum depth of and a maximum feature of
Evaluate the confusion matrix, F1 scores and accuracy. Compare the random forest model with the logistic regression from Task 2. State your observations in a markdown cell in your Jupyter notebook.
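A sketch of the fit-and-evaluate loop with scikit-learn's `RandomForestClassifier`. The hyperparameter values are not recoverable from the task text, so the ones below are placeholders, and the data is synthetic rather than Train.csv and Test.csv.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, confusion_matrix, f1_score
from sklearn.model_selection import train_test_split

# Synthetic imbalanced data standing in for the train and test files.
X, y = make_classification(
    n_samples=400, n_features=8, weights=[0.8, 0.2], random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Placeholder hyperparameters; substitute the values given in the assignment.
rf = RandomForestClassifier(random_state=0, max_depth=4, max_features=3)
rf.fit(X_train, y_train)

y_pred = rf.predict(X_test)
cm = confusion_matrix(y_test, y_pred)
acc = accuracy_score(y_test, y_pred)
f1 = f1_score(y_test, y_pred, zero_division=0)
print(cm)
print(f"accuracy={acc:.3f}  F1={f1:.3f}")
```

Computing the same three quantities for the logistic regression on the same test set gives a like-for-like comparison.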
Task 4:
Fit a support vector machine (SVM) algorithm on Train.csv with the following parameters: gamma ; C
Provide the confusion matrix, F1 scores and accuracy in a markdown cell in your Jupyter notebook.
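The SVM step could be sketched with scikit-learn's `SVC`. The `gamma` and `C` values below are placeholders because the originals are missing from the task text; the scaling step is an added assumption (RBF kernels are sensitive to feature scale), and the data is synthetic.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score, confusion_matrix, f1_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic imbalanced data standing in for the train and test files.
X, y = make_classification(
    n_samples=400, n_features=8, weights=[0.8, 0.2], random_state=1
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=1
)

# Placeholder gamma and C; substitute the values given in the assignment.
svm = make_pipeline(StandardScaler(), SVC(gamma=0.1, C=1.0))
svm.fit(X_train, y_train)

y_pred = svm.predict(X_test)
cm = confusion_matrix(y_test, y_pred)
acc = accuracy_score(y_test, y_pred)
f1 = f1_score(y_test, y_pred, zero_division=0)
print(cm)
print(f"accuracy={acc:.3f}  F1={f1:.3f}")
```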
Task 5:
Fit an ANN model (sequential) with input neurons and add two hidden layers with neurons each.
Use relu activation and the adam optimiser. Use the normal kernel initialiser. Run it for epochs on Train.csv with a batch size of
Provide the confusion matrix, F1 scores and accuracy on the test dataset in a markdown cell in your Jupyter notebook.
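The task asks for a sequential (Keras-style) network. As a dependency-light stand-in, the sketch below uses scikit-learn's `MLPClassifier` instead, which is a swapped-in technique and not the requested model: it supports two hidden layers, relu activation and the adam solver, but offers no kernel initialiser setting. Layer sizes, iteration count and batch size are placeholders, and the data is synthetic.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score, confusion_matrix, f1_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic imbalanced data standing in for the train and test files.
X, y = make_classification(
    n_samples=400, n_features=8, weights=[0.8, 0.2], random_state=2
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=2
)

# Stand-in for the sequential ANN: two hidden layers, relu, adam.
# All numeric settings are placeholders, not the assignment's values.
ann = make_pipeline(
    StandardScaler(),
    MLPClassifier(
        hidden_layer_sizes=(16, 16),
        activation="relu",
        solver="adam",
        batch_size=32,
        max_iter=200,  # upper bound on training epochs for adam
        random_state=0,
    ),
)
ann.fit(X_train, y_train)

y_pred = ann.predict(X_test)
cm = confusion_matrix(y_test, y_pred)
acc = accuracy_score(y_test, y_pred)
f1 = f1_score(y_test, y_pred, zero_division=0)
print(cm)
print(f"accuracy={acc:.3f}  F1={f1:.3f}")
```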
Task 6:
Explain which model you would use, based on the evaluation metrics on the test dataset, among all the models from Task 2 to Task 5 (logistic regression, random forest, SVM and ANN), and explain why. Put your answer in a markdown cell in your Jupyter notebook.
About the data
The CC_Default.csv / Train.csv / Test.csv dataset contains a total of variables, which are the following:
ID: A numerical value assigned to each credit card customer
LIMITBAL: The remaining credit a customer can use
LIMITBAL = credit limit - used up amount
SEX: male ; female
EDUCATION: A customer's educational attainment:
graduate school
university
high school
others
unknown
unknown
MARRIAGE: A customer's marital status:
unknown
married
single
others
AGE: A customer's age in years
PAY: Repayment status in September:
or less: Paid duly
or greater: The number indicates the number of months the payment was delayed
PAY: Repayment status in August
or less: Paid duly
or greater
PAY: Repayment status in July
or less: Paid duly
or greater
PAY: Repayment status in June
or less: Paid duly
or greater
PAY: Repayment status in May
or less: Paid duly
or greater
PAY: Repayment status in April
or less: Paid duly
or greater
BILLAMT
BILLAMT
BILLAMT
BILLAMT
BILLAMT
BILLAMT
PAYAMT
PAYAMT
PAYAMT
PAYAMT
PAYAMT
PAYAMT
default.payment.next.month: Shows whether the customer defaulted on their payment in the following month: yes ; no