Question

Posted on Jul 27, 2024

Task 1: Import the raw data (CC_Default.csv) into your Jupyter notebook. 1.1 Check if the data is loaded

Task 1

Import the raw data (CC_Default.csv) into your Jupyter notebook.

1.1 Check if the data is loaded correctly by printing a few observations. Check the total number of observations and variables.
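A minimal sketch of the loading check in 1.1, assuming pandas. The inline CSV text and its column names are hypothetical stand-ins for CC_Default.csv, which is not reproduced here.

```python
import io

import pandas as pd

# Hypothetical stand-in for CC_Default.csv; the real column names may differ.
csv_text = """ID,LIMIT_BAL,SEX,AGE,PAY_0,default
1,20000,2,24,2,1
2,120000,2,26,-1,1
3,90000,2,34,0,0
4,50000,1,37,0,0
"""

# With the real file this would be pd.read_csv("CC_Default.csv").
df = pd.read_csv(io.StringIO(csv_text))

print(df.head(3))          # print a few observations
n_obs, n_vars = df.shape   # rows are observations, columns are variables
print(f"{n_obs} observations, {n_vars} variables")
```

`df.shape` returns both counts at once, so there is no need to count rows and columns separately.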

1.2 Provide the descriptive statistics and manipulate data.

Check for missing values if any.

Plot the univariate distribution.

Convert the relevant variables such as payment variables (Pay 0 - Pay 6 and customer related variables) to categorical variables as appropriate.
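The steps in 1.2 might look roughly like this in pandas. The frame below is synthetic, and the column names (`PAY_0`, `SEX`, and so on) are assumptions about how the payment and customer variables are labelled in the raw file.

```python
import numpy as np
import pandas as pd

# Hypothetical columns standing in for the raw data.
df = pd.DataFrame({
    "LIMIT_BAL": [20000, 120000, np.nan, 50000],
    "SEX": [2, 2, 1, 1],
    "EDUCATION": [2, 2, 1, 3],
    "PAY_0": [2, -1, 0, 0],
    "PAY_2": [2, 2, 0, 0],
})

print(df.describe())          # descriptive statistics for the numeric columns
missing = df.isnull().sum()   # number of missing values per column
print(missing)

# Treat payment-status and demographic codes as categories, not quantities.
cat_cols = ["SEX", "EDUCATION", "PAY_0", "PAY_2"]
df = df.astype({col: "category" for col in cat_cols})
print(df.dtypes)

# Frequencies of one categorical variable: the numbers a bar chart would show.
pay0_counts = df["PAY_0"].value_counts()
print(pay0_counts)
```

With matplotlib installed, `df["LIMIT_BAL"].hist()` draws the univariate distribution of a numeric column directly.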

1.3 Find the variables that are correlated and the variables that might help in finding the defaulters next month using a few plots. The plots should provide insights on the following:

The independent variable that should help identify those who will default on next month's credit card payment

The relation between dependent and independent variables

The correlations among the variables, etc.
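As a rough illustration of the analysis in 1.3, the sketch below computes the numbers that a correlation heatmap and a default-rate bar plot would display. The data is synthetic and the column names are hypothetical; the target is built so that a later repayment status raises the chance of default.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 500
pay_0 = rng.integers(-1, 4, size=n)             # hypothetical repayment status, -1..3
bill_1 = rng.normal(50000, 15000, size=n)       # hypothetical bill amount
bill_2 = bill_1 + rng.normal(0, 2000, size=n)   # nearly a copy of bill_1
# Synthetic target: a later repayment status makes default more likely.
default = (rng.random(n) < 0.1 + 0.15 * np.clip(pay_0, 0, 3)).astype(int)

df = pd.DataFrame({
    "PAY_0": pay_0,
    "BILL_AMT1": bill_1,
    "BILL_AMT2": bill_2,
    "default": default,
})

corr = df.corr()   # pairwise Pearson correlations
print(corr.round(2))

# Default rate per repayment status: relation between target and predictor.
rate_by_pay = df.groupby("PAY_0")["default"].mean()
print(rate_by_pay.round(2))
```

If seaborn is available, `seaborn.heatmap(corr)` renders the correlation matrix, and `rate_by_pay.plot(kind="bar")` plots the default rates.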

1.4 Provide your insights on the variables and the relationship among the variables based on your analysis in Task 1.3 in a markdown cell in your Jupyter notebook.

Task 2

Import Train.csv into your Jupyter notebook.

2.1 Check the total number of observations and print a few records. Please note that the variable conversion in the raw data, similar to Task 1.2, should be applied.

Hint: Convert the relevant variables such as payment variables, Pay 0 - Pay 6, and customer related variables (demographic) to categorical variables as appropriate.

2.2 Fit a logistic regression after making the dataset balanced.

Hint: Use class weight parameter.
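One way to read the hint, assuming scikit-learn: `LogisticRegression` accepts `class_weight="balanced"`, which weights each class inversely to its frequency. The data below is synthetic, with roughly one positive in five.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
X = rng.normal(size=(1000, 3))
# Imbalanced synthetic target: roughly 20% positives.
y = (X[:, 0] + rng.normal(scale=1.0, size=1000) > 1.2).astype(int)

# class_weight="balanced" reweights classes inversely to their frequency,
# so the minority class is not swamped by the majority class.
clf = LogisticRegression(class_weight="balanced", max_iter=1000)
clf.fit(X, y)

share_actual = y.mean()
share_predicted = clf.predict(X).mean()
print("share of positives in y:", round(float(share_actual), 3))
print("share predicted positive:", round(float(share_predicted), 3))
```

The balanced model flags more cases as positive than actually occur. This is the expected trade of precision for recall on the minority class.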

2.3 Remove the variable(s) that would cause multicollinearity. Explicitly state the variable(s) that you are dropping in a markdown cell in your Jupyter notebook.

Hint: To remove a variable, use the drop function.
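A simple screen for multicollinearity is to look for pairs of predictors with a very high absolute correlation and drop one of each pair. The sketch below is one such approach, not necessarily the one the assignment intends. The 0.9 cut-off and the column names are assumptions.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(2)
a = rng.normal(size=300)
X = pd.DataFrame({
    "BILL_AMT1": a,
    "BILL_AMT2": a + rng.normal(scale=0.05, size=300),  # near-duplicate of BILL_AMT1
    "AGE": rng.normal(size=300),
})

corr = X.corr().abs()
# Keep only the upper triangle so each pair is inspected once.
upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
# 0.9 is an arbitrary cut-off chosen for illustration.
to_drop = [col for col in upper.columns if (upper[col] > 0.9).any()]

X_reduced = X.drop(columns=to_drop)
print("dropped:", to_drop)
print("remaining:", list(X_reduced.columns))
```

Variance inflation factors are a common alternative when the collinearity involves more than two variables at once.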

Import Test.csv into your Jupyter notebook.

2.4 Test the model on the test dataset. Please note that the variable conversion in the raw data, similar to Task 1.2, should be applied.

Hint: Convert the relevant variables such as payment variables, Pay 0 - Pay 6, and customer related variables (demographic) to categorical variables as appropriate.
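Applying the same conversion to both files matters because one-hot encoding can give the train and test sets different columns when a category appears in only one of them. The sketch below shows one way to align them, using tiny made-up frames and a hypothetical `PAY_0` column.

```python
import pandas as pd

train = pd.DataFrame({"PAY_0": [0, 1, 2, 0], "AGE": [25, 40, 31, 52]})
test = pd.DataFrame({"PAY_0": [0, 3], "AGE": [29, 47]})  # status 3 never occurs in train

cat_cols = ["PAY_0"]
X_train = pd.get_dummies(train.astype({col: "category" for col in cat_cols}))
X_test = pd.get_dummies(test.astype({col: "category" for col in cat_cols}))

# Align the test columns to the training columns: levels unseen in training
# are dropped, and levels absent from the test set are filled with 0.
X_test = X_test.reindex(columns=X_train.columns, fill_value=0)

print(list(X_train.columns))
print(X_test)
```

After the `reindex`, a model fitted on `X_train` can score `X_test` without a column mismatch.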

2.5 Plot the confusion matrix.

2.6 Provide your insights on accuracy, precision and F1 Score in a markdown cell in your Jupyter notebook.
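To make the metrics in 2.5 and 2.6 concrete, here is a small worked example using `sklearn.metrics` on hand-made labels, not on the assignment data. There are 4 true negatives, 2 false positives, 1 false negative and 3 true positives.

```python
from sklearn.metrics import accuracy_score, confusion_matrix, f1_score, precision_score

y_true = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
y_pred = [0, 0, 0, 0, 1, 1, 1, 1, 1, 0]

cm = confusion_matrix(y_true, y_pred)         # rows = actual, columns = predicted
accuracy = accuracy_score(y_true, y_pred)     # (TP + TN) / total = 7 / 10
precision = precision_score(y_true, y_pred)   # TP / (TP + FP) = 3 / 5
f1 = f1_score(y_true, y_pred)                 # harmonic mean of precision and recall

print(cm)
print(f"accuracy={accuracy:.2f} precision={precision:.2f} f1={f1:.2f}")
```

For the plot itself, scikit-learn provides `ConfusionMatrixDisplay`, which draws a matrix like `cm` with matplotlib.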

Task 3

3.1 Fit a random forest model on csv with a random state of 1, 500 epochs, a maximum depth of 3 and a maximum feature of 3.
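A possible mapping of these settings onto scikit-learn's `RandomForestClassifier`, fitted on synthetic data. Random forests have no epochs, so this sketch assumes that "500 epochs" refers to the number of trees (`n_estimators`). That reading is a guess.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(3)
X = rng.normal(size=(400, 6))
y = (X[:, 0] + X[:, 1] > 0).astype(int)   # synthetic target

rf = RandomForestClassifier(
    n_estimators=500,   # assumption: "500 epochs" is read as 500 trees
    max_depth=3,        # maximum depth of 3
    max_features=3,     # at most 3 features considered per split
    random_state=1,     # random state of 1
)
rf.fit(X, y)

train_accuracy = rf.score(X, y)
print("training accuracy:", round(train_accuracy, 3))
```

Shallow trees such as these tend to underfit individually; the forest compensates by averaging many of them.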

3.2 Evaluate the confusion matrix, F1 scores and accuracy. Compare the random forest model with the logistic regression from Task 2.

State your observations in a markdown cell in your Jupyter notebook.

Task 4

4.1 Fit a support vector machine (SVM) model on Train.csv with the following parameters:

gamma = 0.025

= 3.
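A sketch of an SVM fit with scikit-learn's `SVC` on synthetic data. The name of the second parameter is missing from the task text. This example assumes it is the regularisation parameter `C`, which is a guess, and it adds feature scaling, which the task does not mention.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(4)
X = rng.normal(loc=5.0, scale=2.0, size=(400, 4))
y = (X[:, 0] - X[:, 1] > 0).astype(int)   # synthetic target

svm = make_pipeline(
    StandardScaler(),         # assumption: scaling added because RBF kernels are scale-sensitive
    SVC(gamma=0.025, C=3),    # gamma from the task; C=3 is an assumed reading of "= 3"
)
svm.fit(X, y)

train_accuracy = svm.score(X, y)
print("training accuracy:", round(train_accuracy, 3))
```

`SVC` uses the RBF kernel by default, which is the kernel that `gamma` controls.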

4.2 Provide the confusion matrix, F1 scores and accuracy in a markdown cell in your Jupyter notebook.

Task 5

5.1 Fit an ANN model (sequential) with 16 input neurons and add two hidden layers with 8 neurons each.

5.2 Use relu activation and adam optimiser. Use the normal kernel initialiser. Run it for 100 epochs on train.csv with a batch size of 15.
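The task describes a Keras-style sequential model. So that the sketch runs without a deep learning framework, it uses scikit-learn's `MLPClassifier` as a stand-in. `MLPClassifier` does not expose a kernel initialiser setting, so that part of 5.2 is not reproduced. Reading "16 input neurons" as a first layer of 16 units is also an assumption, and the data is synthetic.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(5)
X = rng.normal(size=(300, 10))
y = (X[:, 0] + X[:, 1] > 0).astype(int)   # synthetic target

ann = MLPClassifier(
    hidden_layer_sizes=(16, 8, 8),  # assumed reading: 16-unit first layer, then two layers of 8
    activation="relu",
    solver="adam",
    batch_size=15,
    max_iter=100,                   # for the adam solver this is the number of epochs
    random_state=1,
)
ann.fit(X, y)   # may emit a ConvergenceWarning if 100 epochs is not enough

train_accuracy = ann.score(X, y)
print("epochs run:", ann.n_iter_)
print("training accuracy:", round(train_accuracy, 3))
```

In Keras the equivalent model would be a `Sequential` stack of `Dense` layers, each taking `activation` and `kernel_initializer` arguments.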

5.3 Provide the confusion matrix, F1 scores and accuracy on the test dataset in a markdown cell in your Jupyter notebook.

Task 6

Explain which model you will use based on the evaluation metrics on the test dataset among all the models from Task 2 to Task 5 (logistic regression, random forest, SVM and ANN) and explain why. Put your answer in a markdown cell in your Jupyter notebook.
