Question

Posted on Jul 01, 2024

Part II: An application (75 marks)

2.1 Background on Credit Card Dataset

The data, "CreditCard Data.xls", are based on Yeh and Lien (2009). The dataset contains 30,000 observations and 23 explanatory variables. The response variable, Y, is binary: "1" denotes default payment and "0" denotes non-default payment. The 23 explanatory variables are described as follows:

X1: Amount of the given credit (NT dollar): it includes both the individual consumer credit and his/her family (supplementary) credit.

X2: Gender (1 = male; 2 = female).

X3: Education (0 = unknown; 1 = graduate school; 2 = university; 3 = high school; 4 = others; 5 = unknown; 6 = unknown).

X4: Marital status (0 = unknown; 1 = married; 2 = single; 3 = others).

X5: Age (year).

X6 - X11: History of past payment. The past monthly payment records (from April to September, 2005) were tracked as follows: X6 = the repayment status in September, 2005; X7 = the repayment status in August, 2005; ...; X11 = the repayment status in April, 2005. The measurement scale for the repayment status is: -2 = no consumption; -1 = paid duly; 0 = use of revolving credit; 1 = payment delay for one month; 2 = payment delay for two months; ...; 8 = payment delay for eight months; 9 = payment delay for nine months and above.

X12 - X17: Amount of bill statement (NT dollar). X12 = amount of bill statement in September, 2005; X13 = amount of bill statement in August, 2005; ...; X17 = amount of bill statement in April, 2005.

X18 - X23: Amount of previous payment (NT dollar). X18 = amount paid in September, 2005; X19 = amount paid in August, 2005; ...; X23 = amount paid in April, 2005.
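The categorical codes above are stored as numbers, so a model would treat them as quantities unless they are converted to factors. A minimal sketch of that recoding follows, assuming the columns are named X2, X3 and X4 as in the description. The `toy` data frame is made up for illustration and is not taken from the real file. Merging the three "unknown" education codes into one level is a modelling choice, not something the data description requires.

```r
# Small made-up sample using the codes listed above (not real records).
toy <- data.frame(
  X2 = c(1, 2, 2, 1, 2, 1),
  X3 = c(0, 1, 2, 3, 5, 6),
  X4 = c(1, 2, 3, 0, 1, 2)
)

# Codes 0, 5 and 6 of X3 are all documented as "unknown", so map them
# to a single level (an assumption about how to treat them).
edu_labels <- c("0" = "unknown", "1" = "graduate school", "2" = "university",
                "3" = "high school", "4" = "others",
                "5" = "unknown", "6" = "unknown")
toy$X3 <- factor(
  unname(edu_labels[as.character(toy$X3)]),
  levels = c("graduate school", "university", "high school", "others", "unknown")
)

# Gender and marital status map one-to-one onto their labels.
toy$X2 <- factor(toy$X2, levels = c(1, 2), labels = c("male", "female"))
toy$X4 <- factor(toy$X4, levels = 0:3,
                 labels = c("unknown", "married", "single", "others"))

table(toy$X3)  # counts per education level after merging
```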

2.2 Assessment Tasks

2.2.1 Data

(a) Select a random sample of 70% of the full dataset as the training data and retain the rest as test data. Provide the code and print out the dimensions of the training data. (5 marks)
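One way the 70/30 split could be done in base R is sketched below. Because the spreadsheet itself is not available here, the `credit` data frame is a synthetic stand-in with the same shape (30,000 rows, X1 to X23 plus Y). The seed and the simulated default rate are arbitrary assumptions. With the real file, `credit` would instead be read from "CreditCard Data.xls".

```r
# Synthetic stand-in for the real data: 30,000 rows, X1..X23 plus Y.
set.seed(2024)  # arbitrary seed so the split is reproducible
n <- 30000
credit <- as.data.frame(matrix(rnorm(n * 23), nrow = n))
names(credit) <- paste0("X", 1:23)
credit$Y <- rbinom(n, size = 1, prob = 0.2)  # made-up default rate

# Draw 70% of the row indices without replacement for training;
# the remaining rows form the test set.
train_idx <- sample(seq_len(n), size = round(0.7 * n))
train <- credit[train_idx, ]
test  <- credit[-train_idx, ]

dim(train)  # 21000 rows, 24 columns
dim(test)   # 9000 rows, 24 columns
```

Setting the seed before calling `sample()` makes the same rows land in the training set on every run, which keeps later model comparisons consistent.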

2.2.2 Tree Based Algorithms

(a) Use an appropriate tree based algorithm to classify credible and non-credible clients. Specify any underlying assumptions. Justify your model choice as well as hyper-parameters which are required to be specified in R. (10 marks)

(b) Display model summary and discuss the relationship between the response variable versus selected features. (10 marks)

the results. (5 marks)
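As a hedged illustration of the tree-based tasks, the sketch below fits a single classification tree with the `rpart` package. This is only one possible choice; a random forest or boosted trees would also qualify. The `toy` data are simulated so that default risk rises with X6. They are not the real credit card data, and the hyper-parameter values are assumptions to be justified or tuned.

```r
library(rpart)  # recommended package shipped with standard R installations

# Simulated stand-in data (NOT the real file): default probability is
# made to increase with the repayment status X6, purely for illustration.
set.seed(1)
n <- 2000
toy <- data.frame(
  X1 = runif(n, 10000, 500000),           # credit amount
  X5 = sample(21:75, n, replace = TRUE),  # age
  X6 = sample(-2:8, n, replace = TRUE)    # repayment status
)
toy$Y <- factor(rbinom(n, 1, plogis(-2 + 0.8 * toy$X6)), levels = c(0, 1))

# Hyper-parameters (values are assumptions):
#   cp       - complexity parameter; a split must improve fit by this factor
#   minsplit - minimum observations in a node before a split is attempted
#   maxdepth - maximum depth of any node in the tree
fit_tree <- rpart(
  Y ~ ., data = toy, method = "class",
  control = rpart.control(cp = 0.01, minsplit = 20, maxdepth = 5)
)

printcp(fit_tree)             # cross-validated error for each cp value
fit_tree$variable.importance  # which features drive the splits
```

The `printcp()` table can support the choice of `cp`, and the variable importance vector gives a starting point for discussing how the response relates to the selected features.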

2.2.3 Support vector classifier

(a) Use an appropriate support vector classifier to classify the credible and non-credible clients. Justify your model choice as well as hyper-parameters which are required to be specified in R. (10 marks)

(b) Display model summary and discuss the relationship between the response variable versus selected features. (10 marks)

the results. (5 marks)
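A support vector classifier could be fitted along the following lines, assuming the `e1071` package is installed (it is not part of base R). The linear kernel and the grid of `cost` values are assumptions. The `toy` data are simulated and kept small so the cross-validation runs quickly, and they are not the real dataset.

```r
library(e1071)  # assumed installed; provides svm() and tune()

# Simulated stand-in data (NOT the real file).
set.seed(1)
n <- 500
toy <- data.frame(
  X1 = runif(n, 10000, 500000),         # credit amount
  X6 = sample(-2:8, n, replace = TRUE)  # repayment status
)
toy$Y <- factor(rbinom(n, 1, plogis(-2 + 0.8 * toy$X6)), levels = c(0, 1))

# A linear kernel gives a support vector classifier; cost sets the penalty
# for margin violations. scale = TRUE standardises the predictors, which
# matters when features are on very different scales (NT dollars vs. codes).
fit_svc <- svm(Y ~ ., data = toy, kernel = "linear", cost = 1, scale = TRUE)
summary(fit_svc)  # reports kernel, cost and number of support vectors

# 10-fold cross-validation (tune's default) over a small assumed cost grid.
tuned <- tune(svm, Y ~ ., data = toy, kernel = "linear",
              ranges = list(cost = c(0.1, 1, 10)))
tuned$best.parameters
```

On the full 21,000-row training set the same calls would take considerably longer, so tuning on a subsample may be a practical compromise.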

2.2.4 Prediction

Apply your fitted models from 2.2.2 and 2.2.3 to make predictions on the test data. Evaluate the performance of the algorithms on the test data. Which model do you prefer? Are there any suggestions to further improve the performance of the algorithms? Justify your answers. (20 marks)
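Test-set performance for either model can be summarised with a confusion matrix. The sketch below uses a hypothetical helper, `eval_metrics()`, and ten made-up labels that are not model output. In practice `pred` would come from `predict()` applied to the test data.

```r
# Hypothetical helper: confusion matrix plus three summary metrics.
# Both inputs must be factors with levels "0" (non-default) and "1" (default).
eval_metrics <- function(actual, pred) {
  cm <- table(Predicted = pred, Actual = actual)
  list(
    confusion   = cm,
    accuracy    = sum(diag(cm)) / sum(cm),
    sensitivity = cm["1", "1"] / sum(cm[, "1"]),  # defaults correctly flagged
    specificity = cm["0", "0"] / sum(cm[, "0"])   # non-defaults correctly cleared
  )
}

# Made-up labels for illustration only.
actual <- factor(c(1, 0, 0, 1, 0, 0, 1, 0, 0, 0), levels = c(0, 1))
pred   <- factor(c(1, 0, 0, 0, 0, 1, 1, 0, 0, 0), levels = c(0, 1))

res <- eval_metrics(actual, pred)
res$confusion
res$accuracy     # 0.8
res$sensitivity  # 2/3
res$specificity  # 6/7
```

Because defaults are usually the minority class, accuracy alone can be misleading; reporting sensitivity and specificity alongside it gives a fairer basis for preferring one model.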
