Question

Posted on Sep 26, 2024

In this problem, you will develop a model to predict whether a person in the US Census earns more than $50K or not. Consider Income as the target variable and include Age, MaritalStatus, Race, Sex, and WeeklyHours as predictors. We use the Census dataset for this. Use a QDA model. Use the previously created 5 folds for the cross-validation on the training set.
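For reference, QDA models each Income class k with its own multivariate normal distribution (mean μ_k, covariance Σ_k) and prior probability π_k, and assigns an observation x to the class with the largest quadratic discriminant. This is the standard textbook form with additive constants dropped, given as a sketch of what the MASS engine fits rather than a description of its internals:

```latex
\delta_k(x) = -\tfrac{1}{2}\log\lvert\Sigma_k\rvert
  - \tfrac{1}{2}(x-\mu_k)^{\top}\Sigma_k^{-1}(x-\mu_k) + \log\pi_k,
\qquad \hat{y}(x) = \arg\max_{k}\ \delta_k(x)
```

Because each class keeps its own Σ_k, the term x^⊤ Σ_k^{-1} x does not cancel between classes, which is what makes the decision boundary quadratic. The class probabilities P(Y = k | x) ∝ π_k f_k(x) are what the ROC curve and AUC are built on.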

Calculate and show the confusion matrix for both the training and the test set. What is the performance with respect to sensitivity, specificity, and AUC? Create and print the ROC curves. I am using a five-fold cross-validation.

```r
library(tidymodels)  # parsnip, workflows, tune, yardstick
library(discrim)     # provides the "MASS" engine for discrim_quad()

# assumed to exist from earlier steps (not shown): model_recipe, data_train,
# data_split, target_var, positive_class and the metric set classification_metrics

qda_model <-
  # specify that the model is a quadratic discriminant analysis
  discrim_quad() %>%
  # note: there are several potential engines for QDA, here we just use the default one (MASS)
  set_engine("MASS") %>%
  # select the binary classification mode
  set_mode("classification")

# then, let's put everything into a workflow
qda_workflow <- workflow() %>%
  # add the recipe (data pre-processing)
  add_recipe(model_recipe) %>%
  # add the ML model
  add_model(qda_model)

set.seed(1)
# note: control only takes effect if passed to fit_resamples() together with the folds
control <- control_resamples(save_pred = TRUE, event_level = "second")

# fit() trains the workflow once on the full training data (no resampling)
qda_fit <- qda_workflow %>%
  fit(data = data_train)

# investigate the result
qda_fit

# to get the evaluation metrics for the test data:
# last_fit() trains on the training portion of data_split and predicts the test portion
qda_final_fit <- qda_workflow %>%
  last_fit(data_split)

# note that these are predictions on the test data!
test_predictions_qda <- qda_final_fit %>%
  augment()

test_predictions_qda$Income <- as.factor(test_predictions_qda$Income)

# note: you need to select the truth and estimate variables based on the column names of the test object
classification_metrics(data = test_predictions_qda,
                       truth = Income,
                       estimate = .pred_class,
                       `.pred_>50`,
                       # use the second level of Income as the level of interest
                       event_level = "second")

# finally, let's create the confusion matrix and ROC curve
caret::confusionMatrix(data = test_predictions_qda$.pred_class,
                       reference = test_predictions_qda[[target_var]],
                       positive = positive_class)

two_class_curve_test_qda <- roc_curve(data = test_predictions_qda,
                                      truth = Income,
                                      `.pred_>50`,
                                      event_level = "second")

autoplot(two_class_curve_test_qda)
```
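To make the requested metrics concrete, here is a minimal base-R sketch that computes them from a vector of true classes, predicted classes, and predicted event probabilities. It treats the second factor level as the event, mirroring `event_level = "second"`; the function name `binary_metrics`, the labels "no"/"yes", and the numbers are hypothetical toy values, not Census results. AUC uses the Mann-Whitney rank formula, which equals the area under the empirical ROC curve.

```r
# Sensitivity, specificity and AUC for a two-class problem, in base R.
# The SECOND factor level of `truth` is treated as the event,
# mirroring event_level = "second" in yardstick.
binary_metrics <- function(truth, pred_class, prob_event) {
  lv <- levels(truth)
  pred_class <- factor(pred_class, levels = lv)
  cm <- table(Prediction = pred_class, Truth = truth)
  event <- lv[2]
  other <- lv[1]
  tp <- cm[event, event]; fn <- cm[other, event]
  tn <- cm[other, other]; fp <- cm[event, other]
  # AUC = P(random event case scores higher than random non-event case),
  # via the Mann-Whitney rank formula (ties count as 1/2)
  is_event <- truth == event
  n_pos <- sum(is_event)
  n_neg <- sum(!is_event)
  r <- rank(prob_event)
  auc <- (sum(r[is_event]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
  list(confusion = cm,
       sensitivity = tp / (tp + fn),
       specificity = tn / (tn + fp),
       auc = auc)
}

# Toy example with hypothetical labels "no"/"yes" (not Census results)
truth <- factor(c("no", "no", "no", "yes", "yes"), levels = c("no", "yes"))
pred  <- c("no", "yes", "no", "yes", "no")
prob  <- c(0.1, 0.6, 0.2, 0.8, 0.4)
m <- binary_metrics(truth, pred, prob)
m$confusion
c(sensitivity = m$sensitivity, specificity = m$specificity, auc = m$auc)
# sensitivity 0.5, specificity 0.667, auc 0.833
```

Applied to the test predictions above, the three arguments would be the Income column, `.pred_class`, and the probability column for the second Income level.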

* * * * * *

How would I create the confusion matrix for the training set?

* * * * * * * * * * * *
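One way to get a training-set confusion matrix in tidymodels is sketched below, under stated assumptions: simulated data (only Age and WeeklyHours, labels "no"/"yes") stands in for `data_train`, and a formula replaces `model_recipe`. It shows two options: resubstitution, which predicts the same rows the model was fit on and therefore tends to be optimistic, and pooled out-of-fold predictions from 5-fold CV via `fit_resamples()`, which is where a `control_resamples(save_pred = TRUE)` object is actually used. In the posted code, `control` is created but never passed to `fit_resamples()`, so no cross-validation is run there.

```r
library(tidymodels)  # rsample, parsnip, workflows, tune, yardstick
library(discrim)     # registers the "MASS" engine for discrim_quad()

set.seed(1)
# Hypothetical stand-in for data_train: simulated values, NOT the Census data
n <- 400
sim_train <- tibble(
  Age = round(runif(n, 18, 80)),
  WeeklyHours = round(runif(n, 10, 70))
)
p_yes <- plogis(-6 + 0.05 * sim_train$Age + 0.06 * sim_train$WeeklyHours)
sim_train$Income <- factor(ifelse(runif(n) < p_yes, "yes", "no"),
                           levels = c("no", "yes"))

qda_wf <- workflow() %>%
  add_formula(Income ~ Age + WeeklyHours) %>%
  add_model(discrim_quad() %>% set_engine("MASS") %>% set_mode("classification"))

# Option 1: resubstitution, i.e. predict the rows the model was trained on
qda_fit <- fit(qda_wf, data = sim_train)
train_preds <- augment(qda_fit, new_data = sim_train)
cm_train <- conf_mat(train_preds, truth = Income, estimate = .pred_class)
cm_train

# Option 2: 5-fold CV; each training row is predicted once by a model
# that did not see it, and all out-of-fold predictions are pooled
folds <- vfold_cv(sim_train, v = 5, strata = Income)
ctrl <- control_resamples(save_pred = TRUE, event_level = "second")
qda_cv <- fit_resamples(qda_wf, resamples = folds, control = ctrl,
                        metrics = metric_set(sensitivity, specificity, roc_auc))
cv_preds <- collect_predictions(qda_cv)
cm_cv <- conf_mat(cv_preds, truth = Income, estimate = .pred_class)
cm_cv
collect_metrics(qda_cv)  # fold-averaged sensitivity, specificity and AUC
```

With the real objects, the same pattern would use `qda_workflow`, `data_train`, and the previously created folds in place of the simulated ones.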
