Question

In this problem, you will develop a model to predict whether a person in the US Census earns more than $50K or not. Consider Income as the target variable and include Age, MaritalStatus, Race, Sex, and WeeklyHours as predictors. We use the Census dataset for this. Use a QDA model. Use the previously created 5 folds for the cross-validation on the training set.
Calculate and show the confusion matrix for both the training and the test set. What is the performance with respect to sensitivity, specificity, and AUC? Create and print the ROC curves. I am using five-fold cross-validation. Here is my code so far:

library(tidymodels)
library(discrim)  # provides discrim_quad()
library(caret)    # provides confusionMatrix()

# specify that the model is a quadratic discriminant analysis
qda_model <-
  discrim_quad() %>%
  # note: there are several potential engines for QDA, here we just use the default one
  set_engine("MASS") %>%
  # select the binary classification mode
  set_mode("classification")

# then, let's put everything into a workflow
qda_workflow <- workflow() %>%
  # add the recipe (data pre-processing); model_recipe was created earlier
  add_recipe(model_recipe) %>%
  # add the ML model
  add_model(qda_model)

set.seed(1)
# resampling control, intended for the cross-validation folds (not used further below)
control <- control_resamples(save_pred = TRUE,
                             event_level = "second")

# with the fit function, we train the model on the training data
qda_fit <-
  qda_workflow %>%
  fit(data = data_train)
# investigate the result
qda_fit

# to get the evaluation metrics for the test data:
# last_fit() trains on the training part of data_split and predicts the test part
qda_final_fit <-
  qda_workflow %>%
  last_fit(data_split)

# note that we use the test data here!
test_predictions_qda <-
  qda_final_fit %>%
  augment()
test_predictions_qda$Income <- as.factor(test_predictions_qda$Income)

# note: you need to select the truth and estimate variables based on the column names of the test object
# classification_metrics is assumed to be a metric set defined earlier (not shown here)
classification_metrics(data = test_predictions_qda,
                       truth = Income,
                       estimate = .pred_class,
                       `.pred_>50K`,            # predicted probability of the second outcome (>50K)
                       event_level = "second")  # "second" makes the second class (Income = >50K) the level of interest

# finally, let's create the confusion matrix and ROC curve
# target_var and positive_class must already be defined (presumably "Income" and ">50K")
confusionMatrix(data = test_predictions_qda$.pred_class,
                reference = test_predictions_qda[[target_var]],
                positive = positive_class)

two_class_curve_test_qda <- roc_curve(data = test_predictions_qda,
                                      truth = Income,
                                      `.pred_>50K`,
                                      event_level = "second")
autoplot(two_class_curve_test_qda)
How would I create the confusion matrix for the training set?
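
One possible way to approach the training-set confusion matrix is to predict on the same rows the model was fitted on and cross-tabulate the predictions against the true labels. The sketch below is only an illustration under stated assumptions: it calls `MASS::qda()` directly (the engine named in the question) on a small simulated data frame, `sim_train`, which is a hypothetical stand-in and not the Census data. With the objects from the question, the same idea would presumably be `augment(qda_fit, new_data = data_train)` followed by yardstick's `conf_mat(truth = Income, estimate = .pred_class)`.

```r
# Minimal sketch: confusion matrix on the TRAINING set for a QDA model.
# sim_train is simulated stand-in data (hypothetical), not the Census data.
library(MASS)

set.seed(1)
n <- 400
sim_train <- data.frame(
  Age         = c(rnorm(n / 2, mean = 35, sd = 10), rnorm(n / 2, mean = 45, sd = 8)),
  WeeklyHours = c(rnorm(n / 2, mean = 38, sd = 8),  rnorm(n / 2, mean = 46, sd = 10)),
  Income      = factor(rep(c("<=50K", ">50K"), each = n / 2),
                       levels = c("<=50K", ">50K"))
)

# Fit QDA on the training rows
qda_sketch <- qda(Income ~ Age + WeeklyHours, data = sim_train)

# Predict on the same rows the model was trained on
train_pred <- predict(qda_sketch, newdata = sim_train)

# Rows are predicted classes, columns are true classes
train_cm <- table(Predicted = train_pred$class, Truth = sim_train$Income)
print(train_cm)

# Treat the second level (">50K") as the positive class
tp <- train_cm[">50K", ">50K"]
fn <- train_cm["<=50K", ">50K"]
tn <- train_cm["<=50K", "<=50K"]
fp <- train_cm[">50K", "<=50K"]

sensitivity <- tp / (tp + fn)  # share of true >50K rows predicted as >50K
specificity <- tn / (tn + fp)  # share of true <=50K rows predicted as <=50K
cat("Sensitivity:", sensitivity, "\nSpecificity:", specificity, "\n")
```

Training-set figures computed this way tend to look more optimistic than the test-set or cross-validated figures, because the model has already seen these rows, so they are best read alongside the test-set results rather than in place of them.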
