
Question



Starting R Code:

library(titanic)   # loads titanic_train data frame
library(caret)
library(tidyverse)
library(rpart)

# 3 significant digits
options(digits = 3)

# clean the data - `titanic_train` is loaded with the titanic package
titanic_clean <- titanic_train %>%
  mutate(Survived = factor(Survived),
         Embarked = factor(Embarked),
         Age = ifelse(is.na(Age), median(Age, na.rm = TRUE), Age), # NA age to median age
         FamilySize = SibSp + Parch + 1) %>%                       # count family members
  select(Survived, Sex, Pclass, Age, Fare, SibSp, Parch, FamilySize, Embarked)

Question 1: Training and test sets

Split titanic_clean into test and training sets. After running the setup code, titanic_clean should have 891 rows and 9 variables.

Set the seed to 42, then use the caret package to create a 20% data partition based on the Survived column. Assign the 20% partition to test_set and the remaining 80% partition to train_set.

How many observations are in the training set?

How many observations are in the test set?

What proportion of individuals in the training set survived?
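A minimal sketch of the Question 1 split, assuming the titanic, caret and tidyverse packages are installed. It repeats the setup code so it runs on its own; the name test_index is my own choice. Exact counts can depend on the R version, because R 3.6.0 changed the default algorithm behind sample(); set.seed(42, sample.kind = "Rounding") reproduces the pre-3.6 behaviour if an answer key was built with it.

```r
library(titanic)    # titanic_train data frame
library(caret)      # createDataPartition()
library(tidyverse)  # mutate(), select(), %>%
options(digits = 3)

# Cleaning step from the starting code
titanic_clean <- titanic_train %>%
  mutate(Survived = factor(Survived),
         Embarked = factor(Embarked),
         Age = ifelse(is.na(Age), median(Age, na.rm = TRUE), Age),
         FamilySize = SibSp + Parch + 1) %>%
  select(Survived, Sex, Pclass, Age, Fare, SibSp, Parch, FamilySize, Embarked)

# 20% test partition, stratified on Survived; the remaining 80% is the training set
set.seed(42)
test_index <- createDataPartition(titanic_clean$Survived, times = 1, p = 0.2, list = FALSE)
test_set  <- titanic_clean[test_index, ]
train_set <- titanic_clean[-test_index, ]

nrow(train_set)                  # observations in the training set
nrow(test_set)                   # observations in the test set
mean(train_set$Survived == "1")  # proportion of training passengers who survived
```

Because createDataPartition() samples within each level of Survived, both sets keep roughly the same survival proportion.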

Question 2: Baseline prediction by guessing the outcome

The simplest prediction method is randomly guessing the outcome without using additional predictors. Such methods help us determine whether a machine learning algorithm performs better than chance. How accurate is random guessing of Titanic passenger survival?

Set the seed to 3. For each individual in the test set, randomly guess whether that person survived or not by sampling from the vector c(0,1) (Note: use the default argument setting of prob from the sample function).

What is the accuracy of this guessing method?
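One way to run the random-guess baseline, sketched under the assumption that the titanic, caret and tidyverse packages are installed. The setup and Question 1 split are repeated so the block is self-contained; guess and guess_accuracy are illustrative names. replace = TRUE is needed because sample() has to draw more than two values from c(0, 1), and prob is left at its default of equal weights. Results may differ across R versions because R 3.6.0 changed sample()'s default algorithm.

```r
library(titanic)    # titanic_train data frame
library(caret)      # createDataPartition()
library(tidyverse)  # mutate(), select(), %>%
options(digits = 3)

# Cleaning step from the starting code
titanic_clean <- titanic_train %>%
  mutate(Survived = factor(Survived),
         Embarked = factor(Embarked),
         Age = ifelse(is.na(Age), median(Age, na.rm = TRUE), Age),
         FamilySize = SibSp + Parch + 1) %>%
  select(Survived, Sex, Pclass, Age, Fare, SibSp, Parch, FamilySize, Embarked)

# 80/20 split from Question 1
set.seed(42)
test_index <- createDataPartition(titanic_clean$Survived, times = 1, p = 0.2, list = FALSE)
test_set  <- titanic_clean[test_index, ]
train_set <- titanic_clean[-test_index, ]

# One random 0/1 guess per test-set passenger (default prob = equal weights)
set.seed(3)
guess <- sample(c(0, 1), nrow(test_set), replace = TRUE)

# Accuracy: share of guesses that match the true outcome
guess_accuracy <- mean(guess == test_set$Survived)
guess_accuracy
```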

Question 3a: Predicting survival by sex

Use the training set to determine whether members of a given sex were more likely to survive or die. Apply this insight to generate survival predictions on the test set.

What proportion of training set females survived?

What proportion of training set males survived?
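A sketch of the per-sex survival rates for Question 3a, assuming the titanic, caret and tidyverse packages are installed. Setup and split are repeated so the block runs alone; sex_rates and survival_rate are my own names.

```r
library(titanic)    # titanic_train data frame
library(caret)      # createDataPartition()
library(tidyverse)  # mutate(), select(), group_by(), summarize(), %>%
options(digits = 3)

# Cleaning step from the starting code
titanic_clean <- titanic_train %>%
  mutate(Survived = factor(Survived),
         Embarked = factor(Embarked),
         Age = ifelse(is.na(Age), median(Age, na.rm = TRUE), Age),
         FamilySize = SibSp + Parch + 1) %>%
  select(Survived, Sex, Pclass, Age, Fare, SibSp, Parch, FamilySize, Embarked)

# 80/20 split from Question 1
set.seed(42)
test_index <- createDataPartition(titanic_clean$Survived, times = 1, p = 0.2, list = FALSE)
test_set  <- titanic_clean[test_index, ]
train_set <- titanic_clean[-test_index, ]

# Survival rate of each sex, computed on the training set only
sex_rates <- train_set %>%
  group_by(Sex) %>%
  summarize(survival_rate = mean(Survived == "1"))
sex_rates
```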

Question 3b: Predicting survival by sex

Predict survival using sex on the test set: if the survival rate for a sex is over 0.5, predict survival for all individuals of that sex, and predict death if the survival rate for a sex is under 0.5.

What is the accuracy of this sex-based prediction method on the test set?
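The 0.5 rule in Question 3b can be applied directly from the training rates rather than hard-coding a sex. This is a minimal sketch, assuming the titanic, caret and tidyverse packages are installed; survivor_sexes and sex_model are hypothetical names, and setup and split are repeated for self-containment.

```r
library(titanic)    # titanic_train data frame
library(caret)      # createDataPartition()
library(tidyverse)  # mutate(), select(), group_by(), summarize(), %>%
options(digits = 3)

# Cleaning step from the starting code
titanic_clean <- titanic_train %>%
  mutate(Survived = factor(Survived),
         Embarked = factor(Embarked),
         Age = ifelse(is.na(Age), median(Age, na.rm = TRUE), Age),
         FamilySize = SibSp + Parch + 1) %>%
  select(Survived, Sex, Pclass, Age, Fare, SibSp, Parch, FamilySize, Embarked)

# 80/20 split from Question 1
set.seed(42)
test_index <- createDataPartition(titanic_clean$Survived, times = 1, p = 0.2, list = FALSE)
test_set  <- titanic_clean[test_index, ]
train_set <- titanic_clean[-test_index, ]

# Training survival rate per sex (Question 3a)
sex_rates <- train_set %>%
  group_by(Sex) %>%
  summarize(survival_rate = mean(Survived == "1"))

# Sexes with a training survival rate above 0.5 are predicted to survive (1), others to die (0)
survivor_sexes <- sex_rates$Sex[sex_rates$survival_rate > 0.5]
sex_model <- ifelse(test_set$Sex %in% survivor_sexes, 1, 0)

# Test-set accuracy of the sex-based rule
sex_accuracy <- mean(sex_model == test_set$Survived)
sex_accuracy
```

Deriving survivor_sexes from the training data keeps the test set out of the decision, which is the point of the split.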

Question 4a: Predicting survival by passenger class

In the training set, passengers in which class(es) (Pclass) were more likely to survive than die?

Select ALL that apply.

A. 1

B. 2

C. 3
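A sketch for Question 4a that computes the training survival rate of each passenger class, assuming the titanic, caret and tidyverse packages are installed. Classes with a rate above 0.5 are the ones where survival was more likely than death. Setup and split are repeated; class_rates is my own name.

```r
library(titanic)    # titanic_train data frame
library(caret)      # createDataPartition()
library(tidyverse)  # mutate(), select(), group_by(), summarize(), %>%
options(digits = 3)

# Cleaning step from the starting code
titanic_clean <- titanic_train %>%
  mutate(Survived = factor(Survived),
         Embarked = factor(Embarked),
         Age = ifelse(is.na(Age), median(Age, na.rm = TRUE), Age),
         FamilySize = SibSp + Parch + 1) %>%
  select(Survived, Sex, Pclass, Age, Fare, SibSp, Parch, FamilySize, Embarked)

# 80/20 split from Question 1
set.seed(42)
test_index <- createDataPartition(titanic_clean$Survived, times = 1, p = 0.2, list = FALSE)
test_set  <- titanic_clean[test_index, ]
train_set <- titanic_clean[-test_index, ]

# Training survival rate for each passenger class
class_rates <- train_set %>%
  group_by(Pclass) %>%
  summarize(survival_rate = mean(Survived == "1"))
class_rates
```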

Question 4b: Predicting survival by passenger class

Predict survival using passenger class on the test set: predict survival if the survival rate for a class is over 0.5, otherwise predict death.

What is the accuracy of this class-based prediction method on the test set?
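The class-based rule for Question 4b mirrors the sex-based one. This is a sketch under the same assumptions (titanic, caret and tidyverse installed); survivor_classes, class_model and class_accuracy are illustrative names, and setup and split are repeated so the block runs on its own.

```r
library(titanic)    # titanic_train data frame
library(caret)      # createDataPartition()
library(tidyverse)  # mutate(), select(), group_by(), summarize(), %>%
options(digits = 3)

# Cleaning step from the starting code
titanic_clean <- titanic_train %>%
  mutate(Survived = factor(Survived),
         Embarked = factor(Embarked),
         Age = ifelse(is.na(Age), median(Age, na.rm = TRUE), Age),
         FamilySize = SibSp + Parch + 1) %>%
  select(Survived, Sex, Pclass, Age, Fare, SibSp, Parch, FamilySize, Embarked)

# 80/20 split from Question 1
set.seed(42)
test_index <- createDataPartition(titanic_clean$Survived, times = 1, p = 0.2, list = FALSE)
test_set  <- titanic_clean[test_index, ]
train_set <- titanic_clean[-test_index, ]

# Training survival rate per class (Question 4a)
class_rates <- train_set %>%
  group_by(Pclass) %>%
  summarize(survival_rate = mean(Survived == "1"))

# Predict survival (1) for classes whose training rate exceeds 0.5, death (0) otherwise
survivor_classes <- class_rates$Pclass[class_rates$survival_rate > 0.5]
class_model <- ifelse(test_set$Pclass %in% survivor_classes, 1, 0)

# Test-set accuracy of the class-based rule
class_accuracy <- mean(class_model == test_set$Survived)
class_accuracy
```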

Question 4c: Predicting survival by passenger class

Use the training set to group passengers by both sex and passenger class.

Which sex and class combinations were more likely to survive than die (i.e. >50% survival)?

Select ALL that apply.

a. female 1st class

b. female 2nd class

c. female 3rd class

d. male 1st class

e. male 2nd class

f. male 3rd class
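A sketch for Question 4c that groups the training set by both Sex and Pclass and keeps the combinations with more than 50% survival. It assumes the titanic, caret and tidyverse packages are installed; combo_rates is my own name, and setup and split are repeated for self-containment. .groups = "drop" simply returns an ungrouped result and silences dplyr's regrouping message.

```r
library(titanic)    # titanic_train data frame
library(caret)      # createDataPartition()
library(tidyverse)  # mutate(), select(), group_by(), summarize(), filter(), %>%
options(digits = 3)

# Cleaning step from the starting code
titanic_clean <- titanic_train %>%
  mutate(Survived = factor(Survived),
         Embarked = factor(Embarked),
         Age = ifelse(is.na(Age), median(Age, na.rm = TRUE), Age),
         FamilySize = SibSp + Parch + 1) %>%
  select(Survived, Sex, Pclass, Age, Fare, SibSp, Parch, FamilySize, Embarked)

# 80/20 split from Question 1
set.seed(42)
test_index <- createDataPartition(titanic_clean$Survived, times = 1, p = 0.2, list = FALSE)
test_set  <- titanic_clean[test_index, ]
train_set <- titanic_clean[-test_index, ]

# Training survival rate for every sex/class combination
combo_rates <- train_set %>%
  group_by(Sex, Pclass) %>%
  summarize(survival_rate = mean(Survived == "1"), .groups = "drop")
combo_rates

# Combinations where passengers were more likely to survive than die
combo_rates %>% filter(survival_rate > 0.5)
```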

Question 5a: Confusion matrix

Use the confusionMatrix() function to create confusion matrices for the sex model, class model, and combined sex and class model. You will need to convert predictions and survival status to factors to use this function.

What is the "positive" class used to calculate confusion matrix metrics?

a. 0

b. 1

Which model has the highest sensitivity?

a. sex only

b. class only

c. sex and class combined

Which model has the highest specificity?

a. sex only

b. class only

c. sex and class combined

Which model has the highest balanced accuracy?

a. sex only

b. class only

c. sex and class combined

Question 5b: Confusion matrix

What is the maximum value of balanced accuracy from Q5a?
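Questions 5a and 5b can be answered from the same three confusion matrices. The sketch below assumes the titanic, caret and tidyverse packages are installed and repeats the setup and split. The helper to_pred() and the names cms and metrics are hypothetical. It maps each test passenger's training group rate to a 0/1 factor with the same levels as Survived, which confusionMatrix() requires. Per the caret documentation, when there are two levels and no positive argument is given, the first level is used as the "positive" class.

```r
library(titanic)    # titanic_train data frame
library(caret)      # createDataPartition(), confusionMatrix()
library(tidyverse)  # mutate(), select(), group_by(), summarize(), left_join(), %>%
options(digits = 3)

# Cleaning step from the starting code
titanic_clean <- titanic_train %>%
  mutate(Survived = factor(Survived),
         Embarked = factor(Embarked),
         Age = ifelse(is.na(Age), median(Age, na.rm = TRUE), Age),
         FamilySize = SibSp + Parch + 1) %>%
  select(Survived, Sex, Pclass, Age, Fare, SibSp, Parch, FamilySize, Embarked)

# 80/20 split from Question 1
set.seed(42)
test_index <- createDataPartition(titanic_clean$Survived, times = 1, p = 0.2, list = FALSE)
test_set  <- titanic_clean[test_index, ]
train_set <- titanic_clean[-test_index, ]

# Training survival rates for the three groupings
sex_rates   <- train_set %>% group_by(Sex) %>%
  summarize(rate = mean(Survived == "1"))
class_rates <- train_set %>% group_by(Pclass) %>%
  summarize(rate = mean(Survived == "1"))
combo_rates <- train_set %>% group_by(Sex, Pclass) %>%
  summarize(rate = mean(Survived == "1"), .groups = "drop")

# Hypothetical helper: look up each test passenger's group rate (left_join keeps
# test_set row order) and predict 1 if it exceeds 0.5, as a factor with Survived's levels
to_pred <- function(rates, by) {
  rate <- left_join(test_set, rates, by = by)$rate
  factor(ifelse(rate > 0.5, 1, 0), levels = levels(test_set$Survived))
}

sex_model   <- to_pred(sex_rates, "Sex")
class_model <- to_pred(class_rates, "Pclass")
combo_model <- to_pred(combo_rates, c("Sex", "Pclass"))

cms <- list(sex      = confusionMatrix(sex_model, test_set$Survived),
            class    = confusionMatrix(class_model, test_set$Survived),
            combined = confusionMatrix(combo_model, test_set$Survived))
cms$sex$positive  # the level caret treated as "positive"

# One column per model: sensitivity, specificity, balanced accuracy
metrics <- sapply(cms, function(cm) cm$byClass[c("Sensitivity", "Specificity", "Balanced Accuracy")])
metrics
max(metrics["Balanced Accuracy", ])  # maximum balanced accuracy (Question 5b)
```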

Question 6: F1 scores

Use the F_meas() function to calculate F1 scores for the sex model, class model, and combined sex and class model. You will need to convert predictions to factors to use this function.

Which model has the highest F1 score?

a. sex only

b. class only

c. sex and class combined

What is the maximum F1 score?
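A sketch of the F1 comparison for Question 6, assuming the titanic, caret and tidyverse packages are installed. It rebuilds the three factor predictions with a hypothetical to_pred() helper so the block runs on its own. caret's F_meas() defaults to beta = 1 (the F1 score) and relevant = levels(reference)[1], so here the "0" level is scored as the relevant class, consistent with confusionMatrix()'s default positive class.

```r
library(titanic)    # titanic_train data frame
library(caret)      # createDataPartition(), F_meas()
library(tidyverse)  # mutate(), select(), group_by(), summarize(), left_join(), %>%
options(digits = 3)

# Cleaning step from the starting code
titanic_clean <- titanic_train %>%
  mutate(Survived = factor(Survived),
         Embarked = factor(Embarked),
         Age = ifelse(is.na(Age), median(Age, na.rm = TRUE), Age),
         FamilySize = SibSp + Parch + 1) %>%
  select(Survived, Sex, Pclass, Age, Fare, SibSp, Parch, FamilySize, Embarked)

# 80/20 split from Question 1
set.seed(42)
test_index <- createDataPartition(titanic_clean$Survived, times = 1, p = 0.2, list = FALSE)
test_set  <- titanic_clean[test_index, ]
train_set <- titanic_clean[-test_index, ]

# Training survival rates for the three groupings
sex_rates   <- train_set %>% group_by(Sex) %>%
  summarize(rate = mean(Survived == "1"))
class_rates <- train_set %>% group_by(Pclass) %>%
  summarize(rate = mean(Survived == "1"))
combo_rates <- train_set %>% group_by(Sex, Pclass) %>%
  summarize(rate = mean(Survived == "1"), .groups = "drop")

# Hypothetical helper: predict 1 when the passenger's training group rate exceeds 0.5,
# returned as a factor with the same levels as Survived
to_pred <- function(rates, by) {
  rate <- left_join(test_set, rates, by = by)$rate
  factor(ifelse(rate > 0.5, 1, 0), levels = levels(test_set$Survived))
}

sex_model   <- to_pred(sex_rates, "Sex")
class_model <- to_pred(class_rates, "Pclass")
combo_model <- to_pred(combo_rates, c("Sex", "Pclass"))

# F1 score of each model on the test set
f1 <- c(sex      = F_meas(sex_model, test_set$Survived),
        class    = F_meas(class_model, test_set$Survived),
        combined = F_meas(combo_model, test_set$Survived))
f1
names(which.max(f1))  # model with the highest F1 score
max(f1)               # that maximum F1 value
```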
