Question
Starting R Code:
library(titanic)    # loads titanic_train data frame
library(caret)
library(tidyverse)
library(rpart)

# 3 significant digits
options(digits = 3)

# clean the data - `titanic_train` is loaded with the titanic package
titanic_clean <- titanic_train %>%
  mutate(Survived = factor(Survived),
         Embarked = factor(Embarked),
         Age = ifelse(is.na(Age), median(Age, na.rm = TRUE), Age), # NA age to median age
         FamilySize = SibSp + Parch + 1) %>%    # count family members
  select(Survived, Sex, Pclass, Age, Fare, SibSp, Parch, FamilySize, Embarked)
Question 1: Training and test sets
Split titanic_clean into test and training sets. After running the setup code, titanic_clean should have 891 rows and 9 variables.
Set the seed to 42, then use the caret package to create a 20% data partition based on the Survived column. Assign the 20% partition to test_set and the remaining 80% partition to train_set.
How many observations are in the training set?
How many observations are in the test set?
What proportion of individuals in the training set survived?
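One way to carry out the split is sketched below. It repeats the starting code so the block runs on its own, loads dplyr rather than the full tidyverse (an assumption made only to keep the dependencies light), and uses test_index as an illustrative name that does not appear in the question. Which rows land in each set can depend on the R version, because R 3.6.0 changed the default sampling algorithm used after set.seed().

```r
library(titanic)  # provides titanic_train
library(caret)    # provides createDataPartition()
library(dplyr)    # covers the verbs used here (assumption: tidyverse not required)

# Same cleaning steps as the starting code
titanic_clean <- titanic_train %>%
  mutate(Survived = factor(Survived),
         Embarked = factor(Embarked),
         Age = ifelse(is.na(Age), median(Age, na.rm = TRUE), Age),
         FamilySize = SibSp + Parch + 1) %>%
  select(Survived, Sex, Pclass, Age, Fare, SibSp, Parch, FamilySize, Embarked)

# 20% partition stratified on Survived; list = FALSE returns row indices
set.seed(42)
test_index <- createDataPartition(titanic_clean$Survived, times = 1,
                                  p = 0.2, list = FALSE)
test_set  <- titanic_clean[test_index, ]
train_set <- titanic_clean[-test_index, ]

nrow(train_set)                    # observations in the training set
nrow(test_set)                     # observations in the test set
mean(train_set$Survived == "1")    # proportion of training passengers who survived
```

Because createDataPartition() samples within each level of Survived, the two sets keep roughly the same survival proportion as the full data.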
Question 2: Baseline prediction by guessing the outcome
The simplest prediction method is randomly guessing the outcome without using additional predictors. These methods will help us determine whether our machine learning algorithm performs better than chance. How accurate are two methods of guessing Titanic passenger survival?
Set the seed to 3. For each individual in the test set, randomly guess whether that person survived or not by sampling from the vector c(0,1) (Note: use the default argument setting of prob from the sample function).
What is the accuracy of this guessing method?
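A minimal sketch of the random-guessing baseline follows. The setup and split are repeated so the block runs on its own; the names guess and actual are illustrative, and dplyr is loaded in place of the full tidyverse as an assumption. The accuracy obtained can differ between R versions because the sampling algorithm behind set.seed() changed in R 3.6.0.

```r
library(titanic)
library(caret)
library(dplyr)

# Same cleaning steps as the starting code
titanic_clean <- titanic_train %>%
  mutate(Survived = factor(Survived),
         Embarked = factor(Embarked),
         Age = ifelse(is.na(Age), median(Age, na.rm = TRUE), Age),
         FamilySize = SibSp + Parch + 1) %>%
  select(Survived, Sex, Pclass, Age, Fare, SibSp, Parch, FamilySize, Embarked)

# 20% test partition, as in Question 1
set.seed(42)
test_index <- createDataPartition(titanic_clean$Survived, times = 1,
                                  p = 0.2, list = FALSE)
test_set  <- titanic_clean[test_index, ]
train_set <- titanic_clean[-test_index, ]

# One random guess per test passenger; prob is left at its default,
# so 0 and 1 are equally likely
set.seed(3)
guess <- sample(c(0, 1), nrow(test_set), replace = TRUE)

# Convert the factor outcome back to 0/1 before comparing
actual <- as.numeric(as.character(test_set$Survived))
guess_accuracy <- mean(guess == actual)
guess_accuracy
```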
Question 3a: Predicting survival by sex
Use the training set to determine whether members of a given sex were more likely to survive or die. Apply this insight to generate survival predictions on the test set.
What proportion of training set females survived?
What proportion of training set males survived?
Question 3b: Predicting survival by sex
Predict survival using sex on the test set: if the survival rate for a sex is over 0.5, predict survival for all individuals of that sex, and predict death if the survival rate for a sex is under 0.5.
What is the accuracy of this sex-based prediction method on the test set?
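The sex-based rule in Questions 3a and 3b could be sketched as follows. The setup and split are repeated so the block runs on its own; sex_rates, sex_pred, and actual are illustrative names, and dplyr stands in for the full tidyverse as an assumption. The rule is derived from the training-set rates rather than hard-coded, so it follows the 0.5 threshold stated in the question.

```r
library(titanic)
library(caret)
library(dplyr)

# Same cleaning steps as the starting code
titanic_clean <- titanic_train %>%
  mutate(Survived = factor(Survived),
         Embarked = factor(Embarked),
         Age = ifelse(is.na(Age), median(Age, na.rm = TRUE), Age),
         FamilySize = SibSp + Parch + 1) %>%
  select(Survived, Sex, Pclass, Age, Fare, SibSp, Parch, FamilySize, Embarked)

# 20% test partition, as in Question 1
set.seed(42)
test_index <- createDataPartition(titanic_clean$Survived, times = 1,
                                  p = 0.2, list = FALSE)
test_set  <- titanic_clean[test_index, ]
train_set <- titanic_clean[-test_index, ]

# Survival rate for each sex, computed on the training set only
sex_rates <- train_set %>%
  group_by(Sex) %>%
  summarize(rate = mean(Survived == "1"))
sex_rates

# Predict survival (1) for a sex whose training rate exceeds 0.5, else death (0)
sex_pred <- test_set %>%
  left_join(sex_rates, by = "Sex") %>%
  mutate(pred = ifelse(rate > 0.5, 1, 0)) %>%
  pull(pred)

actual <- as.numeric(as.character(test_set$Survived))
sex_accuracy <- mean(sex_pred == actual)
sex_accuracy
```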
Question 4a: Predicting survival by passenger class
In the training set, which class(es) (Pclass) were passengers more likely to survive than die?
Select ALL that apply.
A. 1
B. 2
C. 3
Question 4b: Predicting survival by passenger class
Predict survival using passenger class on the test set: predict survival if the survival rate for a class is over 0.5, otherwise predict death.
What is the accuracy of this class-based prediction method on the test set?
Question 4c: Predicting survival by passenger class
Use the training set to group passengers by both sex and passenger class.
Which sex and class combinations were more likely to survive than die (i.e. >50% survival)?
Select ALL that apply.
a. female 1st class
b. female 2nd class
c. female 3rd class
d. male 1st class
e. male 2nd class
f. male 3rd class
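Questions 4a to 4c apply the same threshold rule to passenger class and to sex and class together. The sketch below wraps that rule in a helper, predict_by_rate(), which is a hypothetical function written for this illustration and not part of caret or the question. The setup and split are repeated so the block runs on its own, with dplyr loaded in place of the full tidyverse as an assumption.

```r
library(titanic)
library(caret)
library(dplyr)

# Same cleaning steps as the starting code
titanic_clean <- titanic_train %>%
  mutate(Survived = factor(Survived),
         Embarked = factor(Embarked),
         Age = ifelse(is.na(Age), median(Age, na.rm = TRUE), Age),
         FamilySize = SibSp + Parch + 1) %>%
  select(Survived, Sex, Pclass, Age, Fare, SibSp, Parch, FamilySize, Embarked)

# 20% test partition, as in Question 1
set.seed(42)
test_index <- createDataPartition(titanic_clean$Survived, times = 1,
                                  p = 0.2, list = FALSE)
test_set  <- titanic_clean[test_index, ]
train_set <- titanic_clean[-test_index, ]

# Hypothetical helper: group the training set by `vars`, then predict 1
# for test rows whose group survival rate exceeds 0.5 and 0 otherwise
predict_by_rate <- function(train, test, vars) {
  rates <- train %>%
    group_by(across(all_of(vars))) %>%
    summarize(rate = mean(Survived == "1"), .groups = "drop")
  test %>%
    left_join(rates, by = vars) %>%
    mutate(pred = ifelse(rate > 0.5, 1, 0)) %>%
    pull(pred)
}

# Training-set survival rates by class, and by sex and class
train_set %>% group_by(Pclass) %>% summarize(rate = mean(Survived == "1"))
train_set %>% group_by(Sex, Pclass) %>%
  summarize(rate = mean(Survived == "1"), .groups = "drop")

actual <- as.numeric(as.character(test_set$Survived))
class_pred     <- predict_by_rate(train_set, test_set, "Pclass")
sex_class_pred <- predict_by_rate(train_set, test_set, c("Sex", "Pclass"))

class_accuracy     <- mean(class_pred == actual)
sex_class_accuracy <- mean(sex_class_pred == actual)
class_accuracy
sex_class_accuracy
```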
Question 5a: Confusion matrix
Use the confusionMatrix() function to create confusion matrices for the sex model, class model, and combined sex and class model. You will need to convert predictions and survival status to factors to use this function.
What is the "positive" class used to calculate confusion matrix metrics?
a. 0
b. 1
Which model has the highest sensitivity?
a. sex only
b. class only
c. sex and class combined
Which model has the highest specificity?
a. sex only
b. class only
c. sex and class combined
Which model has the highest balanced accuracy?
a. sex only
b. class only
c. sex and class combined
Question 5b: Confusion matrix
What is the maximum value of balanced accuracy from Q5a?
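The confusion-matrix comparison in Questions 5a and 5b could be sketched as below. The setup, split, and threshold rule are repeated so the block runs on its own; predict_by_rate() is a hypothetical helper written for this illustration, and dplyr is loaded in place of the full tidyverse as an assumption. For a two-level outcome, confusionMatrix() treats the first factor level as the positive class unless the positive argument says otherwise, and the `$positive` element of the result shows which level was used.

```r
library(titanic)
library(caret)
library(dplyr)

# Same cleaning steps as the starting code
titanic_clean <- titanic_train %>%
  mutate(Survived = factor(Survived),
         Embarked = factor(Embarked),
         Age = ifelse(is.na(Age), median(Age, na.rm = TRUE), Age),
         FamilySize = SibSp + Parch + 1) %>%
  select(Survived, Sex, Pclass, Age, Fare, SibSp, Parch, FamilySize, Embarked)

set.seed(42)
test_index <- createDataPartition(titanic_clean$Survived, times = 1,
                                  p = 0.2, list = FALSE)
test_set  <- titanic_clean[test_index, ]
train_set <- titanic_clean[-test_index, ]

# Hypothetical helper: predict 1 where the training survival rate of the
# group defined by `vars` exceeds 0.5, else 0
predict_by_rate <- function(train, test, vars) {
  rates <- train %>%
    group_by(across(all_of(vars))) %>%
    summarize(rate = mean(Survived == "1"), .groups = "drop")
  test %>%
    left_join(rates, by = vars) %>%
    mutate(pred = ifelse(rate > 0.5, 1, 0)) %>%
    pull(pred)
}

models <- list(
  sex       = predict_by_rate(train_set, test_set, "Sex"),
  class     = predict_by_rate(train_set, test_set, "Pclass"),
  sex_class = predict_by_rate(train_set, test_set, c("Sex", "Pclass"))
)

# Predictions must be factors with the same levels as the reference
metrics <- sapply(models, function(pred) {
  cm <- confusionMatrix(data = factor(pred, levels = c(0, 1)),
                        reference = test_set$Survived)
  cm$byClass[c("Sensitivity", "Specificity", "Balanced Accuracy")]
})
metrics                               # one column per model
max(metrics["Balanced Accuracy", ])   # Question 5b

# Level that caret treated as "positive"
positive_class <- confusionMatrix(data = factor(models$sex, levels = c(0, 1)),
                                  reference = test_set$Survived)$positive
positive_class
```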
Question 6: F1 scores
Use the F_meas() function to calculate F1 scores for the sex model, class model, and combined sex and class model. You will need to convert predictions to factors to use this function.
Which model has the highest F1 score?
a. sex only
b. class only
c. sex and class combined
What is the maximum value of the F1 score?
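A minimal sketch of the F1 comparison follows. The setup, split, and threshold rule are repeated so the block runs on its own; predict_by_rate() is a hypothetical helper written for this illustration, and dplyr is loaded in place of the full tidyverse as an assumption. F_meas() is called with its defaults, so the first level of the reference factor is treated as the relevant class and beta is 1.

```r
library(titanic)
library(caret)
library(dplyr)

# Same cleaning steps as the starting code
titanic_clean <- titanic_train %>%
  mutate(Survived = factor(Survived),
         Embarked = factor(Embarked),
         Age = ifelse(is.na(Age), median(Age, na.rm = TRUE), Age),
         FamilySize = SibSp + Parch + 1) %>%
  select(Survived, Sex, Pclass, Age, Fare, SibSp, Parch, FamilySize, Embarked)

set.seed(42)
test_index <- createDataPartition(titanic_clean$Survived, times = 1,
                                  p = 0.2, list = FALSE)
test_set  <- titanic_clean[test_index, ]
train_set <- titanic_clean[-test_index, ]

# Hypothetical helper: predict 1 where the training survival rate of the
# group defined by `vars` exceeds 0.5, else 0
predict_by_rate <- function(train, test, vars) {
  rates <- train %>%
    group_by(across(all_of(vars))) %>%
    summarize(rate = mean(Survived == "1"), .groups = "drop")
  test %>%
    left_join(rates, by = vars) %>%
    mutate(pred = ifelse(rate > 0.5, 1, 0)) %>%
    pull(pred)
}

models <- list(
  sex       = predict_by_rate(train_set, test_set, "Sex"),
  class     = predict_by_rate(train_set, test_set, "Pclass"),
  sex_class = predict_by_rate(train_set, test_set, c("Sex", "Pclass"))
)

# F1 score per model; predictions are converted to factors with the
# same levels as the reference
f1_scores <- sapply(models, function(pred) {
  F_meas(data = factor(pred, levels = c(0, 1)),
         reference = test_set$Survived)
})
f1_scores
names(which.max(f1_scores))   # model with the highest F1 score
max(f1_scores)                # maximum F1 value
```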