Question
THIS IS R PROGRAMMING - Use the titanic_train data frame from the titanic library as the starting point for this project.
library(titanic) # loads titanic_train data frame
library(caret)
library(tidyverse)
library(rpart)
# 3 significant digits
options(digits = 3)
# clean the data - `titanic_train` is loaded with the titanic package
titanic_clean <- titanic_train %>%
  mutate(Survived = factor(Survived),
         Embarked = factor(Embarked),
         Age = ifelse(is.na(Age), median(Age, na.rm = TRUE), Age), # NA age to median age
         FamilySize = SibSp + Parch + 1) %>% # count family members
  select(Survived, Sex, Pclass, Age, Fare, SibSp, Parch, FamilySize, Embarked)
Split titanic_clean into test and training sets. After running the setup code, titanic_clean should have 891 rows and 9 variables.
Set the seed to 42, then use the caret package to create a 20% data partition based on the Survived column. Assign the 20% partition to test_set and the remaining 80% partition to train_set.
How many observations are in the training set? _________
How many observations are in the test set? ___________
What proportion of individuals in the training set survived? ________
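One possible way to approach the three questions is sketched below. It assumes that createDataPartition() from caret is the intended partitioning function and that plain set.seed(42) is what the exercise means. The names test_index and prop_survived are illustrative and do not come from the question. The setup code is repeated so the sketch runs on its own, with dplyr loaded in place of the full tidyverse.

```r
library(titanic)  # provides titanic_train
library(caret)    # provides createDataPartition()
library(dplyr)    # provides %>%, mutate(), select()

# Rebuild titanic_clean exactly as in the setup code from the question
titanic_clean <- titanic_train %>%
  mutate(Survived = factor(Survived),
         Embarked = factor(Embarked),
         Age = ifelse(is.na(Age), median(Age, na.rm = TRUE), Age),
         FamilySize = SibSp + Parch + 1) %>%
  select(Survived, Sex, Pclass, Age, Fare, SibSp, Parch, FamilySize, Embarked)

# Stratified 20% partition on Survived (test_index is an illustrative name)
set.seed(42)
test_index <- createDataPartition(titanic_clean$Survived,
                                  times = 1, p = 0.2, list = FALSE)

test_set  <- titanic_clean[test_index, ]   # the 20% partition
train_set <- titanic_clean[-test_index, ]  # the remaining 80%

# Number of observations in each set
nrow(train_set)
nrow(test_set)

# Proportion of survivors in the training set:
# Survived is a factor with levels "0" and "1", so compare against "1"
prop_survived <- mean(train_set$Survived == "1")
prop_survived
```

Which rows land in each set depends on the seed and on the sampling algorithm of the R version in use. R 3.6.0 changed the default sampling method, so matching results produced under an older R version may require set.seed(42, sample.kind = "Rounding") instead.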