Question
4. The dataset is given in a CSV file named titanic.csv. The file contains data for 892 of the real Titanic passengers, and each row represents one person. The columns describe attributes of the person: whether they survived (S), their age (A), their passenger class (C), their sex (G), and the fare they paid (X). Complete the following tasks:
   1. Load the data and convert it to a pandas DataFrame.
   2. Clean the data (consider only the missing values).
   3. Encode the data (hint: male -> 0, female -> 1).
   4. Split the data into training and testing sets.
   5. Use Logistic() to train your model.
   6. Use the trained model to classify the test dataset.
   7. Evaluate the performance of the model.
   8. Print the equation of your regression.
   9. Interpret your evaluation.
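One possible outline of steps 2 to 8 is sketched below. It is not an official solution. It assumes that the CSV headers are literally S, A, C, G and X, that sex is stored as the strings "male" and "female", and that Logistic() refers to scikit-learn's LogisticRegression. The helper names prepare and run_logistic, the 25% test split and the random seed are all illustrative choices.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split

# Assumption: the CSV headers are the single-letter codes from the question.
FEATURES = ["A", "C", "G", "X"]  # age, passenger class, sex, fare
TARGET = "S"                     # survived


def prepare(df):
    """Steps 2-3: drop rows with missing values, then encode sex as 0/1."""
    clean = df[FEATURES + [TARGET]].dropna().copy()
    # Assumption: sex is stored as the strings "male" and "female".
    clean["G"] = clean["G"].map({"male": 0, "female": 1})
    return clean


def run_logistic(df, test_size=0.25, seed=0):
    """Steps 4-8: split, fit, predict, evaluate, and format the equation."""
    clean = prepare(df)
    X_train, X_test, y_train, y_test = train_test_split(
        clean[FEATURES], clean[TARGET],
        test_size=test_size, random_state=seed, stratify=clean[TARGET],
    )
    model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
    y_pred = model.predict(X_test)

    # The fitted model is linear in the log-odds of survival.
    terms = [f"{coef:+.4f}*{name}"
             for name, coef in zip(FEATURES, model.coef_[0])]
    equation = f"log-odds(S=1) = {model.intercept_[0]:.4f} " + " ".join(terms)

    return {
        "accuracy": accuracy_score(y_test, y_pred),
        "confusion": confusion_matrix(y_test, y_pred, labels=[0, 1]),
        "equation": equation,
    }
```

With the real file, calling run_logistic(pd.read_csv("titanic.csv")) and printing the "equation" entry would cover step 8. For step 9, the sign of each coefficient can be read directly: under the 0/1 encoding, a positive coefficient on G would mean higher predicted odds of survival for women.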
5. Continuing from Question 4 (copy it into a new file), complete the following tasks:
   1. Choose DecisionTree() to train your model, then repeat Question 4 steps 6-9. (Step 8: print the textual tree.)
   2. Extra credit: Plot the tree.
   3. Compare the performance of the result with the performance of Question 4. Interpret it.
   4. Extra credit: Prune the tree, and find the tree that balances complexity against performance.
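A hedged sketch of the decision-tree variant follows, including one way to approach the pruning task. It assumes the same column names and sex encoding as the question describes, and it takes DecisionTree() to mean scikit-learn's DecisionTreeClassifier. Pruning here uses cost-complexity pruning with the ccp_alpha value chosen by 5-fold cross-validation on the training set. That is one reasonable strategy, not necessarily the one the assignment expects. The names prepare and run_tree are hypothetical.

```python
import pandas as pd
from sklearn.metrics import accuracy_score
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text

# Assumption: the CSV headers are the single-letter codes from the question.
FEATURES = ["A", "C", "G", "X"]
TARGET = "S"


def prepare(df):
    """Drop rows with missing values and encode sex (male -> 0, female -> 1)."""
    clean = df[FEATURES + [TARGET]].dropna().copy()
    clean["G"] = clean["G"].map({"male": 0, "female": 1})
    return clean


def run_tree(df, test_size=0.25, seed=0):
    clean = prepare(df)
    X_train, X_test, y_train, y_test = train_test_split(
        clean[FEATURES], clean[TARGET],
        test_size=test_size, random_state=seed, stratify=clean[TARGET],
    )

    # Unpruned tree: grown until the leaves are pure.
    full = DecisionTreeClassifier(random_state=seed).fit(X_train, y_train)

    # Candidate pruning strengths from the cost-complexity path.
    path = full.cost_complexity_pruning_path(X_train, y_train)
    best_alpha, best_score = 0.0, -1.0
    for alpha in path.ccp_alphas:
        alpha = max(float(alpha), 0.0)  # guard against tiny negative values
        candidate = DecisionTreeClassifier(random_state=seed, ccp_alpha=alpha)
        score = cross_val_score(candidate, X_train, y_train, cv=5).mean()
        if score >= best_score:  # on ties, prefer the larger alpha (smaller tree)
            best_alpha, best_score = alpha, score

    pruned = DecisionTreeClassifier(random_state=seed, ccp_alpha=best_alpha)
    pruned.fit(X_train, y_train)

    return {
        "full_accuracy": accuracy_score(y_test, full.predict(X_test)),
        "pruned_accuracy": accuracy_score(y_test, pruned.predict(X_test)),
        "full_leaves": full.get_n_leaves(),
        "pruned_leaves": pruned.get_n_leaves(),
        "alpha": best_alpha,
        "text": export_text(pruned, feature_names=FEATURES),
    }
```

Printing the "text" entry gives the textual tree for step 8. For the plotting task, sklearn.tree.plot_tree can draw a fitted tree with matplotlib. Comparing the two leaf counts alongside the two accuracies shows how much complexity the pruning removed and what, if anything, it cost on the test set.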
6. Continuing from Question 5 (copy it into a new file), complete the following tasks:
   1. Choose RandomForest() to train your model, then repeat Question 4 steps 6-9. (Skip step 8.)
   2. Compare the performance of the result with the performance of Question 5. Interpret it.
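The comparison asked for here can be sketched by fitting a single tree and a forest on the same split, so that any difference in accuracy comes from the model and not from the data partition. As before, this assumes the single-letter column names and the "male"/"female" strings, and it reads RandomForest() as scikit-learn's RandomForestClassifier. The function name compare_forest_and_tree and the choice of 200 trees are illustrative.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Assumption: the CSV headers are the single-letter codes from the question.
FEATURES = ["A", "C", "G", "X"]
TARGET = "S"


def prepare(df):
    """Drop rows with missing values and encode sex (male -> 0, female -> 1)."""
    clean = df[FEATURES + [TARGET]].dropna().copy()
    clean["G"] = clean["G"].map({"male": 0, "female": 1})
    return clean


def compare_forest_and_tree(df, test_size=0.25, seed=0):
    clean = prepare(df)
    X_train, X_test, y_train, y_test = train_test_split(
        clean[FEATURES], clean[TARGET],
        test_size=test_size, random_state=seed, stratify=clean[TARGET],
    )

    models = {
        "tree": DecisionTreeClassifier(random_state=seed),
        "forest": RandomForestClassifier(n_estimators=200, random_state=seed),
    }

    # Fit both models on the same training rows and score on the same test rows.
    scores = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        scores[name] = accuracy_score(y_test, model.predict(X_test))

    # Impurity-based importances give a rough view of what the forest relies on.
    importances = dict(zip(FEATURES, models["forest"].feature_importances_))
    return scores, importances
```

A forest averages many trees trained on bootstrap samples, which tends to reduce the variance of a single unpruned tree. Whether that shows up as higher test accuracy on this dataset has to be checked from the returned scores, and a single split is a noisy estimate, so repeating the comparison with several seeds gives a fairer picture.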