Question
Complete the 3 tasks in Python for a sample dataset loaded into a pandas DataFrame named "dataframe".
Task 1: Data preprocessing [5 points]

We are going to use the Decision Tree Classifier and Random Forest from Scikit-Learn. One limitation of these classifiers is that they do not accept non-numeric values, which means the categorical attributes cannot be used as-is. One way to overcome this is to transform the feature space, making one binary-valued feature out of each value of the categorical features while keeping the numeric features intact. When transforming the original data records, numeric (i.e. continuous) features remain unchanged, and each of the binary-valued features that replace the categorical features is set to 1 or 0 depending on whether the original categorical feature takes the relevant value.

In Pandas, get_dummies(data) automatically converts all of the categorical values into binary features and returns the new data frame. You might consider saving the output to a CSV file. You should verify that the number of attributes in the transformed data frame matches the number of numeric features plus the total number of unique values across all the categorical attributes. For the target attribute, you can pick "Survived = Yes."

Now randomly split the data into a training set and a test set, with 30% of the records going to the test set. Use the last three digits of your student ID as the seed for the random assignment.
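A minimal sketch of Task 1, assuming pandas and scikit-learn are installed. The course's "dataframe" is not included here, so the code builds a small hypothetical Titanic-style table: the column names `Class`, `Sex`, `Age`, `Survived` and every value are invented for illustration. It one-hot encodes with `pd.get_dummies`, checks the column count against numeric columns plus unique categorical values, takes `Survived_Yes` as the target, and makes a 70/30 split seeded with a placeholder value (`SEED = 123`) standing in for the last three digits of a student ID.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Hypothetical stand-in for the course's "dataframe" (all values invented).
dataframe = pd.DataFrame({
    "Class": ["1st", "2nd", "3rd", "Crew", "3rd", "1st", "2nd", "Crew", "3rd", "1st"],
    "Sex": ["Male", "Female", "Female", "Male", "Male",
            "Female", "Male", "Male", "Female", "Male"],
    "Age": [29.0, 4.0, 35.0, 41.0, 19.0, 58.0, 30.0, 25.0, 2.0, 47.0],  # numeric, kept as-is
    "Survived": ["Yes", "Yes", "No", "No", "No", "Yes", "No", "Yes", "Yes", "No"],
})

# One-hot encode every non-numeric column; dtype=int gives 0/1 instead of True/False.
encoded = pd.get_dummies(dataframe, dtype=int)
# encoded.to_csv("dataframe_encoded.csv", index=False)  # optional; file name is a placeholder

# Verify: numeric columns + one column per unique value of each categorical column.
numeric_cols = dataframe.select_dtypes(include="number").columns
categorical_cols = dataframe.columns.difference(numeric_cols)
expected_columns = len(numeric_cols) + sum(dataframe[c].nunique() for c in categorical_cols)
assert encoded.shape[1] == expected_columns, (encoded.shape[1], expected_columns)

# Target is "Survived = Yes"; drop both Survived dummies from the features.
y = encoded["Survived_Yes"]
X = encoded.drop(columns=["Survived_Yes", "Survived_No"])

# Placeholder seed: replace with the last three digits of your student ID.
SEED = 123
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=SEED)

# Fit both classifiers on the encoded training data and score them on the test set.
tree = DecisionTreeClassifier(random_state=SEED).fit(X_train, y_train)
forest = RandomForestClassifier(n_estimators=100, random_state=SEED).fit(X_train, y_train)
tree_acc = tree.score(X_test, y_test)
forest_acc = forest.score(X_test, y_test)

print(f"columns: {encoded.shape[1]} (expected {expected_columns})")
print(f"train/test rows: {len(X_train)}/{len(X_test)}")
print(f"decision tree accuracy: {tree_acc:.2f}, random forest accuracy: {forest_acc:.2f}")
```

Dropping `Survived_No` along with `Survived_Yes` matters: `Survived_No` is just the complement of the target, so leaving it among the features would let both models read the answer off a single column. Passing the same seed to the classifiers is an extra choice for reproducibility, not something the task asks for.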