Question

1 Approved Answer

Posted on Sep 25, 2024

Complete the 3 tasks using python for a sample database in pandas named dataframe. Task 1: Data preprocessing [5 points] We are going to use

Complete the 3 tasks using python for a sample database in pandas named "dataframe".

image text in transcribed

Task 1: Data preprocessing [5 points] We are going to use the Decision Tree Classifier and Random Forest from Scikit-Learn. One limitation of this classifier is that does not accept non-numeric values which means the categorical attributes cannot be used as-is. One way to overcome this is to transform the feature space, making one binary-valued feature out of each value of the categorical features, while keeping the numeric features intact. When transforming the original data records, numeric (i.e. continuous) features remain unchanged, and each of the binary-valued features that replace the categorical features is set to 1 or 0 depending on whether the original categorical feature takes the relevant value. In Pandas, get_dummies(data) automatically converts all of the categorical values into binary features and returns the new data frame. You might consider saving the output to a CSV file. You should verify that the number of attributes in the transformed data frame match the total number of numeric features and the unique values of all the different categorical attributes. For the target attribute, you can pick "Survived = Yes." Now randomly split the data into training and test set (30%). Use the last three digits of your student ID as the seed for the random assignment. Task 1: Data preprocessing [5 points] We are going to use the Decision Tree Classifier and Random Forest from Scikit-Learn. One limitation of this classifier is that does not accept non-numeric values which means the categorical attributes cannot be used as-is. One way to overcome this is to transform the feature space, making one binary-valued feature out of each value of the categorical features, while keeping the numeric features intact. When transforming the original data records, numeric (i.e. continuous) features remain unchanged, and each of the binary-valued features that replace the categorical features is set to 1 or 0 depending on whether the original categorical feature takes the relevant value. In Pandas, get_dummies(data) automatically converts all of the categorical values into binary features and returns the new data frame. You might consider saving the output to a CSV file. You should verify that the number of attributes in the transformed data frame match the total number of numeric features and the unique values of all the different categorical attributes. For the target attribute, you can pick "Survived = Yes." Now randomly split the data into training and test set (30%). Use the last three digits of your student ID as the seed for the random assignment