
Question


Complete the 3 tasks using python for a sample database in pandas named "dataframe".


Task 1: Data preprocessing [5 points]

We are going to use the Decision Tree Classifier and Random Forest from Scikit-Learn. One limitation of these classifiers is that they do not accept non-numeric values, which means the categorical attributes cannot be used as-is. One way to overcome this is to transform the feature space, making one binary-valued feature out of each value of the categorical features while keeping the numeric features intact. When transforming the original data records, numeric (i.e. continuous) features remain unchanged, and each of the binary-valued features that replace a categorical feature is set to 1 or 0 depending on whether the original categorical feature takes the relevant value.

In Pandas, get_dummies(data) automatically converts all of the categorical values into binary features and returns the new data frame. You might consider saving the output to a CSV file. You should verify that the number of attributes in the transformed data frame matches the number of numeric features plus the number of unique values across all the categorical attributes. For the target attribute, you can pick "Survived = Yes."

Now randomly split the data into training and test sets (30% for the test set). Use the last three digits of your student ID as the seed for the random assignment.
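The preprocessing steps described in Task 1 can be sketched as follows. This is a minimal illustration, not the assignment's reference solution: the original data was only available as an image, so the small `dataframe` below, its column names (`Class`, `Sex`, `Age`, `Survived`), and the seed value `123` are all hypothetical stand-ins. Replace them with the real data and the last three digits of your own student ID.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for the assignment's `dataframe`; the real columns may differ.
dataframe = pd.DataFrame({
    "Class": ["1st", "2nd", "3rd", "Crew", "1st", "3rd", "2nd", "Crew", "3rd", "1st"],
    "Sex": ["Male", "Female", "Male", "Male", "Female",
            "Female", "Male", "Female", "Male", "Male"],
    "Age": [34, 28, 19, 41, 52, 8, 30, 25, 44, 61],
    "Survived": ["No", "Yes", "No", "No", "Yes", "Yes", "No", "Yes", "No", "Yes"],
})

SEED = 123  # placeholder: use the last three digits of your student ID

# Treat every non-numeric column as categorical.
numeric_cols = dataframe.select_dtypes(include="number").columns
categorical_cols = dataframe.columns.difference(numeric_cols)

# One binary column per categorical value; numeric columns pass through unchanged.
# dtype=int yields 1/0 rather than True/False.
encoded = pd.get_dummies(dataframe, columns=list(categorical_cols), dtype=int)

# Verify the column count: numeric features + unique values of each categorical one.
expected_cols = len(numeric_cols) + sum(dataframe[c].nunique() for c in categorical_cols)
assert encoded.shape[1] == expected_cols

# Optionally save the transformed frame:
# encoded.to_csv("encoded.csv", index=False)

# Target is "Survived = Yes"; drop both Survived_* columns from the features
# so the label does not leak into the inputs.
y = encoded["Survived_Yes"]
X = encoded.drop(columns=["Survived_Yes", "Survived_No"])

# Hold out 30% of the rows as the test set, seeded for reproducibility.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=SEED
)
```

Passing `columns=` explicitly makes the set of encoded attributes unambiguous, and dropping `Survived_No` along with the target keeps the complementary label column out of the feature matrix.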

