Question

Problem 2: Decision Tree, post-pruning and cost complexity parameter using sklearn 0.22 [10 points, Peer Review]

We will use a pre-processed natural language dataset in the CSV file "spamdata.csv" to classify emails as spam or not. Each row contains the word frequency for 54 words plus statistics on the longest "run" of capital letters. Word frequency is given by:

$f_i = m_i / N$

where $f_i$ is the frequency for word $i$, $m_i$ is the number of times word $i$ appears in the email, and $N$ is the total number of words in the email. We will use decision trees to classify the emails.

Part A [5 points]: Complete the function get_spam_dataset to read in values from the dataset and split the data into train and test sets.

In [ ]:
    def get_spam_dataset(filepath="data/spamdata.csv", test_split=0.1):
        """
        get_spam_dataset

        Loads csv file located at "filepath". Shuffles the data and splits it so that
        you have (1-test_split)*100% training examples and (test_split)*100% testing examples.

        Args:
            filepath: location of the csv file
            test_split: percentage/100 of the data should be the testing split

        Returns:
            X_train, X_test, y_train, y_test, feature_names
            Note: feature_names is a list of all column names including isspam.
            (in that order)
            first four are np.ndarray
        """
        # your code here
        return 0

In [ ]:
    # TO-DO: import the data set into five variables: X_train, X_test, y_train, y_test, label_names
    # Uncomment and edit the line below to complete this task.

    test_split = 0.1  # default test_split; change it if you'd like; ensure that this variable is used as an argument to your function

    # your code here
    # X_train, X_test, y_train, y_test, label_names = np.arange(5)

In [ ]:
    # tests X_train, X_test, y_train, y_test, and label_names
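As a small illustration of the word-frequency definition in the problem statement (the count of word i divided by the total number of words N), the sketch below computes frequencies for a made-up email. The helper name word_frequencies, the whitespace tokenization, and the sample text are assumptions for illustration only. The dataset is already pre-processed, so this step is not part of the assignment.

```python
from collections import Counter


def word_frequencies(email_text, vocabulary):
    """Return {word: m_i / N} for each word in `vocabulary` (hypothetical helper)."""
    # Naive tokenization (an assumption): lowercase, then split on whitespace.
    tokens = email_text.lower().split()
    total = len(tokens)        # N: total number of words in the email
    counts = Counter(tokens)   # m_i for every word that appears
    if total == 0:
        return {word: 0.0 for word in vocabulary}
    # Counter returns 0 for words that never appear, giving a frequency of 0.0.
    return {word: counts[word] / total for word in vocabulary}


freqs = word_frequencies(
    "free money now claim your free prize",
    ["free", "money", "meeting"],
)
```

Here the email has seven words, so "free" (two occurrences) gets frequency 2/7, "money" gets 1/7, and "meeting" gets 0.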

Step by Step Solution

Step: 1

Import numpy as np, then load the spam dataset using the get_spam_dataset function: def getsp...
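Since the posted solution is cut off, here is a minimal sketch of what Part A asks for, not the original answer. It assumes the CSV is comma-separated with a header row and that the label column is named "isspam", as the docstring suggests. The seed argument is a hypothetical addition for reproducible shuffling and is not part of the assignment's signature.

```python
import numpy as np
import pandas as pd


def get_spam_dataset(filepath="data/spamdata.csv", test_split=0.1, seed=0):
    """Load the CSV, shuffle the rows, and split them into train and test sets."""
    # Assumption: comma-separated file with a header row.
    df = pd.read_csv(filepath)
    feature_names = list(df.columns)  # all column names, including isspam

    # Assumption: the label column is named "isspam".
    X = df.drop(columns=["isspam"]).to_numpy()
    y = df["isspam"].to_numpy()

    # Shuffle with a single permutation so features and labels stay aligned.
    rng = np.random.default_rng(seed)  # `seed` is a hypothetical extra argument
    order = rng.permutation(len(df))
    X, y = X[order], y[order]

    # The last test_split fraction of the shuffled rows becomes the test set.
    n_test = int(round(test_split * len(df)))
    n_train = len(df) - n_test
    return X[:n_train], X[n_train:], y[:n_train], y[n_train:], feature_names
```

In the notebook it would be called as X_train, X_test, y_train, y_test, label_names = get_spam_dataset(test_split=test_split). An alternative to the manual permutation is sklearn.model_selection.train_test_split(X, y, test_size=test_split, shuffle=True), which shuffles and splits in one call.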

