Question
Problem 2: Decision Tree, post-pruning and cost complexity parameter using sklearn 0.22 [10 points, Peer Review]

We will use a pre-processed natural language dataset in the CSV file "spamdata.csv" to classify emails as spam or not. Each row contains the word frequency for 54 words plus statistics on the longest "run" of capital letters. Word frequency is given by:

    f_i = m_i / N

where f_i is the frequency for word i, m_i is the number of times word i appears in the email, and N is the total number of words in the email. We will use decision trees to classify the emails.

Part A [5 points]: Complete the function get_spam_dataset to read in values from the dataset and split the data into train and test sets.

In [ ]:
    def get_spam_dataset(filepath="data/spamdata.csv", test_split=0.1):
        """
        get_spam_dataset

        Loads csv file located at "filepath". Shuffles the data and splits it
        so that you have (1-test_split)*100% training examples and
        (test_split)*100% testing examples.

        Args:
            filepath: location of the csv file
            test_split: percentage/100 of the data should be the testing split

        Returns:
            x_train, x_test, y_train, y_test, feature_names
            (in that order)
            first four are np.ndarray

        Note:
            feature_names is a list of all column names including isspam.
        """
        # your code here
        return 0

In [ ]:
    # TO-DO: import the data set into five variables: x_train, x_test, y_train, y_test, label_names
    # Uncomment and edit the line below to complete this task.
    test_split = 0.1  # default test_split; change it if you'd like; ensure that this variable is used as an argument to your function
    # your code here
    # x_train, x_test, y_train, y_test, label_names = np.arange(5)

In [ ]:
    # tests x_train, x_test, y_train, y_test, and label_names
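To make the word-frequency definition in the problem statement concrete, here is a small illustrative sketch. The dataset itself is already pre-processed, so this is not how spamdata.csv was built; the helper name word_frequencies and the whitespace tokenization are assumptions made only for the example.

```python
from collections import Counter


def word_frequencies(text, vocabulary):
    """Return f_i = m_i / N for each word i in `vocabulary`.

    Hypothetical helper: tokenizes by lowercasing and splitting on
    whitespace, which is an assumption, not the dataset's actual method.
    """
    words = text.lower().split()
    total = len(words)  # N: total number of words in the email
    counts = Counter(words)  # m_i: occurrences of each word
    if total == 0:
        return {word: 0.0 for word in vocabulary}
    return {word: counts[word] / total for word in vocabulary}


freqs = word_frequencies("free money free offer now", ["free", "money", "meeting"])
print(freqs)
```

In this toy email N = 5, so "free" (m = 2) gets frequency 2/5 and "meeting" (m = 0) gets 0.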
Step by Step Solution
Step: 1
In [ ]:
    import numpy as np
    # Load the spam dataset using the get_spam_dataset function
    def getsp...
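One way Part A could be completed is sketched below. It is a minimal sketch under stated assumptions, not the graded reference solution: it assumes the CSV is comma-delimited with a header row and that the label column is literally named isspam, neither of which the problem statement confirms. It uses pandas to read the file and sklearn's train_test_split to shuffle and split.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split


def get_spam_dataset(filepath="data/spamdata.csv", test_split=0.1):
    """Load the spam CSV, shuffle it, and split it into train and test sets.

    Returns x_train, x_test, y_train, y_test, feature_names (in that order).
    """
    # Assumption: comma-delimited file with a header row. If the file uses
    # another delimiter, pass sep=... to read_csv.
    df = pd.read_csv(filepath)

    # All column names, including the label column.
    feature_names = list(df.columns)

    # Assumption: the label column is named "isspam".
    y = df["isspam"].to_numpy()
    x = df.drop(columns=["isspam"]).to_numpy()

    # shuffle=True randomizes row order before splitting; test_size is the
    # fraction of rows placed in the test set.
    x_train, x_test, y_train, y_test = train_test_split(
        x, y, test_size=test_split, shuffle=True
    )
    return x_train, x_test, y_train, y_test, feature_names
```

With this in place, the second notebook cell would call get_spam_dataset(test_split=test_split) and unpack the result into x_train, x_test, y_train, y_test, label_names. Passing random_state to train_test_split would make the shuffle reproducible, if that is wanted.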