Question
Decision Tree, post-pruning and cost complexity parameter using sklearn 0.22 [10 points, Peer Review]
We will use a preprocessed natural language dataset in the CSV file "spamdata.csv" to classify emails as spam or not. Each row contains the word frequency for a set of words plus statistics on the longest "run" of capital letters.
Word frequency is given by:
f_i = n_i / N
where f_i is the frequency for word i, n_i is the number of times word i appears in the email, and N is the total number of words in the email.
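As a small illustration of this definition, here is a sketch that assumes the frequency is simply a word's count divided by the total word count (the actual dataset may scale it differently, for example to a percentage). The helper name word_frequencies and the sample sentence are made up for the example.

```python
from collections import Counter


def word_frequencies(text):
    """Return {word: count / total_words} for a lowercased, whitespace-split text."""
    words = text.lower().split()
    total = len(words)  # N: total number of words in the email
    if total == 0:
        return {}
    counts = Counter(words)  # n_i: number of times each word appears
    return {word: count / total for word, count in counts.items()}


freqs = word_frequencies("free money free offer")
print(freqs["free"])  # "free" is 2 of the 4 words, so this prints 0.5
```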
We will use decision trees to classify the emails.
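For context on the post-pruning and cost complexity parameter named in the title, here is a rough sketch of how ccp_alpha is typically explored in sklearn 0.22 or later. It uses synthetic data from make_classification as a stand-in, not the spam dataset, and the sample sizes and seeds are arbitrary choices.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data; not the spam dataset.
X, y = make_classification(n_samples=300, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Effective alphas at which the fully grown tree would lose a subtree.
path = DecisionTreeClassifier(random_state=0).cost_complexity_pruning_path(
    X_train, y_train
)
# Guard against tiny negative values caused by floating-point round-off.
ccp_alphas = np.clip(path.ccp_alphas, 0.0, None)

node_counts = []
for alpha in ccp_alphas:
    tree = DecisionTreeClassifier(random_state=0, ccp_alpha=alpha)
    tree.fit(X_train, y_train)
    node_counts.append(tree.tree_.node_count)
    print(
        f"alpha={alpha:.5f} nodes={tree.tree_.node_count} "
        f"test_acc={tree.score(X_test, y_test):.3f}"
    )
```

Larger values of ccp_alpha prune more aggressively, so the node count shrinks as alpha grows; the alpha to keep is usually picked by comparing held-out accuracy.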
Part A: Complete the function get_spam_dataset to read in values from the dataset and split the data into train and test sets.
My Code:
# Note: the numeric literals (0.1, 1, 42, -1) were lost in the post and are assumed here.
import pandas as pd
from sklearn.model_selection import train_test_split

def get_spam_dataset(filepath="data/spamdata.csv", test_split=0.1):
    '''
    get_spam_dataset

    Loads csv file located at "filepath". Shuffles the data and splits
    it so that you have (1 - test_split) * 100% training examples and
    test_split * 100% testing examples.

    Args:
        filepath: location of the csv file
        test_split: fraction of the data that should be the testing split

    Returns:
        X_train, X_test, y_train, y_test, feature_names
        Note: feature_names is a list of all column names including isSpam.
        (in that order)
        first four are np.ndarray
    '''
    # your code here
    # Read CSV file
    data = pd.read_csv(filepath, header=None, delimiter=",")
    # Shuffle the data
    data = data.sample(frac=1, random_state=42).reset_index(drop=True)
    # Extract features and target variable
    X = data.iloc[:, :-1].values
    y = data.iloc[:, -1].values
    # Split the data into train and test sets
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=test_split, random_state=42)
    # Get feature names
    feature_names = [f"word_freq_{i}" for i in range(X.shape[1])]
    return X_train, X_test, y_train, y_test, feature_names

# TODO: import the data set into five variables: X_train, X_test, y_train, y_test, label_names
# Uncomment and edit the line below to complete this task.
test_split = 0.1  # default test_split; change it if you'd like; ensure that this variable is used as an argument to your function
# your code here
X_train, X_test, y_train, y_test, label_names = get_spam_dataset(filepath="data/spamdata.csv", test_split=test_split)
# X_train, X_test, y_train, y_test, label_names = np.arange(5)
# Print the shapes of X_train and y_train
print("Shape of X_train:", X_train.shape)
print("Shape of y_train:", y_train.shape)
# Print label_names
print("Label names:", label_names)
It's returning the wrong answer. Can someone help?
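One possible cause, offered as a guess since the expected output is not shown: the docstring says feature_names should be the list of all column names including isSpam, but the posted code reads the file with header=None and then invents its own names. If the CSV has a header row, that row also ends up mixed into the data. A minimal sketch under those assumptions (header row present, label in the last column; the 0.1 default, the seed, and the toy CSV with its column names are placeholders, not the real dataset):

```python
import os
import tempfile

import pandas as pd
from sklearn.model_selection import train_test_split


def get_spam_dataset(filepath="data/spamdata.csv", test_split=0.1):
    # Let pandas treat the first row as the header (assumes the CSV has one).
    data = pd.read_csv(filepath)
    # Real column names, including the label column.
    feature_names = list(data.columns)
    # Assumes the label (isSpam) is the last column.
    X = data.iloc[:, :-1].to_numpy()
    y = data.iloc[:, -1].to_numpy()
    # train_test_split shuffles by default, so no separate shuffle is needed.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_split, shuffle=True, random_state=0
    )
    return X_train, X_test, y_train, y_test, feature_names


# Demo on a tiny made-up CSV with hypothetical column names.
toy = pd.DataFrame(
    {
        "word_freq_a": [0.0, 0.1, 0.2, 0.0, 0.3, 0.0, 0.5, 0.1, 0.0, 0.4],
        "capital_run_longest": [1, 5, 9, 2, 40, 3, 61, 4, 2, 33],
        "isSpam": [0, 0, 1, 0, 1, 0, 1, 0, 0, 1],
    }
)
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "toy.csv")
    toy.to_csv(path, index=False)
    X_train, X_test, y_train, y_test, names = get_spam_dataset(path, test_split=0.2)

print(X_train.shape, X_test.shape, names)
```

If the grader expects feature_names without the label, or a specific seed, those details would need to match the assignment text rather than this sketch.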