Answered step by step
Verified Expert Solution
Link Copied!

Question

1 Approved Answer

The spam datafile contains 4601 emails, 1813 of which are spam. The file has 57 features that include indicators for the presence of 54

 

The spam datafile contains 4601 emails, 1813 of which are spam. The file has 57 features that include indicators for the presence of 54 keywords (e.g. free, deal, ! etc), counts for capitalized characters etc., and a numeric spam variable for whether each email is tagged as spam by a human reader (spam column is 1 for spam, O for important emails). You have to predict the probability that a message is spam or not. 1) Partition the data into a training set (with 70% of the observations), and testing set (with 30% of the observations) using the random state of 12345 for cross validation. (10 points) 2) On the partitioned data, build the best KNN model. Show the accuracy numbers. (Hint: What is the best value of k? How do you decide the 'best k'?) (15 points) 3) On the partitioned data, build the best logistic regression model. Show the accuracy numbers. (15 points) 4) Based on the results of k-nearest neighbor, and logistic regression, what is the best model to classify the data? Provide explanation to support your argument. (10 points) A

Step by Step Solution

There are 3 Steps involved in it

Step: 1

1 Partitioning the data into a training set and testing set python import pandas as pd from sklearnm... blur-text-image

Get Instant Access to Expert-Tailored Solutions

See step-by-step solutions with expert insights and AI powered tools for academic success

Step: 2

blur-text-image

Step: 3

blur-text-image

Ace Your Homework with AI

Get the answers you need in no time with our AI-driven, step-by-step assistance

Get Started

Recommended Textbook for

Income Tax Fundamentals 2013

Authors: Gerald E. Whittenburg, Martha Altus Buller, Steven L Gill

31st Edition

1111972516, 978-1285586618, 1285586611, 978-1285613109, 978-1111972516

More Books

Students also viewed these Programming questions