Question

The spam data file contains 4601 emails, 1813 of which are spam. The file has 57 features that include indicators for the presence of 54 keywords (e.g. free, deal, !), counts of capitalized characters, etc., and a numeric spam variable recording whether a human reader tagged each email as spam (the spam column is 1 for spam, 0 for important emails).

You have to predict the probability that a message is spam.

1) Partition the data into a training set (70% of the observations) and a testing set (30% of the observations), using a random state of 12345 for cross-validation.

2) On the partitioned data, build the best KNN model. Show the accuracy numbers. (Hint: What is the best value of k? How do you decide the 'best k'?)

3) On the partitioned data, build the best logistic regression model. Show the accuracy numbers.

4) Based on the results of k-nearest neighbors and logistic regression, which is the best model to classify the data? Provide an explanation to support your argument.
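
One possible way to approach parts 1 to 4 is sketched below in Python with scikit-learn. This is a sketch under stated assumptions, not the expected solution. The question does not name a tool, so scikit-learn is a choice made here. It also does not say how the 'best' model is picked, so the sketch assumes 5-fold cross-validated accuracy on the training set, standardized features, a grid of odd k values from 1 to 31 for KNN, and a small grid over the regularization strength C for logistic regression. The helper names (`split_70_30`, `best_knn`, `best_logit`, `compare`) are hypothetical, and the demo at the bottom runs on synthetic stand-in data because the spam file itself is not part of the question text.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

SEED = 12345  # random state given in the question


def split_70_30(X, y, seed=SEED):
    """Part 1: 70% training / 30% testing split with a fixed random state."""
    return train_test_split(X, y, test_size=0.30, random_state=seed)


def best_knn(X_train, y_train, k_values=range(1, 32, 2)):
    """Part 2: pick k by 5-fold cross-validated accuracy (assumed criterion)."""
    # KNN is distance-based, so features are standardized inside the pipeline.
    pipe = make_pipeline(StandardScaler(), KNeighborsClassifier())
    grid = {"kneighborsclassifier__n_neighbors": list(k_values)}
    search = GridSearchCV(pipe, grid, cv=5, scoring="accuracy")
    return search.fit(X_train, y_train)


def best_logit(X_train, y_train, c_values=(0.01, 0.1, 1.0, 10.0)):
    """Part 3: pick the regularization strength C the same way (assumed grid)."""
    pipe = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
    grid = {"logisticregression__C": list(c_values)}
    search = GridSearchCV(pipe, grid, cv=5, scoring="accuracy")
    return search.fit(X_train, y_train)


def compare(X, y):
    """Part 4: report cross-validated and held-out accuracy for both models."""
    X_train, X_test, y_train, y_test = split_70_30(X, y)
    searches = {
        "knn": best_knn(X_train, y_train),
        "logit": best_logit(X_train, y_train),
    }
    results = {}
    for name, search in searches.items():
        results[name] = {
            "best_params": search.best_params_,
            "cv_accuracy": search.best_score_,
            # score() uses the refit best model and the accuracy scorer
            "test_accuracy": search.score(X_test, y_test),
        }
    return results


if __name__ == "__main__":
    # Synthetic stand-in with 57 features; replace with the real spam data.
    X_demo, y_demo = make_classification(
        n_samples=600, n_features=57, random_state=SEED
    )
    for model, summary in compare(X_demo, y_demo).items():
        print(model, summary)
```

To run this on the actual data, load the file into a feature matrix and a label vector first, for example with pandas, taking the spam column as `y` and the remaining columns as `X` (the file name and exact column layout are not given in the question). The fitted search objects also expose `predict_proba`, whose second column gives the predicted probability of spam. For part 4, the comparison would rest on the held-out test accuracy of each tuned model, with the cross-validated accuracy as a check that neither result is an artifact of one particular split.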
