Question: This exercise concerns the classification of spam email. Create a corpus of spam email and one of non-spam mail. Examine each corpus and decide what features appear to be useful for classification: unigram words? bigrams? message length, sender, time of arrival? Then train a classification algorithm (decision tree, naive Bayes, SVM, logistic regression, or some other algorithm of your choosing) on a training set and report its accuracy on a test set.
Step by Step Solution
To produce a corpus of spam emails and non-spam emails, we can either collect emails ourselves or use existing datasets. There are several publicly availa...
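As a rough illustration of the pipeline the question asks for (choose features, train a classifier, report test accuracy), here is a minimal sketch of a multinomial naive Bayes classifier over unigram and bigram features, written with only the Python standard library. The helper names (`features`, `NaiveBayes`, `accuracy`) and the tiny hand-written messages are hypothetical and are not taken from the original answer. A real experiment would substitute an actual corpus, and the accuracy on six training messages says nothing about real-world performance. "ham" is used as a label for non-spam mail.

```python
import math
from collections import Counter, defaultdict


def features(text):
    """Return the unigram and bigram tokens of one message (hypothetical helper)."""
    words = text.lower().split()
    bigrams = [a + "_" + b for a, b in zip(words, words[1:])]
    return words + bigrams


class NaiveBayes:
    """Multinomial naive Bayes with add-alpha (Laplace) smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.feature_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.feature_counts[label].update(features(text))
        # Vocabulary: every feature seen in any training message.
        self.vocab = {f for c in self.feature_counts.values() for f in c}
        return self

    def predict(self, text):
        total = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n in self.class_counts.items():
            counts = self.feature_counts[label]
            denom = sum(counts.values()) + self.alpha * len(self.vocab)
            score = math.log(n / total)  # log prior
            for f in features(text):
                if f in self.vocab:  # features never seen in training are skipped
                    score += math.log((counts[f] + self.alpha) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def accuracy(model, texts, labels):
    """Fraction of messages whose predicted label matches the true label."""
    correct = sum(model.predict(t) == y for t, y in zip(texts, labels))
    return correct / len(labels)


# Toy data, invented for illustration only; replace with a real corpus.
train = [
    ("win free money now", "spam"),
    ("free prize claim now", "spam"),
    ("cheap pills free offer", "spam"),
    ("meeting agenda for monday", "ham"),
    ("lunch with the team tomorrow", "ham"),
    ("project report attached for review", "ham"),
]
test = [
    ("claim your free money", "spam"),
    ("agenda for the team meeting", "ham"),
]

model = NaiveBayes().fit([t for t, _ in train], [y for _, y in train])
acc = accuracy(model, [t for t, _ in test], [y for _, y in test])
print(f"test accuracy: {acc:.2f}")
```

Other features mentioned in the question, such as message length, sender, or time of arrival, could be added by extending `features` to emit extra tokens (for example a bucketed length), and the same train/test split can be reused to compare against a decision tree, SVM, or logistic regression.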
