Question
Implement a Naive Bayes classifier, naiveBayes_classify(word_probs, message), that classifies an email message as spam or non-spam using the word probability distributions, word_probs, learned from a set of training data.
In this question, you are asked to implement the Naive Bayes method from scratch by implementing the following functions. To simplify the implementation, we assume that any message is equally likely to be spam or non-spam.
tokenize(message): extracts the set of unique words from the given text message.
count_words(training_set): creates a dictionary mapping each unique word to its frequencies in spam and non-spam messages in the training set.
word_probabilities(counts, total_spams, total_non_spams, k): turns the word counts into a list of triples (w, P(w | spam), P(w | ~spam)).
spam_probability(word_probs, message, total_spams, total_non_spams, k): computes the probability of spam for the given message.
naiveBayes_classify(word_probs, message, total_spams, total_non_spams, k): classifies the message as spam or ham.
Use the data set spam.csv to evaluate the classifier in terms of accuracy, recall, precision, and F-score.
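The four evaluation measures named above can be computed directly from the true and predicted labels. The following is a minimal sketch, assuming the labels are the strings 'spam' and 'ham' and that 'spam' is the positive class; the helper name binary_metrics is hypothetical and not part of the assignment.

```python
def binary_metrics(y_true, y_pred, positive="spam"):
    """Return (accuracy, precision, recall, F1) for one positive label."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return accuracy, precision, recall, f1


# Toy labels, invented purely for illustration.
y_true = ["spam", "ham", "spam", "ham", "spam"]
y_pred = ["spam", "ham", "ham", "spam", "spam"]
accuracy, precision, recall, f1 = binary_metrics(y_true, y_pred)
print(accuracy, precision, recall, f1)
```

On the toy labels there are two true positives, one false positive, one false negative, and one true negative, so accuracy is 3/5 and precision, recall, and F1 are each 2/3.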
from collections import Counter, defaultdict
import math, re
def tokenize(message):
    """
    Extracts the set of unique words from the given text message.
    INPUT:
        message: a piece of text
    OUTPUT:
        a set of unique words
    """
    message = message.lower()                       # convert to lowercase
    # extract the words (pattern assumed: letters, digits, apostrophes)
    all_words = re.findall(r"[a-z0-9']+", message)
    return set(all_words)                           # remove duplicates
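To illustrate what the tokenizer returns, here is a small sketch. The regular expression is an assumption (lowercase letters, digits, and apostrophes), and the sample sentence is invented for the example.

```python
import re


def tokenize(message):
    """Return the set of unique lowercase words in the message."""
    message = message.lower()
    # Assumed pattern: runs of letters, digits, and apostrophes.
    return set(re.findall(r"[a-z0-9']+", message))


words = tokenize("Win a FREE prize! Win now, it's free.")
print(sorted(words))  # ['a', 'free', "it's", 'now', 'prize', 'win']
```

Because the result is a set, repeated words such as "win" and "free" count only once per message, which is what the per-message frequency counts rely on.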
def count_words(training_set):
    """
    Creates a dictionary mapping each unique word to its frequencies in
    spam and non-spam messages in the training set.
    INPUT:
        training_set: a collection of pairs (message, is_spam)
    OUTPUT:
        a map from unique words to their frequencies [spam, non_spam]
    """
    counts = defaultdict(lambda: [0, 0])
    for message, is_spam in training_set:
        for word in tokenize(message):
            counts[word][0 if is_spam else 1] += 1
    return counts
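A short sketch of how the counting step behaves on a tiny training set. The three messages are invented, and the [spam, non_spam] layout of each count pair is an assumption about the intended data structure.

```python
import re
from collections import defaultdict


def tokenize(message):
    return set(re.findall(r"[a-z0-9']+", message.lower()))


def count_words(training_set):
    """Map each word to [number of spam messages, number of non-spam messages]."""
    counts = defaultdict(lambda: [0, 0])
    for message, is_spam in training_set:
        for word in tokenize(message):
            counts[word][0 if is_spam else 1] += 1
    return counts


training_set = [
    ("win cash now", True),
    ("win a prize", True),
    ("lunch now", False),
]
counts = count_words(training_set)
print(counts["win"])  # [2, 0]
print(counts["now"])  # [1, 1]
```

Each count is the number of messages containing the word, not the number of occurrences, because tokenize returns a set.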
def word_probabilities(counts, total_spams, total_non_spams, k=0.5):
    """
    Turns the word counts into a list of triples (w, P(w | spam), P(w | ~spam)).
    INPUT:
        counts: a map from unique words to their frequencies in spam and non-spam messages
        total_spams: the total number of spam messages
        total_non_spams: the total number of non-spam messages
        k: the smoothing parameter (default 0.5 assumed here)
    OUTPUT:
        a list of triples (w, P(w | spam), P(w | ~spam))
    """
    return [(w,
             (spam + k) / (total_spams + 2 * k),
             (non_spam + k) / (total_non_spams + 2 * k))
            for w, (spam, non_spam) in counts.items()]
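The smoothing step is easiest to see with numbers. The sketch below assumes the usual additive form (count + k) / (total + 2k) with k = 0.5; both the form of the denominator and the value of k are assumptions, and the counts are invented.

```python
def word_probabilities(counts, total_spams, total_non_spams, k=0.5):
    """Return triples (word, P(word | spam), P(word | non-spam)) with smoothing k."""
    return [(w,
             (spam + k) / (total_spams + 2 * k),
             (non_spam + k) / (total_non_spams + 2 * k))
            for w, (spam, non_spam) in counts.items()]


# Invented counts: "win" appears in 2 of 2 spam messages and 0 of 1 non-spam message.
counts = {"win": [2, 0], "now": [1, 1]}
probs = word_probabilities(counts, total_spams=2, total_non_spams=1)
for word, p_spam, p_ham in probs:
    print(word, round(p_spam, 4), round(p_ham, 4))
```

For "win" this gives (2 + 0.5) / (2 + 1) for spam and (0 + 0.5) / (1 + 1) = 0.25 for non-spam, so a word never seen in non-spam messages still receives a non-zero probability and its logarithm stays finite.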
def spam_probability(word_probs, message, total_spams, total_non_spams, k):
    """
    Computes the probability of spam for the given message.
    INPUT:
        word_probs: a list of triples (w, P(w | spam), P(w | ~spam))
        message: a message under classification
    OUTPUT:
        the probability that the message is spam
    HINTS:
        First, get the set of unique words in the message.
        Second, sum up the log probabilities of the unique words in the message,
        separately for spam and non-spam.
        Third, recover the probabilities by taking exponentials of the summed
        log probabilities for spam and non-spam.
        Finally, return the ratio of the probability of spam to the sum of the
        probability of spam and the probability of non-spam.
    """
    ############ YOUR CODE HERE ##################
    return prob_spam / (prob_spam + prob_ham)
    ################
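One possible way to follow the four hints is sketched below. It is not the assignment's reference solution. Two choices are assumptions: words missing from word_probs fall back to the smoothed zero-count estimate k / (total + 2k), and the larger log sum is subtracted before exponentiating so that long messages do not underflow to 0 / 0.

```python
import math
import re


def tokenize(message):
    return set(re.findall(r"[a-z0-9']+", message.lower()))


def spam_probability(word_probs, message, total_spams, total_non_spams, k=0.5):
    """Return P(spam | message) under equal priors, using log sums."""
    lookup = {w: (p_spam, p_ham) for w, p_spam, p_ham in word_probs}
    # Assumption: unseen words get the smoothed probability of a zero count.
    unseen = (k / (total_spams + 2 * k), k / (total_non_spams + 2 * k))
    log_spam = log_ham = 0.0
    for word in tokenize(message):           # hint 1: unique words
        p_spam, p_ham = lookup.get(word, unseen)
        log_spam += math.log(p_spam)         # hint 2: sum of logs
        log_ham += math.log(p_ham)
    shift = max(log_spam, log_ham)           # keeps exp() away from underflow
    prob_spam = math.exp(log_spam - shift)   # hint 3: back to probabilities
    prob_ham = math.exp(log_ham - shift)
    return prob_spam / (prob_spam + prob_ham)  # hint 4: normalise


# Invented probabilities for two words.
word_probs = [("win", 2.5 / 3, 0.25), ("now", 0.5, 0.75)]
p = spam_probability(word_probs, "Win now", total_spams=2, total_non_spams=1)
print(round(p, 4))
```

The shift cancels in the final ratio, so the result equals prob_spam / (prob_spam + prob_ham) from the hints. Here the spam product is (2.5/3)(0.5) = 5/12 and the non-spam product is (0.25)(0.75) = 3/16, giving 20/29.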
def naiveBayes_classify(word_probs, message, total_spams, total_non_spams, k):
    """
    Classifies the message as spam or ham.
    INPUT:
        word_probs: a list of triples (w, P(w | spam), P(w | ~spam))
        message: the message under classification
    OUTPUT:
        'spam' or 'ham' indicating the classification of the message.
    MUST WORK WITH THE FOLLOWING STATEMENTS
        from sklearn.metrics import classification_report
        print(classification_report(y_test, y_pred))
    """
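A hedged sketch of the classification step and of how y_pred might be built for the report. The 0.5 decision threshold and the extra threshold parameter are assumptions, the compact spam_probability is a stand-in for the function the exercise asks for, and the messages and probabilities are invented.

```python
import math
import re


def tokenize(message):
    return set(re.findall(r"[a-z0-9']+", message.lower()))


def spam_probability(word_probs, message, total_spams, total_non_spams, k=0.5):
    # Compact stand-in; unseen words use the smoothed zero-count estimate (assumed).
    lookup = {w: (ps, ph) for w, ps, ph in word_probs}
    unseen = (k / (total_spams + 2 * k), k / (total_non_spams + 2 * k))
    log_spam = log_ham = 0.0
    for word in tokenize(message):
        ps, ph = lookup.get(word, unseen)
        log_spam += math.log(ps)
        log_ham += math.log(ph)
    shift = max(log_spam, log_ham)
    e_spam, e_ham = math.exp(log_spam - shift), math.exp(log_ham - shift)
    return e_spam / (e_spam + e_ham)


def naiveBayes_classify(word_probs, message, total_spams, total_non_spams,
                        k=0.5, threshold=0.5):
    """Return 'spam' if P(spam | message) exceeds the (assumed) threshold."""
    p = spam_probability(word_probs, message, total_spams, total_non_spams, k)
    return "spam" if p > threshold else "ham"


word_probs = [("win", 2.5 / 3, 0.25), ("lunch", 0.25, 0.75)]
x_test = ["WIN win win", "Lunch?"]
y_test = ["spam", "ham"]
y_pred = [naiveBayes_classify(word_probs, m, 2, 1) for m in x_test]
print(y_pred)  # ['spam', 'ham']
```

Since y_test and y_pred are plain lists of 'spam' and 'ham' strings of equal length, they can be passed to classification_report(y_test, y_pred) from sklearn.metrics, as the question requires.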