Question
Naive Bayes for Spam Filtering
In this problem, we will use the Naive Bayes algorithm to fit a spam filter by hand. All derivations must be completed by hand (i.e., this problem should not be solved using software such as Python or R).
Spam filters are used in all email services to classify received emails as 'Spam' or 'Not Spam.' A simple approach involves maintaining a vocabulary of words that commonly occur in 'Spam' emails and classifying an email as 'Spam' if the number of words from the vocabulary that are present in the email is over a certain threshold. We are given 15 vocabulary words: V = {secret, offer, low, price, valued, customer, today, dollar, million, sports, is, for, play, healthy, pizza}
We will use $V_i$ to represent the $i$-th word in V. As our training dataset, we are also given 3 example spam messages:
1. million dollar offer for today
2. secret offer today
3. secret is secret
and 4 example non-spam messages:
1. low price for valued customer
2. play secret sports today
3. sports is healthy
4. low price pizza
Recall that the Naive Bayes classifier assumes the probability of an input depends on its input features. The feature vector for each sample is defined as $x_i = [x_{i1}, x_{i2}, \ldots, x_{id}]^T$, $i = 1, \ldots, m$, and the class of the $i$-th sample is $y_i$. In our case the length of the input vector is $d = 15$, which is equal to the number of words in the vocabulary V. Each entry $x_{ij}$ is equal to the number of times word $V_j$ occurs in the $i$-th message.
1. Calculate the class priors $P(y = 0)$ and $P(y = 1)$ from the training data, where $y = 0$ corresponds to spam messages and $y = 1$ corresponds to non-spam messages. Note that these class priors essentially correspond to the frequency of each class in the training sample. Write down the feature vectors for each spam and non-spam message.
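Although the problem asks for hand derivations, the counting in part 1 can be cross-checked with a short script. The following is a minimal sketch, not part of the original problem: the names `VOCAB`, `SPAM`, `NOT_SPAM`, and `featurize` are illustrative, and it assumes messages are split into words on whitespace.

```python
from collections import Counter

# Vocabulary and training messages copied from the problem statement.
VOCAB = ["secret", "offer", "low", "price", "valued", "customer", "today",
         "dollar", "million", "sports", "is", "for", "play", "healthy", "pizza"]
SPAM = ["million dollar offer for today", "secret offer today", "secret is secret"]
NOT_SPAM = ["low price for valued customer", "play secret sports today",
            "sports is healthy", "low price pizza"]


def featurize(message):
    """Return the length-15 count vector for one message (assumed helper)."""
    counts = Counter(message.split())
    return [counts[word] for word in VOCAB]


# Labels follow the problem's convention: y = 0 is spam, y = 1 is non-spam.
X = [featurize(m) for m in SPAM + NOT_SPAM]
y = [0] * len(SPAM) + [1] * len(NOT_SPAM)

# Class priors are the class frequencies in the training sample.
prior_spam = y.count(0) / len(y)      # 3 of 7 messages
prior_not_spam = y.count(1) / len(y)  # 4 of 7 messages
```

Each row of `X` is one feature vector; for example, `X[2]` corresponds to "secret is secret" and has a 2 in the position of "secret" and a 1 in the position of "is".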
2. In the Naive Bayes model, assuming the keywords are independent of each other (this is a simplification), the likelihood of a sentence with its feature vector x given a class c is given by
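$$P(x \mid y = c) = \prod_{k=1}^{d} \theta_{c,k}^{x_k}, \quad c = 0, 1$$
where $0 \le \theta_{c,k} \le 1$ is the probability of word $k$ appearing in class $c$, which satisfies $\sum_{k=1}^{d} \theta_{c,k} = 1$, $c = 0, 1$.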
Given this, the complete log-likelihood function for our training data is given by
$$\ell(\theta_{0,1}, \ldots, \theta_{0,d}, \theta_{1,1}, \ldots, \theta_{1,d}) = \sum_{i=1}^{m} \sum_{k=1}^{d} x_{ik} \log \theta_{y_i,k}$$
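To make the double sum concrete, the log-likelihood can be evaluated numerically for any candidate parameters. This is a sketch under stated assumptions: `log_likelihood` and `MESSAGES` are illustrative names, and the uniform parameters (every word has probability 1/15 in both classes) are a hypothetical choice used only to exercise the function, not the maximizer.

```python
import math
from collections import Counter

VOCAB = ["secret", "offer", "low", "price", "valued", "customer", "today",
         "dollar", "million", "sports", "is", "for", "play", "healthy", "pizza"]

# (message, label) pairs from the problem; y = 0 is spam, y = 1 is non-spam.
MESSAGES = [
    ("million dollar offer for today", 0),
    ("secret offer today", 0),
    ("secret is secret", 0),
    ("low price for valued customer", 1),
    ("play secret sports today", 1),
    ("sports is healthy", 1),
    ("low price pizza", 1),
]


def log_likelihood(theta):
    """Sum x_ik * log(theta[y_i][k]) over all messages i and words k."""
    total = 0.0
    for text, label in MESSAGES:
        counts = Counter(text.split())
        for k, word in enumerate(VOCAB):
            # Terms with x_ik = 0 contribute nothing, so skip them; this
            # also avoids log(0) when a word never occurs in a class.
            if counts[word] > 0:
                total += counts[word] * math.log(theta[label][k])
    return total


# Hypothetical parameters: uniform over the vocabulary in both classes.
uniform = [[1 / len(VOCAB)] * len(VOCAB) for _ in range(2)]
value = log_likelihood(uniform)
```

Under the uniform parameters every word occurrence contributes the same term, so `value` is the total number of words in the training set (26) times log(1/15).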
In this example, $m = 7$. Calculate the maximum likelihood estimates of $\theta_{0,1}$, $\theta_{0,7}$, $\theta_{1,1}$, $\theta_{1,15}$ by maximizing the log-likelihood function above.
(Hint: we are solving a constrained maximization problem, so you will need to introduce Lagrange multipliers and consider the Lagrangian function.)
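One way to follow the hint: add one multiplier $\lambda_c$ per class for the constraint $\sum_{k} \theta_{c,k} = 1$. Setting the derivative of the Lagrangian with respect to $\theta_{c,k}$ to zero gives $\theta_{c,k} = n_{c,k} / \lambda_c$, where $n_{c,k} = \sum_{i : y_i = c} x_{ik}$ is the number of times word $k$ occurs in class $c$; the constraint then forces $\lambda_c = \sum_{k} n_{c,k}$. The estimate is therefore the word's share of all word occurrences in that class. The sketch below checks the hand result with exact fractions; `mle` and `MESSAGES` are illustrative names, not part of the original problem.

```python
from collections import Counter
from fractions import Fraction

VOCAB = ["secret", "offer", "low", "price", "valued", "customer", "today",
         "dollar", "million", "sports", "is", "for", "play", "healthy", "pizza"]

# (message, label) pairs from the problem; y = 0 is spam, y = 1 is non-spam.
MESSAGES = [
    ("million dollar offer for today", 0),
    ("secret offer today", 0),
    ("secret is secret", 0),
    ("low price for valued customer", 1),
    ("play secret sports today", 1),
    ("sports is healthy", 1),
    ("low price pizza", 1),
]


def mle(messages, vocab):
    """Return theta[c][word] = count of word in class c / total words in class c."""
    class_counts = {0: Counter(), 1: Counter()}
    for text, label in messages:
        class_counts[label].update(text.split())
    theta = {}
    for c, counts in class_counts.items():
        total = sum(counts[w] for w in vocab)
        theta[c] = {w: Fraction(counts[w], total) for w in vocab}
    return theta


theta = mle(MESSAGES, VOCAB)

# Word indices in V: V_1 = secret, V_7 = today, V_15 = pizza.
print(theta[0]["secret"], theta[0]["today"], theta[1]["secret"], theta[1]["pizza"])
# prints: 3/11 2/11 1/15 1/15
```

Using `Fraction` keeps the estimates exact, which makes them easy to compare against a hand calculation.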