
Question


Posted on May 27, 2024


Naive Bayes for Spam Filtering

In this problem, we will use the Naive Bayes algorithm to fit a spam filter by hand. All derivations must be completed by hand (i.e., this problem should not be solved using software such as Python or R).

Spam filters are used in all email services to classify received emails as 'Spam' or 'Not Spam.' A simple approach involves maintaining a vocabulary of words that commonly occur in 'Spam' emails and classifying an email as 'Spam' if the number of vocabulary words present in the email exceeds a certain threshold. We are given 15 vocabulary words: V = {secret, offer, low, price, valued, customer, today, dollar, million, sports, is, for, play, healthy, pizza}

We will use $V_i$ to represent the $i$-th word in $V$. As our training dataset, we are also given 3 example spam messages:

1. million dollar offer for today

2. secret offer today

3. secret is secret

and 4 example non-spam messages:

1. low price for valued customer

2. play secret sports today

3. sports is healthy

4. low price pizza

Recall that the Naive Bayes classifier assumes that the probability of an input depends on its input features. The feature vector for each sample is defined as $x_i = [x_{i1}, x_{i2}, \ldots, x_{id}]^T$, $i = 1, \ldots, m$, and the class of the $i$-th sample is $y_i$. In our case the length of the input vector is $d = 15$, which equals the number of words in the vocabulary $V$. Each entry $x_{ij}$ is the number of times word $V_j$ occurs in the $i$-th message.

1. Calculate the class priors $P(y = 0)$ and $P(y = 1)$ from the training data, where $y = 0$ corresponds to spam messages and $y = 1$ corresponds to non-spam messages. Note that these class priors correspond to the frequency of each class in the training sample. Write down the feature vectors for each spam and non-spam message.
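The exercise asks for these quantities to be worked out by hand, so the following is only a minimal Python sketch for checking a hand-written answer. It assumes messages are split on whitespace and that the class prior is the fraction of training messages in each class; the names `VOCAB`, `SPAM`, `HAM`, and `feature_vector` are illustrative and not part of the problem statement.

```python
from fractions import Fraction

# Vocabulary in the order given in the problem (V_1 = secret, ..., V_15 = pizza).
VOCAB = ["secret", "offer", "low", "price", "valued", "customer", "today",
         "dollar", "million", "sports", "is", "for", "play", "healthy", "pizza"]

# Training messages; spam is class y = 0, non-spam is class y = 1.
SPAM = ["million dollar offer for today", "secret offer today", "secret is secret"]
HAM = ["low price for valued customer", "play secret sports today",
       "sports is healthy", "low price pizza"]


def feature_vector(message):
    """Count how many times each vocabulary word occurs in the message."""
    words = message.split()
    return [words.count(w) for w in VOCAB]


spam_vectors = [feature_vector(msg) for msg in SPAM]
ham_vectors = [feature_vector(msg) for msg in HAM]

# Class priors as relative class frequencies in the training set.
m = len(SPAM) + len(HAM)
prior_spam = Fraction(len(SPAM), m)  # P(y = 0)
prior_ham = Fraction(len(HAM), m)    # P(y = 1)

for msg, vec in zip(SPAM + HAM, spam_vectors + ham_vectors):
    print(vec, msg)
print("P(y=0) =", prior_spam, " P(y=1) =", prior_ham)
```

Exact fractions are used instead of floats so the printed priors can be compared directly with the hand calculation.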

2. In the Naive Bayes model, assuming the keywords are independent of each other (this is a simplification), the likelihood of a sentence with its feature vector x given a class c is given by

$$P(x \mid y = c) = \prod_{k=1}^{d} \theta_{c,k}^{x_k}, \quad c = 0, 1$$

where $0 \le \theta_{c,k} \le 1$ is the probability of word $k$ appearing in class $c$, which satisfies $\sum_{k=1}^{d} \theta_{c,k} = 1$ for $c = 0, 1$.

Given this, the complete log-likelihood function for our training data is given by

$$\ell(\theta_{0,1}, \ldots, \theta_{0,d}, \theta_{1,1}, \ldots, \theta_{1,d}) = \sum_{i=1}^{m} \sum_{k=1}^{d} x_{ik} \log \theta_{y_i,k}$$
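To make the double sum concrete, here is a small Python sketch that evaluates the log-likelihood of the training data for a given parameter table. It is an illustration only, since the exercise itself is meant to be done by hand. The uniform parameters (every word has probability 1/15 in both classes) are a hypothetical test input, not the maximum likelihood estimate, and the names `TRAIN`, `counts`, and `log_likelihood` are assumptions made for this example.

```python
import math

VOCAB = ["secret", "offer", "low", "price", "valued", "customer", "today",
         "dollar", "million", "sports", "is", "for", "play", "healthy", "pizza"]

# (message, label) pairs; label 0 = spam, label 1 = non-spam.
TRAIN = [
    ("million dollar offer for today", 0),
    ("secret offer today", 0),
    ("secret is secret", 0),
    ("low price for valued customer", 1),
    ("play secret sports today", 1),
    ("sports is healthy", 1),
    ("low price pizza", 1),
]


def counts(message):
    """Feature vector: occurrences of each vocabulary word in the message."""
    words = message.split()
    return [words.count(w) for w in VOCAB]


def log_likelihood(theta, train):
    """Sum of x_ik * log(theta[y_i][k]) over all messages i and words k."""
    total = 0.0
    for message, y in train:
        for k, x in enumerate(counts(message)):
            if x > 0:  # terms with x_ik = 0 contribute nothing to the sum
                total += x * math.log(theta[y][k])
    return total


# Hypothetical parameters: a uniform distribution over the 15 words per class.
d = len(VOCAB)
uniform_theta = {0: [1.0 / d] * d, 1: [1.0 / d] * d}
ll_uniform = log_likelihood(uniform_theta, TRAIN)
print(ll_uniform)
```

With uniform parameters every word occurrence contributes the same term, so the result equals the total number of words in the training messages times log(1/15). Any candidate estimate can be compared against this baseline.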

In this example, $m = 7$. Calculate the maximum likelihood estimates of $\theta_{0,1}$, $\theta_{0,7}$, $\theta_{1,1}$, $\theta_{1,15}$ by maximizing the log-likelihood function above.

(Hint: we are solving a constrained maximization problem, so you will need to introduce Lagrange multipliers and work with the resulting Lagrangian function.)
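One way the hinted derivation might go is sketched below. The multipliers $\lambda_c$ and the count shorthand $N_{c,k}$ (total occurrences of word $k$ in class-$c$ messages) and $N_c$ (total word count in class $c$) are notation introduced here for convenience and do not appear in the problem statement. The word indices follow the order of $V$ as listed, so word 1 is "secret", word 7 is "today", and word 15 is "pizza".

```latex
% Lagrangian with one multiplier per class constraint
\mathcal{L}(\theta, \lambda)
  = \sum_{i=1}^{m} \sum_{k=1}^{d} x_{ik} \log \theta_{y_i,k}
  + \sum_{c \in \{0,1\}} \lambda_c \Bigl( 1 - \sum_{k=1}^{d} \theta_{c,k} \Bigr)

% Stationarity in theta_{c,k}, with N_{c,k} = \sum_{i : y_i = c} x_{ik}
\frac{\partial \mathcal{L}}{\partial \theta_{c,k}}
  = \frac{N_{c,k}}{\theta_{c,k}} - \lambda_c = 0
  \quad\Longrightarrow\quad
  \theta_{c,k} = \frac{N_{c,k}}{\lambda_c}

% Enforcing \sum_k \theta_{c,k} = 1 gives \lambda_c = \sum_k N_{c,k} = N_c
\hat{\theta}_{c,k} = \frac{N_{c,k}}{N_c}

% Counts from the training messages: N_0 = 5 + 3 + 3 = 11, N_1 = 5 + 4 + 3 + 3 = 15
\hat{\theta}_{0,1} = \frac{3}{11}, \qquad
\hat{\theta}_{0,7} = \frac{2}{11}, \qquad
\hat{\theta}_{1,1} = \frac{1}{15}, \qquad
\hat{\theta}_{1,15} = \frac{1}{15}
```

The bounds $0 \le \theta_{c,k} \le 1$ do not need their own multipliers in this sketch, because a ratio of nonnegative counts with $N_{c,k} \le N_c$ satisfies them automatically.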


