
Question


Posted on Sep 07, 2024


Implement a Naive Bayes classifier, naiveBayes_classify(word_probs, message), that classifies an email message as spam or non-spam using the word probability distributions, word_probs, learned from a set of training data.

In this question, you are asked to implement the Naive Bayes method from scratch by implementing the following functions. To simplify the implementation, we assume that any message is equally likely to be spam or not spam.
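The equal-prior assumption is what turns the last step of spam_probability into a simple ratio of two likelihoods. Writing m for the message, S for "spam", V for the vocabulary seen in training, and p(w | S) for the smoothed per-word probabilities, a short derivation follows. The second line is the Bernoulli form of Naive Bayes, which is one reading of the hints; keeping only the first sum gives the simpler variant that looks only at words present in the message.

```latex
$$
P(S \mid m) = \frac{P(m \mid S)\,P(S)}{P(m \mid S)\,P(S) + P(m \mid \neg S)\,P(\neg S)}
            = \frac{P(m \mid S)}{P(m \mid S) + P(m \mid \neg S)}
\qquad \text{since } P(S) = P(\neg S) = \tfrac{1}{2}
$$

$$
\log P(m \mid S) \approx \sum_{w \in V,\ w \in m} \log p(w \mid S)
  + \sum_{w \in V,\ w \notin m} \log\bigl(1 - p(w \mid S)\bigr)
$$
```

The same expression with \neg S gives log P(m | ¬S); exponentiating both sums and taking the ratio corresponds to the last two hints in the spam_probability docstring.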

tokenize(message): extracts the set of unique words from the given text message.

count_words(training_set): creates a dictionary mapping each unique word to its frequencies in the spam and non-spam messages of the training set.

word_probabilities(counts, total_spams, total_non_spams, k=0.5): turns the word counts into a list of triplets (w, p(w | spam), p(w | ~spam)).

spam_probability(word_probs, message, total_spams, total_non_spams, k=0.5): computes the probability of spam for the given message.

naiveBayes_classify(word_probs, message, total_spams, total_non_spams, k): classifies the message as spam or ham.

Use the data set spam.csv to evaluate the classifier in terms of accuracy, recall, precision, and F1-score.
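For reference, the four metrics with spam treated as the positive class, where TP, FP, FN and TN count the true positives, false positives, false negatives and true negatives on the test messages:

```latex
$$
\text{accuracy} = \frac{TP + TN}{TP + TN + FP + FN}, \qquad
\text{precision} = \frac{TP}{TP + FP}, \qquad
\text{recall} = \frac{TP}{TP + FN}, \qquad
F_1 = \frac{2 \cdot \text{precision} \cdot \text{recall}}{\text{precision} + \text{recall}}
$$
```

sklearn's classification_report prints precision, recall and F1 separately for each label (ham as well as spam), each time treating that label as the positive class.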

```python
from collections import Counter, defaultdict
import math, re

def tokenize(message):
    """
    extracts the set of unique words from the given text message
    INPUT:
        message: a piece of text
    OUTPUT:
        a set of unique words
    """
    message = message.lower()                       # convert to lowercase
    all_words = re.findall("[a-z0-9']+", message)   # extract the words
    return set(all_words)                           # remove duplicates
```

```python
def count_words(training_set):
    """
    creates a dictionary containing the mappings from unique words to the frequencies of the words in
    spam and non-spam messages in the training set
    INPUT:
        training_set: training set consists of pairs (message, is_spam)
    OUTPUT:
        a map from unique words to their frequencies in spam and non-spam messages
    """
    counts = defaultdict(lambda: [0, 0])
    for message, is_spam in training_set:
        for word in tokenize(message):
            # index 0 counts spam messages, index 1 counts non-spam messages
            counts[word][0 if is_spam else 1] += 1
    return counts
```

```python
counts = defaultdict(lambda: [0, 0])
counts["wins"][0] = 50    # "wins" appears in 50 spam messages
counts["wins"][1] = 500   # "wins" appears in 500 non-spam messages
counts["wins"]            # -> [50, 500]
```

```python
def word_probabilities(counts, total_spams, total_non_spams, k=0.5):
    """
    turns the word_counts into a list of triplets (w, p(w | spam), p(w | ~spam))
    INPUT:
        counts: a map from unique words to their frequencies in spam and non-spam messages
        total_spams: the total number of spam messages
        total_non_spams: the total number of non-spam messages
        k: the smoothing parameter, default 0.5
    OUTPUT:
        a list of triples (w, p(w | spam), p(w | non-spam))
    """
    return [(w,
             (spam + k) / (total_spams + 2 * k),
             (non_spam + k) / (total_non_spams + 2 * k))
            for w, (spam, non_spam) in counts.items()]
```
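The comprehension above applies additive smoothing with pseudo-count k to each class. Writing s_w and n_w for a word's spam and non-spam counts, and reusing the counts from the counts["wins"] example with hypothetical totals (200 spam and 5000 non-spam messages, chosen only for illustration):

```latex
$$
p(w \mid S) = \frac{s_w + k}{\text{total\_spams} + 2k}, \qquad
p(w \mid \neg S) = \frac{n_w + k}{\text{total\_non\_spams} + 2k}
$$

$$
s_w = 50,\ n_w = 500,\ \text{total\_spams} = 200,\ \text{total\_non\_spams} = 5000,\ k = 0.5:
\quad p(w \mid S) = \tfrac{50.5}{201} \approx 0.2512, \qquad
p(w \mid \neg S) = \tfrac{500.5}{5001} \approx 0.1001
$$
```

Because k > 0, a word seen in only one class still gets a small nonzero probability in the other, and because s_w cannot exceed total_spams (tokenize removes duplicates), every probability stays strictly below 1, so both log p and log(1 - p) are defined later on.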

```python
def spam_probability(word_probs, message, total_spams, total_non_spams, k=0.5):
    """
    computes the probability of spam for the given message
    INPUT:
        word_probs: a list of triples (w, p(w | spam), p(w | non-spam))
        message: a message under classification
    OUTPUT:
        the probability of being spam for the message
    HINTS:
        First, get the set of unique words in the message.
        Second, sum up the log probabilities of the unique words in the message.
        Third, get the probabilities by taking exponentials of the log probabilities
        (for spam and non-spam).
        Finally, return the ratio of the probability of spam over the sum of the
        probability of spam and the probability of non-spam.
    """
    ############YOUR CODE HERE##################

    return prob_spam / (prob_spam + prob_ham)
    ################
```

```python
def naiveBayes_classify(word_probs, message, total_spams, total_non_spams, k):
    """
    classifies the message as spam or ham
    INPUT:
        word_probs: a list of triples (w, p(w | spam), p(w | non-spam))
        message: the message under classification
    OUTPUT:
        'spam' or 'ham' indicating the classification of the message.
    """
```
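One possible sketch of the two bodies left to the student, following the docstring hints and the equal-prior assumption; it is not the official answer. Assumptions: it uses the Bernoulli form (vocabulary words absent from the message contribute log(1 - p)), and deleting the else branch gives the variant that sums only over words in the message. The local names log_prob_if_spam and log_prob_if_ham are illustrative, and tokenize is repeated so the block runs on its own. Instead of exponentiating each sum separately, which for long messages can underflow both to 0.0 and raise ZeroDivisionError, it exponentiates only their difference; the result equals prob_spam / (prob_spam + prob_ham).

```python
import math
import re


def tokenize(message):
    """Same tokenizer as in the question: lowercase, extract words, deduplicate."""
    message = message.lower()
    all_words = re.findall("[a-z0-9']+", message)
    return set(all_words)


def spam_probability(word_probs, message, total_spams, total_non_spams, k=0.5):
    """Sketch: P(spam | message) under equal priors (Bernoulli Naive Bayes).

    total_spams, total_non_spams and k are accepted only to match the required
    signature; smoothing already happened in word_probabilities().
    """
    message_words = tokenize(message)
    log_prob_if_spam = log_prob_if_ham = 0.0
    for word, prob_if_spam, prob_if_ham in word_probs:
        if word in message_words:
            # word present: add log p(w | class)
            log_prob_if_spam += math.log(prob_if_spam)
            log_prob_if_ham += math.log(prob_if_ham)
        else:
            # word absent: add log(1 - p(w | class))
            log_prob_if_spam += math.log(1.0 - prob_if_spam)
            log_prob_if_ham += math.log(1.0 - prob_if_ham)
    # prob_spam / (prob_spam + prob_ham) == 1 / (1 + exp(log_ham - log_spam)).
    # Branch on the sign so math.exp never receives a large positive argument.
    diff = log_prob_if_ham - log_prob_if_spam
    if diff > 0:
        z = math.exp(-diff)
        return z / (1.0 + z)
    return 1.0 / (1.0 + math.exp(diff))


def naiveBayes_classify(word_probs, message, total_spams, total_non_spams, k):
    """Sketch: label the message 'spam' when P(spam | message) > 0.5."""
    p = spam_probability(word_probs, message, total_spams, total_non_spams, k)
    return 'spam' if p > 0.5 else 'ham'
```

Because both classes are assumed equally likely, comparing against 0.5 is the same as asking which class likelihood is larger.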

The code MUST WORK WITH THE FOLLOWING STATEMENTS:

```python
from sklearn.metrics import classification_report
print(classification_report(y_test, y_pred))
```
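The question does not say how spam.csv is laid out or how to split it into training and test data. A minimal standard-library sketch, assuming a label column v1 ('ham'/'spam') and a text column v2 in latin-1 encoding (the layout of the commonly distributed SMS Spam Collection CSV; check your file's header), and a shuffled 75/25 split. The helpers load_spam_csv, split_pairs, class_totals and true_labels are hypothetical names. They produce (message, is_spam) pairs for count_words and 'spam'/'ham' strings for y_test, matching what naiveBayes_classify returns.

```python
import csv
import random


def load_spam_csv(path, label_col="v1", text_col="v2", encoding="latin-1"):
    """Hypothetical loader returning (message, is_spam) pairs.

    The column names v1/v2 and the latin-1 encoding are assumptions about
    spam.csv; adjust them to match the actual header.
    """
    with open(path, newline="", encoding=encoding) as f:
        reader = csv.DictReader(f)
        return [(row[text_col], row[label_col].strip().lower() == "spam")
                for row in reader]


def split_pairs(pairs, test_fraction=0.25, seed=0):
    """Shuffle with a fixed seed (reproducible) and split into (train, test)."""
    shuffled = list(pairs)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_fraction))
    return shuffled[:cut], shuffled[cut:]


def class_totals(pairs):
    """Return (total_spams, total_non_spams) for word_probabilities()."""
    total_spams = sum(1 for _, is_spam in pairs if is_spam)
    return total_spams, len(pairs) - total_spams


def true_labels(pairs):
    """Map is_spam booleans to the 'spam'/'ham' strings the classifier returns."""
    return ['spam' if is_spam else 'ham' for _, is_spam in pairs]

# Wiring with the question's functions (comments only, not executed here):
#   train, test = split_pairs(load_spam_csv("spam.csv"))
#   total_spams, total_non_spams = class_totals(train)
#   word_probs = word_probabilities(count_words(train), total_spams, total_non_spams, 0.5)
#   y_test = true_labels(test)
#   y_pred = [naiveBayes_classify(word_probs, m, total_spams, total_non_spams, 0.5)
#             for m, _ in test]
#   print(classification_report(y_test, y_pred))
```

The totals passed to word_probabilities are computed from the training split only, so the test messages play no part in estimating the word probabilities.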
