Question

Show Work Please!!

Email spam filtering models often use a bag-of-words representation for emails. In a bag-of-words representation, the descriptive features that describe a document (in our case, an email) each represent how many times a particular word occurs in the document. One descriptive feature is included for each word in a predefined dictionary. The dictionary is typically defined as the complete set of words that occur in the training dataset. The table below lists the bag-of-words representation for the following five emails and a target feature, SPAM, indicating whether they are spam emails or genuine emails:

"money, money, money"
"free money for free gambling fun"
"gambling for fun"
"machine learning for fun, fun, fun"
"free machine learning"

[Table: bag-of-words counts and SPAM target for the five emails; supplied as an image and not transcribed]

1. What target level would a nearest neighbor model using Euclidean distance return for the following email: "machine learning for free"?
2. What target level would a k-NN model with k = 3 and using Euclidean distance return for the same query?
3. What target level would a weighted k-NN model with k = 5, using a weighting scheme of the reciprocal of the squared Euclidean distance between the neighbor and the query, return for the query?
4. What target level would a k-NN model with k = 3 and using Manhattan distance return for the same query?
5. There are a lot of zero entries in the spam bag-of-words dataset. This is indicative of sparse data and is typical for text analytics. Cosine similarity is often a good choice when dealing with sparse non-binary data. What target level would a 3-NN model using cosine similarity return for the query?
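The distance calculations the question asks for can be sketched in Python. This is a minimal illustration, not the posted solution: the function names, the tokenizer, and the word order of the dictionary (first occurrence in the training emails) are all assumptions made here. The SPAM column lives in the table, which is not reproduced in the text, so the two voting helpers take the labels as an argument rather than hard-coding them.

```python
import math
import re
from collections import Counter, defaultdict

TRAINING_EMAILS = [
    "money, money, money",
    "free money for free gambling fun",
    "gambling for fun",
    "machine learning for fun, fun, fun",
    "free machine learning",
]
QUERY = "machine learning for free"


def tokenize(text):
    """Lowercase the text and keep alphabetic words only (drops commas)."""
    return re.findall(r"[a-z]+", text.lower())


def build_vocabulary(emails):
    """Dictionary = every word seen in training, in first-occurrence order."""
    vocab = []
    for email in emails:
        for word in tokenize(email):
            if word not in vocab:
                vocab.append(word)
    return vocab


def bag_of_words(text, vocab):
    """Count how often each dictionary word occurs in the text."""
    counts = Counter(tokenize(text))
    return [counts[word] for word in vocab]


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norms


def nearest(scores, k, largest=False):
    """1-based IDs of the k best emails (smallest distance by default;
    pass largest=True for a similarity such as cosine)."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=largest)
    return [i + 1 for i in order[:k]]


def majority_vote(neighbor_ids, labels):
    """Unweighted k-NN vote. `labels` must be copied from the SPAM column."""
    return Counter(labels[i - 1] for i in neighbor_ids).most_common(1)[0][0]


def weighted_vote(distances, labels):
    """Weighted k-NN over all neighbors with weight 1 / distance^2.
    Assumes no training email is at distance 0 from the query."""
    totals = defaultdict(float)
    for dist, label in zip(distances, labels):
        totals[label] += 1.0 / dist ** 2
    return max(totals, key=totals.get)


VOCAB = build_vocabulary(TRAINING_EMAILS)
TRAIN = [bag_of_words(email, VOCAB) for email in TRAINING_EMAILS]
QUERY_VEC = bag_of_words(QUERY, VOCAB)

euclid = [euclidean(QUERY_VEC, row) for row in TRAIN]
manhat = [manhattan(QUERY_VEC, row) for row in TRAIN]
cosine = [cosine_similarity(QUERY_VEC, row) for row in TRAIN]

print("dictionary:", VOCAB)
print("Euclidean:", [round(d, 4) for d in euclid], "3-NN:", nearest(euclid, 3))
print("Manhattan:", manhat, "3-NN:", nearest(manhat, 3))
print("cosine:", [round(s, 4) for s in cosine], "3-NN:", nearest(cosine, 3, largest=True))
```

With this dictionary order the query becomes [0, 1, 1, 0, 0, 1, 1], and the fifth email ("free machine learning") is the closest under all three measures. To get the target levels, read the SPAM values off the table into a list and pass it to majority_vote together with the neighbor IDs, or to weighted_vote together with the Euclidean distances for the k = 5 case.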
