Answered step by step
Verified Expert Solution
Link Copied!

Question

1 Approved Answer

Could you please explain a llittle bit while providing the answers. Many thanks.. 1. Email spam filtering models often use a bag-of-words representation for emails.

image text in transcribed

Could you please explain a llittle bit while providing the answers. Many thanks..

1. Email spam filtering models often use a bag-of-words representation for emails. In a bag- of-words representation, the descriptive features that describe a document (in our case, an email) each represent how many times a particular word occurs in the document. One descriptive feature is included for each word in a predefined dictionary. The dictionary is typically defined as the complete set of words that occur in the training dataset. The table below lists the bag-of-words representation for the following five emails and a target feature, SPAM, whether they are spam emails or genuine emails: "money, money, money" "free money for free gambling fun" "gambling for fun" "machine learning for fun, fun, fun" "free machine learning" ID 1 2 3 4 MONEY 3 1 0 0 0 FREE 0 2 0 0 1 FOR 0 1 1 1 0 Bag-of-Words GAMBLING FUN 0 0 1 1 1 1 0 3 0 0 MACHINE LEARNING SPAM 0 0 true 0 0 true 0 0 true 1 1 false 1 1 false a What target level would a nearest neighbor model using Euclidean distance return for the following email: "machine learning for free"? b. What target level would a k-NN model with k = 3 and using Euclidean distance return for the same query? c. What target level would a weighted k-NN model with k=5 and using a weighting scheme of the reciprocal of the squared Euclidean distance between the neighbor and the query, return for the query? d. What target level would a k-NN model with k = 3 and using Manhattan distance return for the same query? e. There are a lot of zero entries in the spam bag-of-words dataset. This is indicative of sparse data and is typical for text analytics. Cosine similarity is often a good choice when dealing with sparse non-binary data. What target level would a 3-NN model using cosine similarity return for the query

Step by Step Solution

There are 3 Steps involved in it

Step: 1

blur-text-image

Get Instant Access to Expert-Tailored Solutions

See step-by-step solutions with expert insights and AI powered tools for academic success

Step: 2

blur-text-image

Step: 3

blur-text-image

Ace Your Homework with AI

Get the answers you need in no time with our AI-driven, step-by-step assistance

Get Started

Recommended Textbook for

Information Modeling And Relational Databases

Authors: Terry Halpin, Tony Morgan

2nd Edition

0123735688, 978-0123735683

More Books

Students also viewed these Databases questions

Question

Understand why customers complain.

Answered: 1 week ago