Question

2 Multiclass Naive Bayes with Bag of Words

A group of artists wishes to use the Naive Bayes algorithm to classify a given artwork into three different categories given a text description of the painting. These descriptions are short and simple and have already been passed through a feature function, which returned key features based on the counts of certain words used to describe the painting. The three categories are related to the overall color scheme and are as follows: Warm, Cool, and Neutral. A set of these descriptions has been sampled, and each was classified by the artists based on its feature vector. The data collected so far is given in the table below. [The table of labeled word counts was provided as an image and is not transcribed here.]

a. (1 pt) What is the probability $\mu_y$ of each label $y \in \{\text{Warm}, \text{Neutral}, \text{Cool}\}$?

b. (3 pts) The parameter $\theta_{y,j}$ is the probability of a token $j$ appearing with label $y$. It is defined by the following equation, where $V$ is the size of the vocabulary set and $\operatorname{count}(y,j)$ represents the frequency of word $j$ appearing with label $y$ over all data points:

$$\theta_{y,j} = \frac{\operatorname{count}(y,j)}{\sum_{j'=1}^{V} \operatorname{count}(y,j')}$$

The probability of a word-count vector $x$ together with a label $y$ is defined as follows:

$$p(x, y; \theta, \mu) = p(y; \mu)\, p(x \mid y; \theta) = p(y; \mu) \prod_{j=1}^{V} \theta_{y,j}^{x_j}$$

Here the words are the names of colors that appear in the text description of the artwork, and a word-count vector indicates the occurrence of each of the words in the text description for a given artwork. Find the most likely label $\hat{y}$ for the word-count vector $x = (0, 1, 0, 1, 1, 0, 0, 1)$ using $\hat{y} = \arg\max_y \log p(x, y; \theta, \mu)$. Show the final log (base-10) probabilities for each label rounded to 3 decimals. Treat $\log(0)$ as $-\infty$. (Hint: read more about binary multinomial Naive Bayes in Jurafsky & Martin, Chapter 4, as well as Hiroshi Shimodaira's note - https://www.inf.ed.ac.uk/teaching/courses/inf2b/learnnotes/inf2b-learn-note07-2up.pdf.)

c. (3 pts) When calculating $\arg\max_y$, if $\theta_{y,j} = 0$ for a label-word pair, the label $y$ is no longer considered. This is an issue, especially for smaller datasets where a feature may not be present in all documents for a certain label. One approach to mitigating this high variance is to smooth the probabilities. Using add-1 smoothing, which redefines $\theta_{y,j}$ as shown below, again find the most likely label $\hat{y}$ for the word-count vector $x = (0, 1, 0, 1, 1, 0, 0, 1)$ using $\hat{y} = \arg\max_y \log p(x, y; \theta, \mu)$. Make sure to show the final log probabilities.

$$\text{add-1 smoothing:} \qquad \theta_{y,j} = \frac{1 + \operatorname{count}(y,j)}{V + \sum_{j'=1}^{V} \operatorname{count}(y,j')}$$
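For reference, since the query vector $x = (0, 1, 0, 1, 1, 0, 0, 1)$ has $x_j = 1$ only at positions $j \in \{2, 4, 5, 8\}$, the score used in parts (b) and (c) reduces to a sum of five base-10 logarithms per label:

$$\log_{10} p(x, y; \theta, \mu) = \log_{10} \mu_y + \log_{10} \theta_{y,2} + \log_{10} \theta_{y,4} + \log_{10} \theta_{y,5} + \log_{10} \theta_{y,8}$$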
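Because the labeled count table itself was only attached as an image, the sketch below uses made-up placeholder numbers; the `counts` matrix and the `docs_per_label` dictionary are assumptions, not the question's data. It illustrates the mechanics of all three parts: estimating the priors $\mu_y$ from per-label document counts (part a), scoring the query vector with unsmoothed maximum-likelihood estimates of $\theta_{y,j}$ (part b), and repeating the scoring with add-1 smoothing (part c).

```python
import math

# Placeholder data standing in for the untranscribed table:
# counts[y][j] = count(y, j), the total frequency of word j (V = 8 words)
# across all training descriptions labeled y. These numbers are invented.
counts = {
    "Warm":    [3, 0, 2, 1, 0, 4, 1, 2],
    "Neutral": [1, 2, 0, 2, 1, 0, 3, 1],
    "Cool":    [0, 4, 1, 0, 3, 1, 0, 2],
}
# Invented number of labeled documents per class, used for the priors (part a).
docs_per_label = {"Warm": 5, "Neutral": 3, "Cool": 4}
total_docs = sum(docs_per_label.values())

V = 8
x = [0, 1, 0, 1, 1, 0, 0, 1]  # query word-count vector from the question

def log10_joint(y, x, smoothing=0):
    """Return log10 p(x, y) = log10(mu_y) + sum_j x_j * log10(theta_{y,j}).

    smoothing=0 gives the maximum-likelihood estimate of part (b);
    smoothing=1 gives add-1 smoothing as in part (c).
    """
    mu_y = docs_per_label[y] / total_docs            # part (a): label prior
    denom = sum(counts[y]) + smoothing * V
    score = math.log10(mu_y)
    for j in range(V):
        if x[j] == 0:
            continue                                 # theta**0 contributes nothing
        theta = (counts[y][j] + smoothing) / denom
        if theta == 0.0:
            return float("-inf")                     # treat log(0) as -infinity
        score += x[j] * math.log10(theta)
    return score

for smoothing in (0, 1):
    scores = {y: round(log10_joint(y, x, smoothing), 3) for y in counts}
    y_hat = max(scores, key=scores.get)
    print(f"smoothing={smoothing}: {scores} -> y_hat = {y_hat}")
```

With these placeholder counts, the unsmoothed pass sends both Warm and Cool to $-\infty$ because each has a zero count for a word that appears in $x$, which is exactly the failure mode that part (c)'s add-1 smoothing addresses.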
