Question
2 Multiclass Naive Bayes with Bag of Words

A group of artists wish to use the Naive Bayes algorithm to classify a given artwork into three different categories given a text description of the painting. These descriptions are short and simple and have already been passed through a feature function, which returned the following key features based on the counts of certain words used to describe the painting. The three categories are related to the overall color scheme and are as follows: Warm, Cool, and Neutral. A set of these descriptions has been sampled, and each was classified by the artists based on its feature vector. The data collected so far is given in the table below:

a. (1 pt) What is the probability $\phi_y$ of each label $y \in \{\text{Warm}, \text{Neutral}, \text{Cool}\}$?

b. (3 pts) The parameter $\theta_{y,j}$ is the probability of a token $j$ appearing with label $y$. It is defined by the following equation, where $V$ is the size of the vocabulary set and $\text{count}(y,j)$ represents the frequency of word $j$ appearing with label $y$ over all data points:

$$\theta_{y,j} = \frac{\text{count}(y,j)}{\sum_{j'=1}^{V} \text{count}(y,j')}$$

The probability of a count of words $x$ and a label $y$ is defined as follows:

$$p(x, y; \phi, \theta) = p(y; \phi)\, p(x \mid y; \theta) = p(y; \phi) \prod_{j=1}^{V} \theta_{y,j}^{\,x_j}$$

Here, the words are the names of colors that appear in the text description of the artwork, and a word-count vector indicates the occurrence of each of the words in the text description for a given artwork. Find the most likely label $\hat{y}$ for the word-count vector $x = (0,1,0,1,1,0,0,1)$ using $\hat{y} = \arg\max_y \log p(x, y; \phi, \theta)$. Show the final log (base-10) probabilities for each label, rounded to 3 decimals. Treat $\log(0)$ as $-\infty$. (Hint: read more about binary multinomial Naive Bayes in Jurafsky & Martin, Chapter 4, as well as Hiroshi Shimodaira's note: https://www.inf.ed.ac.uk/teaching/courses/inf2b/learnnotes/inf2b-learn-note07-2up.pdf.)

c. (3 pts) When calculating $\arg\max_y$, if $\theta_{y,j} = 0$ for a label-word pair, the label $y$ is no longer considered. This is an issue, especially for smaller datasets where a feature may not be present in all documents for a certain label. One approach to mitigating this high variance is to smooth the probabilities. Using add-1 smoothing, which redefines $\theta_{y,j}$ as below, again find the most likely label $\hat{y}$ for the word-count vector $x = (0,1,0,1,1,0,0,1)$ using $\hat{y} = \arg\max_y \log p(x, y; \phi, \theta)$. Make sure to show the final log probabilities.

$$\text{add-1 smoothing:} \quad \theta_{y,j} = \frac{1 + \text{count}(y,j)}{V + \sum_{j'=1}^{V} \text{count}(y,j')}$$
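The counts table referenced above is missing from this copy of the question, so the final numbers cannot be reproduced here. As a minimal sketch of the computation parts (a)-(c) ask for, the Python below estimates $\phi_y$ and $\theta_{y,j}$, then scores $x$ under both the unsmoothed and add-1 smoothed models. The `counts` matrix and per-label document counts `n_docs` are hypothetical placeholders, not the question's data.

import numpy as np

# --- Hypothetical placeholder data -------------------------------------
# The question's actual counts table is not reproduced above, so these
# numbers are illustrative only. counts[y, j] = count(y, j), the frequency
# of word j appearing with label y over all data points; V = 8 to match
# the length of the query vector x.
labels = ["Warm", "Neutral", "Cool"]
counts = np.array([
    [2, 0, 3, 1, 0, 2, 1, 0],   # Warm    (placeholder)
    [1, 1, 0, 2, 1, 0, 0, 1],   # Neutral (placeholder)
    [0, 2, 1, 0, 3, 1, 2, 2],   # Cool    (placeholder)
], dtype=float)
n_docs = np.array([4.0, 3.0, 5.0])  # documents per label (placeholder)

V = counts.shape[1]
phi = n_docs / n_docs.sum()  # part (a): phi_y = p(y) for each label

def log_joint(x, k=0.0):
    """log10 p(x, y; phi, theta) for every label, with add-k smoothing
    (k = 0 gives the unsmoothed estimate of part (b), k = 1 part (c))."""
    theta = (counts + k) / (counts.sum(axis=1, keepdims=True) + k * V)
    with np.errstate(divide="ignore"):
        log_theta = np.log10(theta)  # log10(0) -> -inf, as the question asks
    # A word with x_j = 0 contributes theta^0 = 1 (log contribution 0)
    # even when theta = 0, so sum only over words that occur in x.
    mask = x > 0
    return np.log10(phi) + (x[mask] * log_theta[:, mask]).sum(axis=1)

x = np.array([0, 1, 0, 1, 1, 0, 0, 1], dtype=float)

for name, k in [("part (b), unsmoothed", 0.0), ("part (c), add-1", 1.0)]:
    lp = log_joint(x, k)
    print(f"{name}: "
          + ", ".join(f"{lab}={v:.3f}" for lab, v in zip(labels, lp))
          + f"  ->  y_hat = {labels[int(np.argmax(lp))]}")

Swapping the real table values into `counts` and `n_docs` yields the base-10 log probabilities the question asks for, rounded to 3 decimals by the `:.3f` format.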