In this question you will implement a Naive Bayes classifier for a text classification problem. You...
Fantastic news! We've Found the answer you've been seeking!
Question:
Transcribed Image Text:
In this question you will implement a Naive Bayes classifier for a text classification problem. You will be given a collection of text articles, each coming from either the serious European magazine The Economist, or from the not-so-serious American magazine The Onion. The goal is to learn a classifier that can distinguish between articles from each magazine. We have pre-processed the articles so that they are easier to use in your experiments. We extracted the set of all words that occur in any of the articles. This set is called the vocabulary and we let V be the number of words in the vocabulary. For each article, we produced a feature vector X = (X₁,..., Xv), where if the ith word appears in the article otherwise X₁ = 0, and each article is also accompanied by a class label yi, where 1, for The Economist 2, for The Onion Yi = When we apply the Naive Bayes classification algorithm, we make two assumptions about the data: first, we assume that our data is drawn iid from a joint probability distribution over the possible feature vectors X and the corresponding class labels Y; second, we assume for each pair of features X, and X, with i ‡ j that X, is conditionally independent of X, given the class label Y (this is the Naive Bayes assumption). Under these assumptions, a natural classification rule is as follows: Given a new input X, predict the most probable class label Ŷ given X. Formally, Using Bayes Rule and the Naive Bayes assumption, we can rewrite this classification rule as follows: P(X|Y = y)P(Y = y) P(X) (Bayes Rule) (Denominator does not depend on y) Ŷ = argmax Y = argmax P(X|Y = y)P(Y = y) Y Y = argmax P(Y = y|X). Y = argmax P(X₁,..., Xv|Y = y)P(Y = y) Y = argmax Y V IP(Xw|Y: = y)) P(Y (Conditional independence). w=1 Of course, since we don't know the true joint distribution over feature vectors X and class labels Y, we need to estimate the probabilities P(X|Y = y) and P(Y = y) from the training data. For each word index w € {1, ..., V} and class label y ≤ {1,2}, the distribution of X given Y = y is a Bernoulli distribution with parameter Øyw. In other words, there is some unknown number Oyw such that P(Y = y) P(X₂ = 1|Y = y) = Oyw P(Xw = 0|Y = y) = 1 - 0yw. We believe that there is a non-zero (but maybe very small) probability that any word in the vocabulary can appear in an article from either The Onion or The Economist. To make sure that our estimated probabilities are always non-zero, we will impose a Beta (2,1) prior on yw and compute the MAP estimate from the training data. Similarly, the distribution of Y (when we consider it alone) is a Bernoulli distribution with parameter p. In other words, there is some unknown number p such that P(Y = 1) = p P(Y=2) = 1 p. In this case, since we have many examples of articles from both The Economist and The Onion, there is no risk of having zero-probability estimates, so we will instead use the MLE for the distribution of Y. Questions 1. What would be the MAP estimate of Oyw = P(Xw = 1|Y = y) with a Beta(2,1) prior distribution? 2. What would be the MLE estimate for the prior, p = P(Y = 1)? [14 points] [8 points] [4 points] 3. Given Oyu, what would be the value of P(XW|Y = y)? 4. How would you classify a test example? Write the classification rule equation for a new test sample Xtest using yw and p? [8 points] 5. How do you think the train and test error would compare? Explain any significant differences. Hint: You can try to implement Naive Bayes for this question using python libraries and by finding a similar dataset. [4 points] 6. If we have less training data for the same problem, does the prior have more or less impact on our classifier? Explain any possible difference between the train and test error in this question and when we used the whole data. [6 points] In this question you will implement a Naive Bayes classifier for a text classification problem. You will be given a collection of text articles, each coming from either the serious European magazine The Economist, or from the not-so-serious American magazine The Onion. The goal is to learn a classifier that can distinguish between articles from each magazine. We have pre-processed the articles so that they are easier to use in your experiments. We extracted the set of all words that occur in any of the articles. This set is called the vocabulary and we let V be the number of words in the vocabulary. For each article, we produced a feature vector X = (X₁,..., Xv), where if the ith word appears in the article otherwise X₁ = 0, and each article is also accompanied by a class label yi, where 1, for The Economist 2, for The Onion Yi = When we apply the Naive Bayes classification algorithm, we make two assumptions about the data: first, we assume that our data is drawn iid from a joint probability distribution over the possible feature vectors X and the corresponding class labels Y; second, we assume for each pair of features X, and X, with i ‡ j that X, is conditionally independent of X, given the class label Y (this is the Naive Bayes assumption). Under these assumptions, a natural classification rule is as follows: Given a new input X, predict the most probable class label Ŷ given X. Formally, Using Bayes Rule and the Naive Bayes assumption, we can rewrite this classification rule as follows: P(X|Y = y)P(Y = y) P(X) (Bayes Rule) (Denominator does not depend on y) Ŷ = argmax Y = argmax P(X|Y = y)P(Y = y) Y Y = argmax P(Y = y|X). Y = argmax P(X₁,..., Xv|Y = y)P(Y = y) Y = argmax Y V IP(Xw|Y: = y)) P(Y (Conditional independence). w=1 Of course, since we don't know the true joint distribution over feature vectors X and class labels Y, we need to estimate the probabilities P(X|Y = y) and P(Y = y) from the training data. For each word index w € {1, ..., V} and class label y ≤ {1,2}, the distribution of X given Y = y is a Bernoulli distribution with parameter Øyw. In other words, there is some unknown number Oyw such that P(Y = y) P(X₂ = 1|Y = y) = Oyw P(Xw = 0|Y = y) = 1 - 0yw. We believe that there is a non-zero (but maybe very small) probability that any word in the vocabulary can appear in an article from either The Onion or The Economist. To make sure that our estimated probabilities are always non-zero, we will impose a Beta (2,1) prior on yw and compute the MAP estimate from the training data. Similarly, the distribution of Y (when we consider it alone) is a Bernoulli distribution with parameter p. In other words, there is some unknown number p such that P(Y = 1) = p P(Y=2) = 1 p. In this case, since we have many examples of articles from both The Economist and The Onion, there is no risk of having zero-probability estimates, so we will instead use the MLE for the distribution of Y. Questions 1. What would be the MAP estimate of Oyw = P(Xw = 1|Y = y) with a Beta(2,1) prior distribution? 2. What would be the MLE estimate for the prior, p = P(Y = 1)? [14 points] [8 points] [4 points] 3. Given Oyu, what would be the value of P(XW|Y = y)? 4. How would you classify a test example? Write the classification rule equation for a new test sample Xtest using yw and p? [8 points] 5. How do you think the train and test error would compare? Explain any significant differences. Hint: You can try to implement Naive Bayes for this question using python libraries and by finding a similar dataset. [4 points] 6. If we have less training data for the same problem, does the prior have more or less impact on our classifier? Explain any possible difference between the train and test error in this question and when we used the whole data. [6 points]
Expert Answer:
Related Book For
Posted Date:
Students also viewed these programming questions
-
Pastina Company sells various types of pasta to grocery chains as private label brands. The company\'s reporting year - end is December 3 1 . The unadjusted trial balance as of December 3 1 , 2 0 2 1...
-
Assignment 5: Hash Table implementation andconcordance There are three parts to this assignment. In the first two parts,you will complete the implementation of a hash map and aconcordance program. In...
-
Plot the complex number and find its absolute value. 19-i Plot the complex number on the complex plane to the right. The absolute value of the complex number is |19-= (Simplify your answer. Type an...
-
Make a presentation about marketing: Your Companys marketing department promotes the products and interacts with the customers, sales force, and supply chain. They are also in charge of forecasting...
-
Suppose that you own 1,000 shares of Nocash Corp. and the company is about to pay a 25% stock dividend. The stock currently sells at $100 per share. a. What will be the number of shares that you hold...
-
Barr Company acquires 60, 10%, 5 years, $1,000 Community bonds on January 1, 2010 for $61,250. This includes a brokerage commission of $1,250.The journal entry to record this investment includes a...
-
If the conditional statement is true, and the hypothesis is true, what is a valid conclusion to the argument? Use the conditional statement, \(p ightarrow q\) : "If Phil Mickelson is 50 years old,...
-
Schellhammer Corporation reported the following amounts in 2013, 2014, and 2015. Instructions (a) Identify and describe the three tools of financial statement analysis. (b) Perform each of the three...
-
Manipulating CAPM???Use the basic equation for the capital asset pricing model ?(CAPM?) to work each of the following problems.a.??Find the required return for an asset with a beta of 0 answers
-
How does a volcanic neck form? What happens to the rest of the volcano that originally surrounded it?
-
In 1 9 5 2 Blayney Scott and his wife Almeda started a small company in Victoria, British Columbia, Canada that pioneered the use of plastics in the manufacturing of marine products. Scotty...
-
Land Co wanted to purchase a plot of land. They agreed to purchase a plot of land with a building on it with the intention of only keeping the land. The assessment value of the land by an independent...
-
I need help with following parts relating to this question: 1. The manufacturer has collected monthly data on past market prices of widgets. Suppose that all annual price change can be assumed to be...
-
Share whether you think an individual has to be a strong leader in order to be an effective manager. Explain why or why not, with reference to the articles above to support your position. Further...
-
You are a Singapore-based foreign exchange trader, and you observe a put option for (Australian dollars) A$125,000 with an exercise price of (Singapore dollars) S$0.57 and a premium of S$0.02. The...
-
1 . On 4 January 2 0 X 7 , a new piece of equipment was purchased for 4 8 , 5 0 0 . The equipment is made up of various components; each item is considered significant. The components include:...
-
Construct a 4 x 25 design confounded in two blocks of 16 observations each. Outline the analysis of variance for this design.
-
Consider the claim that the OCA criteria are self-fulfilling. In what ways might this benefit a country that joins the currency union even if it doesnt satisfy the OCA criteria before joining? In...
-
Table 3-1(14-1) in the text shows the percentage undervaluation or overvaluation in the Big Mac, based on exchange rates in July 2019. Go to the main data repository for the BigMac dataset at...
-
What provision of U.S. trade law was used by President Trump to apply a tariff on steel and aluminum? What provision of U.S. trade was used to justify the tariffs on goods imported from China? Do...
-
10. CPA QUESTION In general, which of the following statements is correct conceming the priority among checks drawn on a particular account and presented to the drawee bank on a particular day? a....
-
11. Rev. Janet Hooper Ritchie knew that the shoe store at Buckland Hills mall in Manchester, Connecticut, would not accept a Discover credit card, so she stopped at an ATM for a $100 cash advance....
-
Go to http://www.legaldocs.com (or an equivalent site) and fill in the blanks of a promissory note. Who is the maker, and who is the payee of your note? Did you create a demand note?
Study smarter with the SolutionInn App