
Question


ANSWER FROM PREVIOUS QUESTION (unigram frequency distribution of the Brown corpus and the cumulative proportion covered by the top 20 word types):

import nltk
from nltk.corpus import brown

# Count every word token in the Brown corpus.
dct = dict()
for word in brown.words():
    dct[word] = dct.get(word, 0) + 1

# Sort word types by frequency, highest first.
a = list(dct.items())
a.sort(reverse=True, key=lambda x: x[1])

# Proportion of all tokens covered by the 20 most frequent types.
total = len(brown.words())
prop = 0
for i in range(20):
    prop += a[i][1]
print(f"{(prop/total):.2f}")
Questions

Create a new frequency distribution of the Brown bigrams. Plot the cumulative frequency distribution of the top 50 bigrams. Then do add-one smoothing on the bigrams. This will require adding one to all the bigram counts, including those that previously had count 0. You will also need to change the unigram counts appropriately. You will compute all possible bigrams using the known vocabulary, so use the keys of the unigram Brown distribution you created before to compute the set of possible bigrams. The vocabulary size from that exercise should be 49815. Then, having added 1 to all the bigram counts, you must compute at least the following probabilities:

1. P(the | in) before and after smoothing (P_MLE and P_Laplace);
2. P(in the) before and after smoothing;
3. P(said the) before and after smoothing;
4. P(the said) before and after smoothing.

In some cases you will need to use the unigram counts to compute these probabilities. Remember that the unigram counts must change too when smoothing. Turn in these values and the Python code you used to compute them.
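A minimal sketch of one way to approach this, not the graded solution: build the bigram counts directly from brown.words(), reuse the unigram counts as the vocabulary, and apply the add-one adjustment in the denominators. The helper names (p_mle, p_laplace, joint_mle, joint_laplace) and the reading of "P(in the)" as the bigram's relative frequency among all bigrams are assumptions of this sketch (the assignment may instead intend P(in) x P(the | in)); matplotlib is assumed to be available for the plot, and the Brown corpus must already be downloaded via nltk.download('brown').

import nltk
from nltk.corpus import brown
import matplotlib.pyplot as plt

words = list(brown.words())

# Unigram counts (same distribution as in the previous answer).
unigram_counts = {}
for w in words:
    unigram_counts[w] = unigram_counts.get(w, 0) + 1

# Bigram counts over adjacent word pairs.
bigram_counts = {}
for w1, w2 in zip(words[:-1], words[1:]):
    bigram_counts[(w1, w2)] = bigram_counts.get((w1, w2), 0) + 1

total_bigrams = len(words) - 1
# The exercise says the vocabulary size should be 49815; if your earlier
# unigram distribution was built differently (e.g. lowercased), use its size.
V = len(unigram_counts)

# Cumulative frequency distribution of the top 50 bigrams.
top50 = sorted(bigram_counts.items(), key=lambda kv: kv[1], reverse=True)[:50]
cumulative, running = [], 0
for _, c in top50:
    running += c
    cumulative.append(running / total_bigrams)
plt.plot(range(1, 51), cumulative)
plt.xlabel("bigram rank")
plt.ylabel("cumulative proportion of all bigrams")
plt.title("Cumulative frequency of the top 50 Brown bigrams")
plt.show()

def p_mle(w1, w2):
    # Unsmoothed conditional estimate P(w2 | w1) = c(w1, w2) / c(w1).
    return bigram_counts.get((w1, w2), 0) / unigram_counts[w1]

def p_laplace(w1, w2):
    # Add-one estimate: every possible bigram (w1, *) gains one count,
    # so the history (unigram) count grows by V.
    return (bigram_counts.get((w1, w2), 0) + 1) / (unigram_counts[w1] + V)

def joint_mle(w1, w2):
    # Bigram relative frequency among all observed bigrams (before smoothing).
    return bigram_counts.get((w1, w2), 0) / total_bigrams

def joint_laplace(w1, w2):
    # After adding 1 to all V*V possible bigrams, the bigram total grows by V*V.
    return (bigram_counts.get((w1, w2), 0) + 1) / (total_bigrams + V * V)

print(f"P(the | in): MLE={p_mle('in', 'the'):.6f}, Laplace={p_laplace('in', 'the'):.6f}")
for w1, w2 in [("in", "the"), ("said", "the"), ("the", "said")]:
    print(f"P({w1} {w2}): MLE={joint_mle(w1, w2):.3e}, Laplace={joint_laplace(w1, w2):.3e}")

The + V in the conditional denominator and the + V * V in the joint denominator are where the smoothed unigram and bigram totals change; this is the "change the unigram counts appropriately" step the question refers to.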
