Question

Implement a Python function build_bigram_probs(unigram_counts, bigram_counts) that takes the frequencies of single words (unigram_counts) and of pairs of words (bigram_counts) and returns a new nested dictionary.

Each key of the dictionary is a word in the vocabulary, and each key maps to the bigram probability dictionary for that word. A single word's bigram probability dictionary is like a reduced version of the unigram probability results: it keeps track of all the words that can come after the start word and the probability of each of them.

As an example, if we're processing the sentence "It is very nice, it is super cool", our result dictionary would look like this:

{ "it"    : { "words" : ["is"],            "probs" : [1] },
  "is"    : { "words" : ["very", "super"], "probs" : [0.5, 0.5] },
  "very"  : { "words" : ["nice"],          "probs" : [1] },
  "nice"  : { "words" : ["it"],            "probs" : [1] },
  "super" : { "words" : ["cool"],          "probs" : [1] } }

Note that "cool" is not included as a key in the outer dictionary because it never appears as the first word of a bigram.
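For reference, here is what the input counts for that example sentence could look like, assuming the tokenizer lowercases words and drops the comma (that tokenization is an assumption, not something the problem specifies):

```python
# Hypothetical counts for "It is very nice, it is super cool",
# assuming lowercased tokens with punctuation removed.
unigram_counts = {"it": 2, "is": 2, "very": 1, "nice": 1, "super": 1, "cool": 1}
bigram_counts = {
    "it":    {"is": 2},                # "it" is followed by "is" both times
    "is":    {"very": 1, "super": 1},  # "is" has two different followers
    "very":  {"nice": 1},
    "nice":  {"it": 1},
    "super": {"cool": 1},
}
# "is" occurred twice, so each of its followers gets probability 1/2 = 0.5;
# every other previous word has a single follower with probability 1.
```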

To create this dictionary, follow these steps:

Step 1: Make a new dictionary.

Step 2: Iterate through each key (we'll call it prev_word) in bigram_counts. (Note that bigram_counts[prev_word] is a dictionary of all the words that occurred after the previous word in the book.)

Step 3: Make two new lists, one for the words (keys) in bigram_counts[prev_word] and one for the probabilities of those words.

Step 4: Iterate through all of the keys in bigram_counts[prev_word], appending each word to the word list and the word's probability to the probability list.

Note 1: You can determine the probability by dividing the count by unigram_counts[prev_word], which is the total number of times the previous word occurred.

Note 2: This isn't 100% accurate, because we might have fewer total word occurrences in bigram_counts than in unigram_counts if prev_word occurred at the end of a sentence. But this usually only happens to punctuation, so we'll say this is good enough for now.

Step 5: Make a temporary dictionary mapping the string "words" to the word list and the string "probs" to the list of probabilities.

Step 6: Add the key prev_word to the new dictionary (from the outer level); its value is the temporary dictionary from Step 5. In other terms, each previous word maps to a dictionary containing words and probabilities.

Step 7: Return the new dictionary.
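The steps above can be sketched as the following implementation (one possible version; the local variable names are my own choice):

```python
def build_bigram_probs(unigram_counts, bigram_counts):
    # Step 1: make a new dictionary to hold the result
    probs = {}
    # Step 2: iterate through each previous word in bigram_counts
    for prev_word in bigram_counts:
        # Step 3: one list for the following words, one for their probabilities
        words = []
        word_probs = []
        # Step 4: fill both lists from bigram_counts[prev_word]
        for word, count in bigram_counts[prev_word].items():
            words.append(word)
            # Note 1: divide the pair count by the total count of prev_word
            word_probs.append(count / unigram_counts[prev_word])
        # Steps 5 and 6: map prev_word to its temporary dictionary
        probs[prev_word] = {"words": words, "probs": word_probs}
    # Step 7: return the completed dictionary
    return probs
```

Note that iterating over bigram_counts (rather than unigram_counts) automatically skips words that never start a bigram, matching the "cool" example above.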

def test_build_bigram_probs():
    print("Testing build_bigram_probs()...", end="")
    # since 'world' appears twice, once at the end of a sentence
    assert(build_bigram_probs(
        { "hello" : 2, "world" : 2, "again" : 1 },
        { "hello" : { "world" : 2 }, "world" : { "again" : 1 } }) ==
        { "hello" : { "words" : ["world"], "probs" : [1] },
          "world" : { "words" : ["again"], "probs" : [0.5] } })
    assert(build_bigram_probs(
        { "hello" : 1, "and" : 1, "welcome" : 1, "to" : 2, "the" : 1, "program" : 1,
          "." : 2, "we're" : 1, "happy" : 1, "have" : 1, "you" : 1 },
        { "hello" : { "and" : 1 }, "and" : { "welcome" : 1 }, "welcome" : { "to" : 1 },
          "to" : { "the" : 1, "have" : 1 }, "the" : { "program" : 1 },
          "program" : { "." : 1 }, "we're" : { "happy" : 1 },
          "happy" : { "to" : 1 }, "have" : { "you" : 1 }, "you" : { "." : 1 } }) ==
        { "hello" : { "words" : ["and"], "probs" : [1] },
          "and" : { "words" : ["welcome"], "probs" : [1] },
          "welcome" : { "words" : ["to"], "probs" : [1] },
          "to" : { "words" : ["the", "have"], "probs" : [0.5, 0.5] },
          "the" : { "words" : ["program"], "probs" : [1] },
          "program" : { "words" : ["."], "probs" : [1] },
          "we're" : { "words" : ["happy"], "probs" : [1] },
          "happy" : { "words" : ["to"], "probs" : [1] },
          "have" : { "words" : ["you"], "probs" : [1] },
          "you" : { "words" : ["."], "probs" : [1] } })
    assert(build_bigram_probs(
        { "this" : 1, "is" : 1, "the" : 1, "song" : 1, "that" : 1, "never" : 1,
          "ends" : 1, "yes" : 1, "it" : 4, "goes" : 1, "on" : 3, "and" : 2, "my" : 1,
          "friends" : 1, "!" : 1, "some" : 1, "people" : 1, "started" : 1, "singing" : 2,
          "," : 2, "not" : 1, "knowing" : 1, "what" : 1, "was" : 1, "now" : 1, "they" : 1,
          "keep" : 1, "forever" : 1, "just" : 1, "because" : 1, "." : 3 },
        { "this" : { "is" : 1 }, "is" : { "the" : 1 }, "the" : { "song" : 1 },
          "song" : { "that" : 1 }, "that" : { "never" : 1 }, "never" : { "ends" : 1 },
          "yes" : { "it" : 1 }, "it" : { "goes" : 1, "," : 1, "was" : 1, "forever" : 1 },
          "goes" : { "on" : 1 }, "on" : { "and" : 1, "my" : 1, "singing" : 1 },
          "and" : { "on" : 1, "now" : 1 }, "my" : { "friends" : 1 }, "friends" : { "!" : 1 },
          "some" : { "people" : 1 }, "people" : { "started" : 1 }, "started" : { "singing" : 1 },
          "singing" : { "it" : 2 }, "," : { "not" : 1 }, "not" : { "knowing" : 1 },
          "knowing" : { "what" : 1 }, "what" : { "it" : 1 }, "was" : { "," : 1 },
          "now" : { "they" : 1 }, "they" : { "keep" : 1 }, "keep" : { "on" : 1 },
          "forever" : { "just" : 1 }, "just" : { "because" : 1 },
          "because" : { "." : 1 }, "." : { "." : 2 } }) ==
        { "this" : { "words" : ["is"], "probs" : [1] },
          "is" : { "words" : ["the"], "probs" : [1] },
          "the" : { "words" : ["song"], "probs" : [1] },
          "song" : { "words" : ["that"], "probs" : [1] },
          "that" : { "words" : ["never"], "probs" : [1] },
          "never" : { "words" : ["ends"], "probs" : [1] },
          "yes" : { "words" : ["it"], "probs" : [1] },
          "it" : { "words" : ["goes", ",", "was", "forever"],
                   "probs" : [0.25, 0.25, 0.25, 0.25] },
          "goes" : { "words" : ["on"], "probs" : [1] },
          "on" : { "words" : ["and", "my", "singing"], "probs" : [1/3, 1/3, 1/3] },
          "and" : { "words" : ["on", "now"], "probs" : [0.5, 0.5] },
          "my" : { "words" : ["friends"], "probs" : [1] },
          "friends" : { "words" : ["!"], "probs" : [1] },
          "some" : { "words" : ["people"], "probs" : [1] },
          "people" : { "words" : ["started"], "probs" : [1] },
          "started" : { "words" : ["singing"], "probs" : [1] },
          "singing" : { "words" : ["it"], "probs" : [1] },
          # because the total count of "," is 2, with one at the end
          "," : { "words" : ["not"], "probs" : [0.5] },
          "not" : { "words" : ["knowing"], "probs" : [1] },
          "knowing" : { "words" : ["what"], "probs" : [1] },
          "what" : { "words" : ["it"], "probs" : [1] },
          "was" : { "words" : [","], "probs" : [1] },
          "now" : { "words" : ["they"], "probs" : [1] },
          "they" : { "words" : ["keep"], "probs" : [1] },
          "keep" : { "words" : ["on"], "probs" : [1] },
          "forever" : { "words" : ["just"], "probs" : [1] },
          "just" : { "words" : ["because"], "probs" : [1] },
          "because" : { "words" : ["."], "probs" : [1] },
          "." : { "words" : ["."], "probs" : [2/3] } })  # because the total count is 3
    # One final test to make sure probabilities aren't always the same
    assert(build_bigram_probs(
        { "one" : 3 },
        { "one" : { "a" : 1, "b" : 2 } }) ==
        { "one" : { "words" : ["a", "b"], "probs" : [1/3, 2/3] } })
    print("... done!")
