Question
Implement a Python function build_bigram_probs(unigram_counts, bigram_counts) that takes the frequencies of single words (unigram_counts) and of pairs of words (bigram_counts) and returns a new nested dictionary.
Each key of the dictionary is a word in the vocabulary, and each key maps to the bigram probability dictionary for that word. A single word's bigram probability dictionary is like a reduced version of the unigram probability results: it keeps track of all the words that can come after the start word and the probability for each of them.
As an example, if we're processing the sentence "It is very nice, it is super cool", our result dictionary would look like this:
{ "it" : { "words" : ["is"], "probs" : [1] }, "is" : { "words" : ["very", "super"], "probs" : [0.5, 0.5] }, "very" : { "words" : ["nice"], "probs" : [1] }, "nice" : { "words" : ["it"], "probs" : [1] }, "super" : { "words" : ["cool"], "probs" : [1] } }
Note that "cool" is not included as a key in the outer dictionary because it is never the first word in a bigram.
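To make the shape of this result concrete, here is a small sketch of how it could be read. The helper name next_word_prob is hypothetical and not part of the assignment; it only assumes that "words" and "probs" are parallel lists, as in the example above.

```python
# Example result for "It is very nice, it is super cool" (copied from the text above).
example_probs = {
    "it": {"words": ["is"], "probs": [1]},
    "is": {"words": ["very", "super"], "probs": [0.5, 0.5]},
    "very": {"words": ["nice"], "probs": [1]},
    "nice": {"words": ["it"], "probs": [1]},
    "super": {"words": ["cool"], "probs": [1]},
}


def next_word_prob(bigram_probs, prev_word, next_word):
    """Hypothetical helper: return P(next_word | prev_word), or 0 if the pair was never seen."""
    entry = bigram_probs.get(prev_word)
    if entry is None or next_word not in entry["words"]:
        return 0
    # "words" and "probs" are parallel lists, so the same index applies to both.
    return entry["probs"][entry["words"].index(next_word)]


print(next_word_prob(example_probs, "is", "super"))  # 0.5
print(next_word_prob(example_probs, "cool", "it"))   # 0, since "cool" never starts a bigram
```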
To create this dictionary, follow these steps:
Step 1: Make a new dictionary.
Step 2: Iterate through each key (we'll call it prev_word) in bigram_counts. (Note that bigram_counts[prev_word] is a dictionary of all the words that occurred after the previous word in the book.)
Step 3: Make two new lists, one for the words (keys) in bigram_counts[prev_word], and one for the probabilities of those words.
Step 4: Iterate through all of the keys in bigram_counts[prev_word], appending the word to the word list and the word's probability to the probability list.
Note 1: You can determine the probability by dividing the count by unigram_counts[prev_word], which is the total number of times the previous word occurred.
Note 2: This isn't 100% accurate, because we might have fewer total word occurrences in bigram_counts than in unigram_counts if prev_word occurred at the end of a sentence. But this usually only happens to punctuation, so we'll say this is good enough for now.
Step 5: Make a temporary dictionary mapping the string "words" to the word list and the string "probs" to the list of probabilities.
Step 6: Add to the new dictionary (from the outer level) the key prev_word; the value is the temporary dictionary from Step 5. In other words, each previous word maps to a dictionary containing words and probabilities.
Step 7: Return the new dictionary.
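The steps above can be sketched roughly as follows. This is one possible reading of the instructions, not an official solution; the local names bigram_probs, words, and probs are illustrative choices, and the sketch assumes every key of bigram_counts also appears in unigram_counts with a nonzero count.

```python
def build_bigram_probs(unigram_counts, bigram_counts):
    bigram_probs = {}  # Step 1: the new outer dictionary
    for prev_word in bigram_counts:  # Step 2: each word that starts a bigram
        words = []  # Step 3: two parallel lists
        probs = []
        for word in bigram_counts[prev_word]:  # Step 4: each word seen after prev_word
            words.append(word)
            # Note 1: divide the bigram count by the total count of prev_word.
            probs.append(bigram_counts[prev_word][word] / unigram_counts[prev_word])
        # Steps 5 and 6: build the temporary dictionary and store it under prev_word.
        bigram_probs[prev_word] = {"words": words, "probs": probs}
    return bigram_probs  # Step 7


result = build_bigram_probs(
    {"hello": 2, "world": 2, "again": 1},
    {"hello": {"world": 2}, "world": {"again": 1}},
)
print(result["world"])  # {'words': ['again'], 'probs': [0.5]}
```

As Note 2 warns, the probabilities for "world" do not sum to 1 here, because one of its two occurrences ended a sentence and so has no following word.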
def test_build_bigram_probs():
    print("Testing build_bigram_probs()...", end="")
    # since 'world' appears twice, once at the end of a sentence
    assert(build_bigram_probs(\
        { "hello" : 2, "world" : 2, "again" : 1 },
        { "hello" : { "world" : 2 }, "world" : { "again" : 1 } }) == \
        { "hello" : { "words" : ["world"], "probs" : [1] },
          "world" : { "words" : ["again"], "probs" : [0.5] } })
    assert(build_bigram_probs(\
        { "hello" : 1, "and" : 1, "welcome" : 1, "to" : 2, "the" : 1, "program" : 1, "." : 2, "we're" : 1, "happy" : 1, "have" : 1, "you" : 1 },
        { "hello" : { "and" : 1 }, "and" : { "welcome" : 1 }, "welcome" : { "to" : 1 },
          "to" : { "the" : 1, "have" : 1 }, "the" : { "program" : 1 }, "program" : { "." : 1 }, "we're" : { "happy" : 1 },
          "happy" : { "to" : 1 }, "have" : { "you" : 1 }, "you" : { "." : 1 } }) == \
        { "hello" : { "words" : ["and"], "probs" : [1] },
          "and" : { "words" : ["welcome"], "probs" : [1] },
          "welcome" : { "words" : ["to"], "probs" : [1] },
          "to" : { "words" : ["the", "have"], "probs" : [0.5, 0.5] },
          "the" : { "words" : [ "program" ], "probs" : [1] },
          "program" : { "words" : ["."], "probs" : [1] },
          "we're" : { "words" : ["happy"], "probs" : [1] },
          "happy" : { "words" : ["to"], "probs" : [1] },
          "have" : { "words" : ["you"], "probs" : [1] },
          "you" : { "words" : ["."], "probs" : [1] } })
    assert(build_bigram_probs(\
        { "this" : 1, "is" : 1, "the" : 1, "song" : 1, "that" : 1, "never" : 1,
          "ends" : 1, "yes" : 1, "it" : 4, "goes" : 1, "on" : 3, "and" : 2, "my" : 1,
          "friends" : 1, "!" : 1, "some" : 1, "people" : 1, "started" : 1, "singing" : 2,
          "," : 2, "not" : 1, "knowing" : 1, "what" : 1, "was" : 1, "now" : 1, "they" : 1,
          "keep" : 1, "forever" : 1, "just" : 1, "because" : 1, "." : 3 },
        { "this" : { "is" : 1 }, "is" : { "the" : 1 }, "the" : { "song" : 1 },
          "song" : { "that" : 1 }, "that" : { "never" : 1 }, "never" : { "ends" : 1 },
          "yes" : { "it" : 1 }, "it" : { "goes" : 1, "," : 1, "was" : 1, "forever" : 1 },
          "goes" : { "on" : 1 }, "on" : { "and" : 1, "my" : 1, "singing" : 1 },
          "and" : { "on" : 1, "now" : 1 }, "my" : { "friends" : 1}, "friends" : { "!" : 1 },
          "some" : { "people" : 1 }, "people" : { "started" : 1 }, "started" : { "singing" : 1 },
          "singing" : { "it" : 2 }, "," : { "not" : 1 }, "not" : { "knowing" : 1 },
          "knowing" : { "what" : 1 }, "what" : { "it" : 1 }, "was" : { "," : 1 },
          "now" : { "they" : 1 }, "they" : { "keep" : 1 }, "keep" : { "on" : 1 },
          "forever" : { "just" : 1 }, "just" : { "because" : 1 },
          "because" : { "." : 1 }, "." : { "." : 2 } }) == \
        { "this" : { "words" : ["is"], "probs" : [1] },
          "is" : { "words" : ["the"], "probs" : [1] },
          "the" : { "words" : ["song"], "probs" : [1] },
          "song" : { "words" : ["that"], "probs" : [1] },
          "that" : { "words" : ["never"], "probs" : [1] },
          "never" : { "words" : ["ends"], "probs" : [1] },
          "yes" : { "words" : ["it"], "probs" : [1] },
          "it" : { "words" : ["goes", ",", "was", "forever"], "probs" : [0.25, 0.25, 0.25, 0.25] },
          "goes" : { "words" : ["on"], "probs" : [1] },
          "on" : { "words" : ["and", "my", "singing"], "probs" : [1/3, 1/3, 1/3] },
          "and" : { "words" : ["on", "now"], "probs" : [0.5, 0.5] },
          "my" : { "words" : ["friends"], "probs" : [1] },
          "friends" : { "words" : ["!"], "probs" : [1] },
          "some" : { "words" : ["people"], "probs" : [1] },
          "people" : { "words" : ["started"], "probs" : [1] },
          "started" : { "words" : ["singing"], "probs" : [1] },
          "singing" : { "words" : ["it"], "probs" : [1] },
          "," : { "words" : ["not"], "probs" : [0.5] }, # because the total count of "," is 2, with one at the end
          "not" : { "words" : ["knowing"], "probs" : [1] },
          "knowing" : { "words" : ["what"], "probs" : [1] },
          "what" : { "words" : ["it"], "probs" : [1] },
          "was" : { "words" : [","], "probs" : [1] },
          "now" : { "words" : ["they"], "probs" : [1] },
          "they" : { "words" : ["keep"], "probs" : [1] },
          "keep" : { "words" : ["on"], "probs" : [1] },
          "forever" : { "words" : ["just"], "probs" : [1] },
          "just" : { "words" : ["because"], "probs" : [1] },
          "because" : { "words" : ["."], "probs" : [1] },
          "." : { "words" : ["."], "probs" : [2/3] } }) # because the total count is 3
    # One final test to make sure probabilities aren't always the same
    assert(build_bigram_probs(\
        { "one" : 3 },
        { "one" : { "a" : 1, "b" : 2 } }) == \
        { "one" : { "words" : ["a", "b"], "probs" : [1/3, 2/3] } })
    print("... done!")