
Question


Posted on May 16, 2024


Implement a Python function build_bigram_probs(unigram_counts, bigram_counts) that takes the frequencies of single words (unigram_counts) and of pairs of words (bigram_counts) and returns a new nested dictionary.

Each key of the dictionary is a word in the vocabulary, and each key maps to the bigram probability dictionary for that word. A single word's bigram probability dictionary is like a reduced version of the unigram probability results: it keeps track of all the words that can come after the start word and the probability of each of them.

As an example, if we were processing the sentence "It is very nice, it is super cool", our result dictionary would look like this:

{ "it" : { "words" : ["is"], "probs" : [1] },
  "is" : { "words" : ["very", "super"], "probs" : [0.5, 0.5] },
  "very" : { "words" : ["nice"], "probs" : [1] },
  "nice" : { "words" : ["it"], "probs" : [1] },
  "super" : { "words" : ["cool"], "probs" : [1] } }

Note that "cool" is not included as a key in the outer dictionary because it is never the first word in a bigram.
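For reference, here is a rough sketch of the inputs that would produce the dictionary above. It assumes a simple tokenization (lowercase, comma dropped) that the question does not actually specify, and the helper name count_example is hypothetical, not part of the assignment.

```python
def count_example(tokens):
    # Hypothetical helper: tally unigram and bigram counts from a token list.
    unigram_counts = {}
    bigram_counts = {}
    for i, word in enumerate(tokens):
        unigram_counts[word] = unigram_counts.get(word, 0) + 1
        # Every token except the last one starts a bigram.
        if i + 1 < len(tokens):
            next_word = tokens[i + 1]
            followers = bigram_counts.setdefault(word, {})
            followers[next_word] = followers.get(next_word, 0) + 1
    return unigram_counts, bigram_counts

# Assumed tokenization of the example sentence: lowercase, comma dropped.
tokens = "it is very nice it is super cool".split()
unigram_counts, bigram_counts = count_example(tokens)

print(bigram_counts["is"])        # {'very': 1, 'super': 1}
print("cool" in bigram_counts)    # False
```

Under this assumption, "is" occurs twice and is followed once each by "very" and "super", which is where the [0.5, 0.5] in the example comes from.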

To create this dictionary, follow these steps:

Step 1: Make a new dictionary.

Step 2: Iterate through each key (we'll call it prev_word) in bigram_counts. (Note that bigram_counts[prev_word] is a dictionary of all the words that occurred after the previous word in the book.)

Step 3: Make two new lists, one for the words (keys) in bigram_counts[prev_word], and one for the probabilities of those words.

Step 4: Iterate through all of the keys in bigram_counts[prev_word], appending the word to the word list and the word's probability to the probability list.

Note 1: You can determine the probability by dividing the count by unigram_counts[prev_word], which is the total number of times the previous word occurred.

Note 2: This isn't 100% accurate, because we might have fewer total word occurrences in bigram_counts than in unigram_counts if prev_word occurred at the end of a sentence. But this usually only happens to punctuation, so we'll say this is good enough for now.

Step 5: Make a temporary dictionary mapping the string "words" to the word list and the string "probs" to the list of probabilities.

Step 6: Add to the new dictionary (from the outer level) the key prev_word; the value is the temporary dictionary from Step 5. In other words, each previous word maps to a dictionary containing words and probabilities.

Step 7: Return the new dictionary.
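The steps above can be sketched roughly as follows. This is one possible reading of the instructions, not an official solution; it assumes every key of bigram_counts also appears in unigram_counts with a nonzero count, and the local names bigram_probs, words, and probs are my own.

```python
def build_bigram_probs(unigram_counts, bigram_counts):
    bigram_probs = {}                                # Step 1: new outer dictionary
    for prev_word in bigram_counts:                  # Step 2: each first word of a bigram
        words = []                                   # Step 3: parallel lists
        probs = []
        for word in bigram_counts[prev_word]:        # Step 4: each following word
            words.append(word)
            # Note 1: divide the bigram count by the total count of prev_word.
            # Assumes unigram_counts[prev_word] exists and is nonzero.
            probs.append(bigram_counts[prev_word][word] / unigram_counts[prev_word])
        # Steps 5 and 6: wrap the two lists and store them under prev_word.
        bigram_probs[prev_word] = {"words": words, "probs": probs}
    return bigram_probs                              # Step 7

result = build_bigram_probs(
    {"hello": 2, "world": 2, "again": 1},
    {"hello": {"world": 2}, "world": {"again": 1}})
print(result["world"])  # {'words': ['again'], 'probs': [0.5]}
```

The "world" entry illustrates Note 2: its probabilities sum to 0.5 rather than 1, because one of the two occurrences of "world" ended a sentence and so has no following word in bigram_counts.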

def test_build_bigram_probs():
    print("Testing build_bigram_probs()...", end="")
    # since 'world' appears twice, once at the end of a sentence
    assert(build_bigram_probs(\
        { "hello" : 2, "world" : 2, "again" : 1 },
        { "hello" : { "world" : 2 }, "world" : { "again" : 1 } }) == \
        { "hello" : { "words" : ["world"], "probs" : [1] },
          "world" : { "words" : ["again"], "probs" : [0.5] } })
    assert(build_bigram_probs(\
        { "hello" : 1, "and" : 1, "welcome" : 1, "to" : 2, "the" : 1, "program" : 1, "." : 2, "we're" : 1, "happy" : 1, "have" : 1, "you" : 1 },
        { "hello" : { "and" : 1 }, "and" : { "welcome" : 1 }, "welcome" : { "to" : 1 },
          "to" : { "the" : 1, "have" : 1 }, "the" : { "program" : 1 }, "program" : { "." : 1 }, "we're" : { "happy" : 1 },
          "happy" : { "to" : 1 }, "have" : { "you" : 1 }, "you" : { "." : 1 } }) == \
        { "hello" : { "words" : ["and"], "probs" : [1] },
          "and" : { "words" : ["welcome"], "probs" : [1] },
          "welcome" : { "words" : ["to"], "probs" : [1] },
          "to" : { "words" : ["the", "have"], "probs" : [0.5, 0.5] },
          "the" : { "words" : [ "program" ], "probs" : [1] },
          "program" : { "words" : ["."], "probs" : [1] },
          "we're" : { "words" : ["happy"], "probs" : [1] },
          "happy" : { "words" : ["to"], "probs" : [1] },
          "have" : { "words" : ["you"], "probs" : [1] },
          "you" : { "words" : ["."], "probs" : [1] } })
    assert(build_bigram_probs(\
        { "this" : 1, "is" : 1, "the" : 1, "song" : 1, "that" : 1, "never" : 1,
          "ends" : 1, "yes" : 1, "it" : 4, "goes" : 1, "on" : 3, "and" : 2, "my" : 1,
          "friends" : 1, "!" : 1, "some" : 1, "people" : 1, "started" : 1, "singing" : 2,
          "," : 2, "not" : 1, "knowing" : 1, "what" : 1, "was" : 1, "now" : 1, "they" : 1,
          "keep" : 1, "forever" : 1, "just" : 1, "because" : 1, "." : 3 },
        { "this" : { "is" : 1 }, "is" : { "the" : 1 }, "the" : { "song" : 1 },
          "song" : { "that" : 1 }, "that" : { "never" : 1 }, "never" : { "ends" : 1 },
          "yes" : { "it" : 1 }, "it" : { "goes" : 1, "," : 1, "was" : 1, "forever" : 1 },
          "goes" : { "on" : 1 }, "on" : { "and" : 1, "my" : 1, "singing" : 1 },
          "and" : { "on" : 1, "now" : 1 }, "my" : { "friends" : 1}, "friends" : { "!" : 1 },
          "some" : { "people" : 1 }, "people" : { "started" : 1 }, "started" : { "singing" : 1 },
          "singing" : { "it" : 2 }, "," : { "not" : 1 }, "not" : { "knowing" : 1 },
          "knowing" : { "what" : 1 }, "what" : { "it" : 1 }, "was" : { "," : 1 },
          "now" : { "they" : 1 }, "they" : { "keep" : 1 }, "keep" : { "on" : 1 },
          "forever" : { "just" : 1 }, "just" : { "because" : 1 },
          "because" : { "." : 1 }, "." : { "." : 2 } }) == \
        { "this" : { "words" : ["is"], "probs" : [1] },
          "is" : { "words" : ["the"], "probs" : [1] },
          "the" : { "words" : ["song"], "probs" : [1] },
          "song" : { "words" : ["that"], "probs" : [1] },
          "that" : { "words" : ["never"], "probs" : [1] },
          "never" : { "words" : ["ends"], "probs" : [1] },
          "yes" : { "words" : ["it"], "probs" : [1] },
          "it" : { "words" : ["goes", ",", "was", "forever"], "probs" : [0.25, 0.25, 0.25, 0.25] },
          "goes" : { "words" : ["on"], "probs" : [1] },
          "on" : { "words" : ["and", "my", "singing"], "probs" : [1/3, 1/3, 1/3] },
          "and" : { "words" : ["on", "now"], "probs" : [0.5, 0.5] },
          "my" : { "words" : ["friends"], "probs" : [1] },
          "friends" : { "words" : ["!"], "probs" : [1] },
          "some" : { "words" : ["people"], "probs" : [1] },
          "people" : { "words" : ["started"], "probs" : [1] },
          "started" : { "words" : ["singing"], "probs" : [1] },
          "singing" : { "words" : ["it"], "probs" : [1] },
          "," : { "words" : ["not"], "probs" : [0.5] }, # because the total count of "," is 2, with one at the end
          "not" : { "words" : ["knowing"], "probs" : [1] },
          "knowing" : { "words" : ["what"], "probs" : [1] },
          "what" : { "words" : ["it"], "probs" : [1] },
          "was" : { "words" : [","], "probs" : [1] },
          "now" : { "words" : ["they"], "probs" : [1] },
          "they" : { "words" : ["keep"], "probs" : [1] },
          "keep" : { "words" : ["on"], "probs" : [1] },
          "forever" : { "words" : ["just"], "probs" : [1] },
          "just" : { "words" : ["because"], "probs" : [1] },
          "because" : { "words" : ["."], "probs" : [1] },
          "." : { "words" : ["."], "probs" : [2/3] } }) # because the total count is 3
    # One final test to make sure probabilities aren't always the same
    assert(build_bigram_probs(\
        { "one" : 3 },
        { "one" : { "a" : 1, "b" : 2 } }) == \
        { "one" : { "words" : ["a", "b"], "probs" : [1/3, 2/3] } })
    print("... done!")
