Q2.1: Train N-gram language model (20 pts)
Complete the following train_ngram_lm function based on the input/output specifications below. If you've done it right, it should pass the tests in the cell below.
Input:
data: the data object created in the cell above that holds the tokenized Wikitext data
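order: the order of the model, i.e., the n in "n-gram" model. If order = n, we compute $p(c_n \mid c_1, \ldots, c_{n-1})$, the probability of a character given the n - 1 characters before it.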
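To make the role of order concrete, here is a minimal sketch that pairs each character with the order - 1 characters immediately before it. The helper name ngram_pairs is hypothetical (it is not part of the assignment), it assumes the text is a plain Python string, and it simply skips the first positions that lack a full history rather than padding them.

```python
from typing import Iterator, Tuple


def ngram_pairs(text: str, order: int) -> Iterator[Tuple[str, str]]:
    """Yield (history, next_char) pairs for an n-gram model of the given order.

    Hypothetical helper: the history is the (order - 1) characters immediately
    preceding next_char. Positions without a full history are skipped (no padding).
    """
    if order < 1:
        raise ValueError("order must be at least 1")
    h = order - 1  # history length
    for i in range(h, len(text)):
        yield text[i - h:i], text[i]


# A trigram (order=3) model conditions each character on the previous two.
print(list(ngram_pairs("abcd", 3)))  # [('ab', 'c'), ('bc', 'd')]
```

With order=1 the history is always the empty string, which is exactly the unigram case.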
Output:
lm: A dictionary where the key is the history and the value is a probability distribution over the next character, computed using the maximum likelihood estimate from the training data. Importantly, this dictionary should include the backoff probabilities as well; e.g., for a model of order n we want to store $p(c_n \mid c_1, \ldots, c_{n-1})$ as well as $p(c_n \mid c_2, \ldots, c_{n-1})$ and so on, down to the unigram distribution $p(c_n)$.
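For reference, the maximum likelihood estimate for a history h (a possibly empty string of characters) is just the relative frequency of each next character c after h in the training data, where count(hc) is the number of times h is immediately followed by c:

```latex
p_{\text{MLE}}(c \mid h) = \frac{\operatorname{count}(h\,c)}{\sum_{c'} \operatorname{count}(h\,c')}
```

For the empty history this reduces to count(c) divided by the total number of characters counted, i.e., the unigram distribution. Characters never observed after h receive probability 0.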
Each key should be a single string formed by concatenating the characters of the history. Given a key, its corresponding value should be a dictionary that maps each character in the vocabulary to its probability of appearing after the key. For example, writing c0, c1, c2, ... for characters in the vocabulary, the entry for the two-character history c1c2 should look like:
lm['c1c2'] = {'c0': p(c0 | c1c2), 'c1': p(c1 | c1c2), 'c2': p(c2 | c1c2), ...}
In this example, we also want to store lm['c2'] and lm[''], which contain the bigram and unigram distributions, respectively.
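The keys that must exist alongside a given history are exactly its suffixes, obtained by dropping the oldest character one at a time until the empty string remains. A minimal sketch; backoff_histories is a hypothetical helper name used only for illustration:

```python
from typing import List


def backoff_histories(history: str) -> List[str]:
    """Return the history followed by each of its shorter suffixes, down to ''.

    Hypothetical helper: each suffix is the key of the next lower-order
    (backoff) distribution.
    """
    return [history[i:] for i in range(len(history) + 1)]


# A two-character history: trigram, bigram, and unigram keys.
print(backoff_histories("ab"))  # ['ab', 'b', '']
```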
Hint: You might find the defaultdict and Counter classes in the collections module to be helpful.
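Following the hint, here is one possible sketch of train_ngram_lm built on defaultdict(Counter). It assumes (this is not stated in the assignment) that data can be iterated as a sequence of strings, one sequence of characters each; the notebook's actual data object may need to be unpacked into that form first. Each character is counted once under every history length from 0 up to order - 1, so the backoff entries such as lm['c2'] and lm[''] come out of the same pass as the full-order ones.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable


def train_ngram_lm(data: Iterable[str], order: int) -> Dict[str, Dict[str, float]]:
    """Train a character n-gram LM with MLE, storing all backoff distributions.

    ASSUMPTION: `data` is an iterable of strings (one sequence of characters
    each); the notebook's actual data object may need to be unpacked first.
    """
    if order < 1:
        raise ValueError("order must be at least 1")
    sequences = list(data)
    vocab = sorted({ch for seq in sequences for ch in seq})

    # counts[history][next_char] = number of times next_char follows history
    counts: Dict[str, Counter] = defaultdict(Counter)
    for seq in sequences:
        for i, ch in enumerate(seq):
            # Count this character under every history length 0 .. order-1
            # that fits inside the sequence (0 = unigram, order-1 = full n-gram).
            for k in range(min(order - 1, i) + 1):
                counts[seq[i - k:i]][ch] += 1

    # Normalize each Counter into a distribution over the full vocabulary;
    # characters never seen after a history get probability 0 under MLE.
    lm: Dict[str, Dict[str, float]] = {}
    for history, counter in counts.items():
        total = sum(counter.values())
        lm[history] = {ch: counter[ch] / total for ch in vocab}
    return lm


lm = train_ngram_lm(["abab"], order=3)
print(lm["ab"])  # {'a': 1.0, 'b': 0.0}
print(lm[""])    # {'a': 0.5, 'b': 0.5}
```

This sketch does not pad sequences, so the first order - 1 characters of each sequence only contribute to the shorter histories. Padding each sequence with a start symbol is a common alternative; the assignment text does not say which convention its tests expect.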