Question

Q2.1: Train N-gram language model (20 pts)
Complete the following train_ngram_lm function based on the following input/output specifications. If you've done it right, you should pass the tests in the cell below.
Input:
data: the data object created in the cell above that holds the tokenized Wikitext data
order: the order of the model (i.e., the "n" in "n-gram" model). If order=3, we compute $P(c_2 \mid c_0 c_1)$, the probability of a character given the two characters before it.
Output:
lm: A dictionary where the key is the history and the value is a probability distribution over the next character, computed using the maximum likelihood estimate from the training data. Importantly, this dictionary should include backoff probabilities as well; e.g., for order=4, we want to store $P(c_3 \mid c_0 c_1 c_2)$ as well as $P(c_3 \mid c_1 c_2)$ and $P(c_3 \mid c_2)$.
Each key should be a single string where the characters that form the history have been concatenated. Given a key, its corresponding value should be a dictionary where each character in the vocabulary is associated with its probability of appearing after the key. For example, the entry for the history 'c1c2' should look like:
lm['c1c2'] = {'c0': 0.001, 'c1': 1e-6, 'c2': 1e-6, 'c3': 0.003, ...}
In this example, we also want to store lm['c2'] and lm[''], which contain the bigram and unigram distributions respectively.
Hint: You might find the defaultdict and Counter classes in the collections module to be helpful.
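
The specification above could be sketched roughly as follows. This is a minimal illustration, not the assignment's reference solution. It assumes that data is an iterable of token sequences (for example, lists of characters); the actual data object from the earlier cell may be structured differently. It also assumes no start or end padding, and it assigns probability 0.0 to vocabulary entries never seen after a given history, which is what the plain maximum likelihood estimate (count of history followed by the character, divided by the count of the history followed by anything) gives.

```python
from collections import Counter, defaultdict


def train_ngram_lm(data, order=3):
    """Sketch of an MLE n-gram model that also stores all shorter histories.

    Assumption: `data` is an iterable of token sequences (e.g. lists of
    characters). The real data object in the notebook may differ.
    """
    counts = defaultdict(Counter)  # history string -> counts of next token
    vocab = set()

    for seq in data:
        seq = list(seq)
        vocab.update(seq)
        for i, token in enumerate(seq):
            # Record the token under every history of length 0 .. order-1
            # that ends immediately before position i.
            for n in range(order):
                if i - n < 0:
                    break  # not enough preceding context for this length
                history = "".join(seq[i - n:i])
                counts[history][token] += 1

    lm = {}
    for history, counter in counts.items():
        total = sum(counter.values())
        # MLE: count(history + token) / count(history + anything).
        # Counter returns 0 for unseen tokens, so they get probability 0.0.
        lm[history] = {tok: counter[tok] / total for tok in vocab}
    return lm


# Tiny usage example on a toy corpus (hypothetical data, not Wikitext).
toy_lm = train_ngram_lm([list("abab")], order=2)
print(toy_lm[""]["a"])   # unigram probability of 'a'
print(toy_lm["a"]["b"])  # probability of 'b' after history 'a'
```

Because every history length from 0 up to order-1 is counted in the same pass, the returned dictionary contains the unigram entry under the empty string key along with each longer history, which matches the backoff requirement described above.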
