Question
(a) Expand the above definition of $p(w)$ using naive estimates of the parameters, such as

$$p(w_4 \mid w_2, w_3) \stackrel{\text{def}}{=} \frac{c(w_2 w_3 w_4)}{c(w_2 w_3)}$$

where $c(w_2 w_3 w_4)$ denotes the count of times the trigram $w_2 w_3 w_4$ was observed in a training corpus.

Remark: Naive parameter estimates of this sort are called maximum-likelihood estimates (MLE). They have the advantage that they maximize the probability (equivalently, minimize the perplexity) of the training data. But they will generally perform badly on test data, unless the training data were so abundant as to include all possible trigrams many times.

Hint: You will have to think about $p(w_1)$. It says that the first word $w_1$ was simply generated from a unigram model, conditioned on 0 words of context. Similarly, $p(w_2 \mid w_1)$ indicates that $w_2$ was generated from a bigram model, conditioned on only 1 word of context (namely $w_1$).

Remark: As a result, this setup is slightly different from the trigram model we discussed in class. Equation (1) doesn't model $w_1$ as the first word of a sentence. $w_1$ is just the first word you heard when you turned on the radio; the sentence might have started earlier, so $w_1$ isn't necessarily a word of the sort that starts sentences. Equation (1) also doesn't model $w_n$ as the last word of a sentence. If $n = 10$ (chosen in advance), then $w_n$ is just the tenth word you heard; the sentence might continue after that, so $w_n$ isn't necessarily a word of the sort that ends sentences.
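To make the count ratios concrete, here is a minimal Python sketch of the naive estimates. It assumes, following the hint, that Equation (1) (not reproduced here) chains a unigram factor for the first word, a bigram factor for the second, and trigram factors for the rest. The function names `ngram_counts` and `mle_sequence_prob`, the toy corpus, and the choice to treat an unseen context (0/0) as probability 0 are all illustrative assumptions, not part of the original problem.

```python
from collections import Counter
from fractions import Fraction


def ngram_counts(tokens, n):
    """Count every contiguous n-gram in the token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def mle_sequence_prob(words, tokens):
    """MLE probability of `words` under the assumed unigram/bigram/trigram chain."""
    c1 = ngram_counts(tokens, 1)
    c2 = ngram_counts(tokens, 2)
    c3 = ngram_counts(tokens, 3)
    total = len(tokens)

    def ratio(num, den):
        # Assumption: an unseen context (denominator 0) contributes probability 0.
        return Fraction(num, den) if den else Fraction(0)

    prob = Fraction(1)
    for i in range(len(words)):
        if i == 0:
            # p(w1) = c(w1) / N, with N the number of training tokens
            prob *= ratio(c1[(words[0],)], total)
        elif i == 1:
            # p(w2 | w1) = c(w1 w2) / c(w1)
            prob *= ratio(c2[tuple(words[:2])], c1[(words[0],)])
        else:
            # p(wi | wi-2, wi-1) = c(wi-2 wi-1 wi) / c(wi-2 wi-1)
            prob *= ratio(c3[tuple(words[i - 2:i + 1])], c2[tuple(words[i - 2:i])])
    return prob


if __name__ == "__main__":
    corpus = "a b c a b d a b c".split()  # toy training corpus (made up)
    print(mle_sequence_prob("a b c".split(), corpus))  # 2/9
    print(mle_sequence_prob("a b a".split(), corpus))  # 0
```

The second call illustrates the remark about test data: a single trigram that never occurred in training drives the whole sequence probability to zero.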