The main difference between retrieval functions of this form comes from their choice of smoothing method applied to the unigram language model $\theta_D$. In general, when we smooth with a collection background language model, we can write this probability as

$$
p(w \mid \theta_D) =
\begin{cases}
p_s(w \mid \theta_D) & \text{if } w \in D \\
\alpha_D \, p(w \mid C) & \text{otherwise}
\end{cases}
$$

where $p_s(w \mid \theta_D)$ is the discounted maximum likelihood estimate of observing word $w$ in document $D$, and $\alpha_D$ is a document-specific coefficient that controls the amount of probability mass assigned to unseen words, ensuring that all of the probabilities sum to one.

Noting that $\log$ is a monotonic transform (thus leading to equivalent results under ranking), and using the above smoothing formulation, we can show the following:

$$
\begin{aligned}
\log p(Q \mid D)
&= \sum_{i=1}^{|Q|} \log p(q_i \mid \theta_D) \\
&= \sum_{w \in V} c(w, Q) \log p(w \mid \theta_D) \\
&= \sum_{w \in D} c(w, Q) \log p_s(w \mid \theta_D) + \sum_{w \notin D} c(w, Q) \log\big(\alpha_D \, p(w \mid C)\big) \\
&= \sum_{w \in D} c(w, Q) \log p_s(w \mid \theta_D) + \sum_{w \in V} c(w, Q) \log\big(\alpha_D \, p(w \mid C)\big) - \sum_{w \in D} c(w, Q) \log\big(\alpha_D \, p(w \mid C)\big) \\
&= \sum_{w \in D} c(w, Q) \log \frac{p_s(w \mid \theta_D)}{\alpha_D \, p(w \mid C)} + |Q| \log \alpha_D + \sum_{w \in V} c(w, Q) \log p(w \mid C) \\
&\overset{\text{rank}}{=} \sum_{w \in D} c(w, Q) \log \frac{p_s(w \mid \theta_D)}{\alpha_D \, p(w \mid C)} + |Q| \log \alpha_D
\end{aligned}
$$

The last step holds under ranking because $\sum_{w \in V} c(w, Q) \log p(w \mid C)$ depends only on the query and the collection, not on the document, so dropping it does not change the document ranking.

a. [5 pts] Show that if we use the query-likelihood scoring method (i.e., $p(Q \mid D)$) and the Jelinek-Mercer smoothing method (i.e., fixed-coefficient interpolation with smoothing parameter $\lambda$) for retrieval, we can rank documents based on the following scoring function:

$$
\mathrm{score}(Q, D) = \sum_{w \in Q \cap D} c(w, Q) \log\left(1 + \frac{(1 - \lambda) \times c(w, D)}{\lambda \times p(w \mid \mathrm{REF}) \times |D|}\right)
$$

where the sum is taken over all the matched query terms in $D$, $|D|$ is the document length, $c(w, D)$ is the count of word $w$ in document $D$ (i.e., how many times $w$ occurs in $D$), $c(w, Q)$ is the count of word $w$ in $Q$, $\lambda$ is the smoothing parameter, and $p(w \mid \mathrm{REF})$ is the probability of word $w$ given by the reference language model estimated using the whole collection.

b. [5 pts] The scoring function above can also be interpreted as a vector space model. If we make this interpretation, what would be the query vector? What would be the document vector? What would be the similarity function? Does the term weight in the document vector capture the TF-IDF weighting and document length normalization heuristics? Why or why not?
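To make the Jelinek-Mercer scoring function in part (a) concrete, here is a minimal Python sketch (not part of the original exercise). The function name `jm_score`, the dict-based reference model `p_ref`, and the toy corpus in the usage lines are illustrative assumptions rather than a prescribed implementation.

```python
import math
from collections import Counter

def jm_score(query_tokens, doc_tokens, p_ref, lam=0.5):
    """Rank-equivalent query-likelihood score with Jelinek-Mercer smoothing:

        score(Q, D) = sum over w in Q ∩ D of
            c(w, Q) * log(1 + (1 - λ) * c(w, D) / (λ * p(w | REF) * |D|))

    p_ref maps each word w to p(w | REF), the reference (collection)
    language model; lam is the smoothing parameter λ, assumed in (0, 1).
    """
    c_q = Counter(query_tokens)   # c(w, Q)
    c_d = Counter(doc_tokens)     # c(w, D)
    doc_len = len(doc_tokens)     # |D|
    score = 0.0
    for w, cwq in c_q.items():
        cwd = c_d.get(w, 0)
        if cwd == 0:              # only matched query terms contribute
            continue
        score += cwq * math.log(1.0 + (1.0 - lam) * cwd / (lam * p_ref[w] * doc_len))
    return score

# Toy usage: estimate p(w | REF) from the whole collection, then rank documents.
docs = [["the", "cat", "sat"], ["the", "dog", "chased", "the", "cat"]]
total = sum(len(d) for d in docs)
p_ref = {w: c / total for w, c in Counter(w for d in docs for w in d).items()}
ranked = sorted(docs, key=lambda d: jm_score(["cat", "dog"], d, p_ref), reverse=True)
```

In a real system, $p(w \mid \mathrm{REF})$ would be estimated once from the full collection's term statistics, and candidate documents would be gathered through an inverted index rather than scored exhaustively as in this toy example.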