Question:
Exercise A

Consider the following document D, taken from a collection C.

"The University of California, Riverside is one of 10 universities within the prestigious University of California system, and the only UC located in Inland Southern California. Widely recognized as one of the most ethnically diverse research universities in the nation."

Consider the following two queries:
Q1: university Riverside
Q2: diverse university

Characteristics of collection C are as follows:
#docs in collection C: 1000
#docs in C that contain "Riverside": 100
#docs in C that contain "university/ies": 200
#docs in C that contain "diverse": 150

Compute the scores of Q1 and Q2 for D, using (a) BM25, and (b) Unigram Language Model (with a smoothing method of your choice). Make and state any assumptions necessary, e.g., about the constants in BM25.
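One possible way to work the exercise is sketched below in Python. It is not an official solution, and every constant the question leaves open is an assumption: k1 = 1.2 and b = 0.75 for BM25, an average document length equal to the length of D (the question gives none), the Robertson/Sparck Jones IDF ln((N - n + 0.5) / (n + 0.5)), no query-term-frequency component, and Jelinek-Mercer smoothing with lambda = 0.5 for the language model. The collection probability P(t|C) is approximated by df(t)/N, a crude stand-in, because collection term counts are not provided. "University" and "universities" are treated as the same term, as the "university/ies" statistic suggests.

```python
import math
import re
from collections import Counter

DOC = (
    "The University of California, Riverside is one of 10 universities within "
    "the prestigious University of California system, and the only UC located "
    "in Inland Southern California. Widely recognized as one of the most "
    "ethnically diverse research universities in the nation."
)

N_DOCS = 1000  # number of documents in collection C (given)
DF = {"university": 200, "riverside": 100, "diverse": 150}  # document frequencies (given)


def tokenize(text):
    """Lowercase, split on non-alphanumerics, and fold 'universities' into 'university'."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return ["university" if t == "universities" else t for t in tokens]


tokens = tokenize(DOC)
tf = Counter(tokens)   # term frequencies in D
doc_len = len(tokens)  # |D|, counted without stopword removal (assumption)


def bm25(query, k1=1.2, b=0.75, avgdl=None):
    """BM25 score of D for the query; avgdl defaults to |D| (assumption)."""
    avgdl = doc_len if avgdl is None else avgdl
    score = 0.0
    for term in tokenize(query):
        n = DF[term]
        idf = math.log((N_DOCS - n + 0.5) / (n + 0.5))
        f = tf[term]
        length_norm = k1 * (1 - b + b * doc_len / avgdl)
        score += idf * f * (k1 + 1) / (f + length_norm)
    return score


def lm_jelinek_mercer(query, lam=0.5):
    """Query likelihood P(Q|D) under a unigram model with Jelinek-Mercer smoothing."""
    prob = 1.0
    for term in tokenize(query):
        p_doc = tf[term] / doc_len
        p_coll = DF[term] / N_DOCS  # assumption: df/N used in place of P(t|C)
        prob *= (1 - lam) * p_doc + lam * p_coll
    return prob


for name, query in [("Q1", "university Riverside"), ("Q2", "diverse university")]:
    print(name, "BM25:", round(bm25(query), 4), "LM:", round(lm_jelinek_mercer(query), 6))
```

Under these assumptions |D| is 40 tokens, with "university" occurring 4 times and "Riverside" and "diverse" once each. BM25 comes out at roughly 4.54 for Q1 and 4.07 for Q2, while the smoothed language model gives about 0.0094 for Q1 and 0.0131 for Q2. The two models rank the queries differently here because BM25 rewards the rarer term "Riverside" through IDF, whereas the df/N proxy for P(t|C) gives the more common term "diverse" a larger smoothed probability. Different constants or a different smoothing method would change the numbers.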
