Question
2. (a) Describe 2 applications in NLP where it could be useful to be able to compute the similarity between pairs of documents. [6 marks]

(b) Consider Table 1, which gives the frequencies of 5 potential word features in a very small corpus of 5 documents. Assume that there are no other words and no other documents in the corpus.

Word Feature Doc A Doc B Doc C Doc D Doc E
bridge 0 1 2 1 0
capital 1 1 0 0 0
the 5
ancient 0 3
inhospitable 1 1
4 4 5 1 0 3 0

Table 1: Frequencies of 5 Word Features in 5 Documents

i. For documents A and B, compute the tf-idf score associated with each of the 5 word features given. [10 marks]

ii. For documents A and B, compute the positive pointwise mutual information (PPMI) between the document and each of the 5 word features given. [10 marks]

iii. Why is a representation based on tf-idf or PPMI better than one based on raw frequency when considering document similarity? [6 marks]
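The two weightings asked for in part (b) can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not a model answer: the question does not fix the formulas, so tf is taken to be the raw count, idf is taken as log10(N / df), and PMI is taken as log2 of P(w, d) / (P(w) P(d)) with negative values clipped to zero. The function names `tf_idf` and `ppmi` are illustrative. Only the bridge and capital rows are fully legible in Table 1, so only those are used for the tf-idf example (idf depends on a single row and the number of documents). PPMI needs the totals of the whole table, so it is shown on a small made-up matrix instead.

```python
import math

def tf_idf(counts):
    """counts maps each word to its list of per-document raw frequencies.
    Assumes tf = raw count and idf = log10(N / df)."""
    scores = {}
    for word, row in counts.items():
        n_docs = len(row)
        df = sum(1 for c in row if c > 0)  # number of documents containing the word
        idf = math.log10(n_docs / df) if df else 0.0
        scores[word] = [c * idf for c in row]
    return scores

def ppmi(counts):
    """Positive PMI between each word and each document.
    Assumes probabilities are estimated from the full count matrix, log base 2."""
    total = sum(sum(row) for row in counts.values())
    n_docs = len(next(iter(counts.values())))
    doc_totals = [sum(row[j] for row in counts.values()) for j in range(n_docs)]
    result = {}
    for word, row in counts.items():
        word_total = sum(row)
        out = []
        for j, c in enumerate(row):
            if c == 0:
                out.append(0.0)  # PMI is -infinity here, which clips to 0
            else:
                pmi = math.log2(c * total / (word_total * doc_totals[j]))
                out.append(max(pmi, 0.0))
        result[word] = out
    return result

# The two rows of Table 1 that are fully legible (Doc A .. Doc E).
legible_rows = {"bridge": [0, 1, 2, 1, 0], "capital": [1, 1, 0, 0, 0]}
scores = tf_idf(legible_rows)
print(scores["bridge"][:2])   # tf-idf of "bridge" in Doc A and Doc B
print(scores["capital"][:2])  # tf-idf of "capital" in Doc A and Doc B

# Hypothetical 2-word, 2-document matrix, NOT taken from Table 1.
toy_ppmi = ppmi({"x": [2, 0], "y": [1, 1]})
print(toy_ppmi)
```

Under these assumptions "bridge" occurs in 3 of the 5 documents, so its idf is log10(5/3), about 0.222, giving 0 for Doc A and about 0.222 for Doc B. "capital" occurs in 2 of 5, so its idf is log10(5/2), about 0.398, which is also its score in both Doc A and Doc B because the count is 1 in each. A different log base or a smoothed idf would change the numbers but not the ranking.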