Question

2. (a) Describe 2 applications in NLP where it could be useful to be able to compute the similarity between pairs of documents. [6 marks]

(b) Consider Table 1, which gives the frequencies of 5 potential word features in a very small corpus of 5 documents. Assume that there are no other words and no other documents in the corpus.

Word Feature    Doc A   Doc B   Doc C   Doc D   Doc E
bridge          0       1       2       1       0
capital         1       1       0       0       0
the             5
ancient         0       3
inhospitable    1       1

Table 1: Frequencies of 5 Word Features in 5 Documents

(The last three rows are incomplete in the transcription, and the values shown for them are listed in transcribed order. Remaining values that cannot be placed in a cell: 4 4 5 1 0 3 0)

i. For documents A and B compute the tf-idf score associated with each of the 5 word features given. [10 marks]

ii. For documents A and B compute the positive pointwise mutual information (PPMI) between the document and each of the 5 word features given. [10 marks]

iii. Why is a representation based on tf-idf or PPMI better than one based on raw frequency when considering document similarity? [6 marks]
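As a rough illustration of the calculations that parts (b)i and (b)ii ask for, the sketch below computes tf-idf and PPMI from a word-by-document count table. It rests on several assumptions that the question does not state: tf is the raw count, idf is log(N / df), PPMI is max(0, log(P(w,d) / (P(w) P(d)))) with probabilities estimated from the counts, and logarithms are base 2. The function names `tf_idf` and `ppmi` are made up for this example. Only the bridge and capital rows are complete in the transcribed table, so the sketch uses just those two rows; its output shows the method and is not the answer for the full five-word table.

```python
import math

# The two rows of Table 1 that are complete in the transcription.
# Columns are Doc A, Doc B, Doc C, Doc D, Doc E.
# Assumption: treating these two rows as the whole corpus is for
# illustration only; the real exercise uses all five word features.
DOCS = ["A", "B", "C", "D", "E"]
COUNTS = {
    "bridge":  [0, 1, 2, 1, 0],
    "capital": [1, 1, 0, 0, 0],
}


def tf_idf(counts, base=2.0):
    """tf-idf with raw-count tf and idf = log(N / df) (one common variant)."""
    n_docs = len(next(iter(counts.values())))
    scores = {}
    for word, row in counts.items():
        df = sum(1 for c in row if c > 0)  # number of documents containing the word
        idf = math.log(n_docs / df, base) if df else 0.0
        scores[word] = [c * idf for c in row]
    return scores


def ppmi(counts, base=2.0):
    """Positive PMI between each word and each document, estimated from counts."""
    n_docs = len(next(iter(counts.values())))
    total = sum(sum(row) for row in counts.values())
    col_sums = [sum(row[j] for row in counts.values()) for j in range(n_docs)]
    scores = {}
    for word, row in counts.items():
        row_sum = sum(row)
        out = []
        for j, c in enumerate(row):
            if c == 0:
                # PMI is minus infinity for a zero count, so PPMI clips it to 0.
                out.append(0.0)
                continue
            # P(w,d) / (P(w) * P(d)) simplifies to c * total / (row_sum * col_sum).
            pmi = math.log((c * total) / (row_sum * col_sums[j]), base)
            out.append(max(0.0, pmi))
        scores[word] = out
    return scores


if __name__ == "__main__":
    tfidf_scores = tf_idf(COUNTS)
    ppmi_scores = ppmi(COUNTS)
    for j in (0, 1):  # Doc A and Doc B, as in the question
        for word in COUNTS:
            print(DOCS[j], word,
                  round(tfidf_scores[word][j], 3),
                  round(ppmi_scores[word][j], 3))
```

Other tf-idf variants (log-scaled tf, smoothed idf, base-10 logarithms) would give different numbers, so the variant taught in the relevant course should take precedence over the choices made here.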
