Question


From Book: Text Data Management and Analysis by ChengXiang Zhai and Sean Massung

Thank you

Chp-3

Exercise 3.1: In what way is NLP related to text mining?

Exercise 3.3: Given a collection of documents for a specific topic, how can we use maximum likelihood estimation to create a topic unigram language model?
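
As a starting point for Exercise 3.3, here is a minimal sketch of maximum likelihood estimation for a unigram model: each word's probability is its count divided by the total number of tokens in the topic's documents. The function name `mle_unigram_model`, the whitespace tokenization, and the two sample documents are assumptions made for illustration; they are not taken from the book.

```python
from collections import Counter


def mle_unigram_model(documents):
    """Return p(w) = count(w) / total_tokens over all documents (MLE)."""
    counts = Counter()
    for doc in documents:
        # Naive lowercase + whitespace tokenization (an assumption for this sketch).
        counts.update(doc.lower().split())
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}


# Hypothetical topic collection.
topic_docs = ["text mining finds patterns in text", "mining text data"]
model = mle_unigram_model(topic_docs)
print(model["text"])  # 3 of the 9 tokens are "text"
```

Words that never occur in the collection get no entry at all, which is the usual motivation for smoothing an MLE model.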

Exercise 3.7: A unigram language model as defined in this chapter can take a sequence of words as input and output its probability. Explain how this calculation has strong independence assumptions.
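
To make the independence assumption in Exercise 3.7 concrete, the sketch below scores a sequence as the product of per-word probabilities, so word order and context have no effect on the result. The toy probabilities and the name `sequence_probability` are assumptions for illustration only.

```python
import math


def sequence_probability(words, model):
    """Unigram probability of a sequence: the product of p(w) for each word."""
    # Each word contributes independently of its neighbours and its position.
    return math.prod(model.get(word, 0.0) for word in words)


# Hypothetical unigram model.
toy_model = {"the": 0.5, "cat": 0.25, "sat": 0.25}

p_ordered = sequence_probability(["the", "cat", "sat"], toy_model)
p_shuffled = sequence_probability(["sat", "the", "cat"], toy_model)
print(p_ordered, p_shuffled)  # the two values are identical
```

Because multiplication is commutative, any reordering of the same words receives the same probability, which is exactly what a model that ignores context would predict.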

Exercise 3.9: An n-gram language model records sequences of n words. How does the number of possible parameters change if we decided to use a 2-gram (bigram) language model instead of a unigram language model? How about a 3-gram (trigram) model? Give your answer in terms of V, the unigram vocabulary size.
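
For Exercise 3.9, a quick way to see the growth is to count the distinct n-word sequences a model could store: with a vocabulary of size V there are V^n of them. The sketch below assumes one parameter per possible n-gram (an upper bound that ignores the sum-to-one constraints); the helper name and the vocabulary size of 10,000 are hypothetical.

```python
def ngram_parameter_count(vocab_size, n):
    """Number of distinct n-grams over a vocabulary: vocab_size ** n."""
    return vocab_size ** n


V = 10_000  # hypothetical vocabulary size
for n, name in [(1, "unigram"), (2, "bigram"), (3, "trigram")]:
    print(name, ngram_parameter_count(V, n))
```

Each extra word of context multiplies the count by V, so the bigram table is V times larger than the unigram table and the trigram table is V times larger again.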

Chp-5

Exercise 5.3: Often, push and pull modes are combined in a single system. Give an example of such an application.

Exercise 5.5: In a future chapter, we will discuss recommender systems. These are systems in push mode that deliver information to users. What are some specific applications of recommender systems? Can you name some services available to you that fit into this access mode?

Exercise 5.7: Design a text information system used to explore musical artists. For example, you can search for an artist's name directly. The results are displayed as a graph, with edges to similar artists (as measured by some similarity algorithm). Use TIS access mode vocabulary to describe this system and any enhancements you could make to satisfy different information needs.

Ch-6

Exercise 6.1: Here's a query and document vector. What is the score for the given document using dot product similarity?

d = {1, 0, 0, 0, 1, 4}, q = {2, 1, 0, 1, 1, 1}
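
The dot product in Exercise 6.1 can be checked with a few lines of code. This sketch reads the two vectors as d = (1, 0, 0, 0, 1, 4) and q = (2, 1, 0, 1, 1, 1), assuming the `f`, `g`, and `;` characters in the printed vectors are extraction damage for braces and commas.

```python
# Vectors as read from the exercise (see the assumption above).
d = [1, 0, 0, 0, 1, 4]
q = [2, 1, 0, 1, 1, 1]

# Dot product: multiply matching components and sum the results.
score = sum(d_i * q_i for d_i, q_i in zip(d, q))
print(score)  # 1*2 + 0*1 + 0*0 + 0*1 + 1*1 + 4*1 = 7
```

Only the dimensions where both vectors are non-zero contribute to the score.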

Exercise 6.3: Let d be a document in a corpus. Suppose we add another copy of d to the collection. How does this affect the IDF of all words in the corpus?
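
One way to explore Exercise 6.3 is to compute IDF before and after duplicating a document. The sketch below assumes the form IDF(w) = log((M + 1) / df(w)), where M is the number of documents and df(w) is the number of documents containing w; other IDF variants may behave slightly differently. The four-word toy corpus is hypothetical.

```python
import math


def idf(corpus):
    """IDF(w) = log((M + 1) / df(w)) for every word in the corpus."""
    num_docs = len(corpus)
    doc_freq = {}
    for doc in corpus:
        for word in set(doc.split()):  # count each word once per document
            doc_freq[word] = doc_freq.get(word, 0) + 1
    return {w: math.log((num_docs + 1) / k) for w, k in doc_freq.items()}


corpus = ["apple banana", "banana cherry", "cherry date"]
before = idf(corpus)
after = idf(corpus + [corpus[0]])  # add a second copy of the first document

for word in sorted(before):
    print(word, round(before[word], 3), round(after[word], 3))
```

Under this formula, words that appear in the duplicated document become less discriminative (lower IDF), while words that do not appear in it gain IDF because the collection grew and their document frequency did not.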

Exercise 6.6: If you perform stemming on words in V to create V', then |V'| > |V|. True or false? Why?
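
The claim in Exercise 6.6 can be tested on a small vocabulary. The sketch below uses a deliberately crude suffix stripper, `toy_stem`, which is a hypothetical stand-in and not a real stemming algorithm such as Porter's; the point is only that stemming maps each word to a single stem, so several words can collapse into one.

```python
def toy_stem(word):
    """Strip one common suffix; a crude stand-in for a real stemmer."""
    for suffix in ("ing", "ed", "s"):
        # Keep at least three characters of the word after stripping.
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word


vocabulary = {"run", "runs", "jump", "jumped", "jumping", "walk"}
stemmed_vocabulary = {toy_stem(word) for word in vocabulary}

print(len(vocabulary), len(stemmed_vocabulary))
print(sorted(stemmed_vocabulary))
```

Since every word produces exactly one stem, the stemmed vocabulary can never contain more entries than the original one.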

Ch-7

Exercise 7.1: How should you set the Rocchio parameters α, β, and γ depending on what type of feedback you are using? That is, should the parameters be set differently if you are using pseudo feedback compared to user-supplied relevance judgements? What about implicit feedback through clickthrough data?
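
For reference while thinking about Exercise 7.1, here is a minimal sketch of the standard Rocchio update: the new query is α times the original query, plus β times the centroid of the relevant documents, minus γ times the centroid of the non-relevant documents. The function name, the default weights, and the toy vectors are illustrative assumptions, not recommended settings.

```python
def rocchio(query, relevant, non_relevant, alpha=1.0, beta=0.75, gamma=0.15):
    """Return alpha*q + beta*centroid(relevant) - gamma*centroid(non_relevant)."""

    def centroid(docs):
        if not docs:  # no documents of this kind: contribute nothing
            return [0.0] * len(query)
        return [sum(column) / len(docs) for column in zip(*docs)]

    rel = centroid(relevant)
    non = centroid(non_relevant)
    return [alpha * q + beta * r - gamma * n for q, r, n in zip(query, rel, non)]


# Pseudo feedback style call: top-ranked documents are assumed relevant and
# there are no negative examples, so gamma has no effect here.
query = [1.0, 0.0, 0.0]
top_ranked = [[1.0, 1.0, 0.0], [1.0, 0.0, 1.0]]
updated = rocchio(query, top_ranked, [], alpha=1.0, beta=0.5, gamma=0.0)
print(updated)  # [1.5, 0.25, 0.25]
```

The three weights control how far the query moves toward or away from the feedback documents, so how much each feedback source can be trusted is the natural thing to consider when choosing them.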

Exercise 7.9: Design a heuristic to automatically determine the best _ for mixture model feedback on a query-by-query basis. You could look at the query itself, the number of matching documents, or the distribution of ranking scores in the original results. Test your heuristic by doing experiments.
