Question
From the book: Text Data Management and Analysis by ChengXiang Zhai and Sean Massung
Thank you
Chp-3
Exercise 3.1: In what way is NLP related to text mining?
Exercise 3.3: Given a collection of documents for a specific topic, how can we use maximum
likelihood estimation to create a topic unigram language model?
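One way to picture the estimate asked about in Exercise 3.3 is the sketch below. It assumes whitespace tokenization and lowercasing, and the function name `mle_unigram` and the two sample documents are made up for illustration. Under maximum likelihood, each word's probability is simply its count divided by the total number of tokens in the topic's documents.

```python
from collections import Counter

def mle_unigram(documents):
    """Estimate p(w) = count(w) / total tokens (hypothetical helper)."""
    counts = Counter()
    for doc in documents:
        # Assumption: lowercase + whitespace split stands in for real tokenization.
        counts.update(doc.lower().split())
    total = sum(counts.values())
    return {word: c / total for word, c in counts.items()}

# Illustrative topic collection (not from the book).
docs = ["text mining finds patterns in text", "mining text data"]
model = mle_unigram(docs)
print(model["text"])  # "text" occurs 3 times out of 9 tokens
```

Words that never appear in the collection get probability zero under this estimate, which is why smoothing is usually applied in practice.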
Exercise 3.7: A unigram language model as defined in this chapter can take a sequence of words as
input and output its probability. Explain how this calculation has strong independence
assumptions.
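The independence assumption in Exercise 3.7 can be seen in a small sketch. The toy model and the name `sequence_probability` are assumptions for illustration only. A unigram model scores a sequence as the product of the individual word probabilities, so word order and neighboring words have no effect on the result.

```python
import math

def sequence_probability(model, words):
    """Multiply per-word probabilities; unseen words get probability 0."""
    return math.prod(model.get(w, 0.0) for w in words)

# Hypothetical unigram model over a three-word vocabulary.
model = {"text": 0.5, "mining": 0.3, "data": 0.2}

p_forward = sequence_probability(model, ["text", "mining"])
p_reversed = sequence_probability(model, ["mining", "text"])
print(p_forward, p_reversed)  # identical: the model ignores word order
```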
Exercise 3.9: An n-gram language model records sequences of n words. How does the number of
possible parameters change if we decided to use a 2-gram (bigram) language model
instead of a unigram language model? How about a 3-gram (trigram) model? Give your
answer in terms of V, the unigram vocabulary size.
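As a rough illustration for Exercise 3.9, the sketch below counts one probability per possible n-gram, which gives V^n entries. The function name and the vocabulary size of 10,000 are assumptions chosen for the example. If you count only free parameters, each distribution loses one because its probabilities must sum to one.

```python
def ngram_parameter_count(vocab_size, n):
    """Number of distinct n-grams over a vocabulary: V ** n."""
    return vocab_size ** n

V = 10_000  # illustrative vocabulary size
for n, name in [(1, "unigram"), (2, "bigram"), (3, "trigram")]:
    print(name, ngram_parameter_count(V, n))
```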
Chp-5
Exercise 5.3: Often, push and pull modes are combined in a single system. Give an example of such
an application.
Exercise 5.5: In a future chapter, we will discuss recommender systems. These are systems in
push mode that deliver information to users. What are some specific applications of recommender systems? Can you name some services available to you that fit into this access mode?
Exercise 5.7: Design a text information system used to explore musical artists. For example, you can
search for an artist's name directly. The results are displayed as a graph, with edges
to similar artists (as measured by some similarity algorithm). Use TIS access mode
vocabulary to describe this system and any enhancements you could make to satisfy
different information needs.
Ch-6
Exercise 6.1: Here's a query and document vector. What is the score for the given document using dot
product similarity?
d = {1, 0, 0, 0, 1, 4}    q = {2, 1, 0, 1, 1, 1}
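The dot product in Exercise 6.1 can be checked with a few lines of code. This sketch reads the vectors as d = (1, 0, 0, 0, 1, 4) and q = (2, 1, 0, 1, 1, 1). The function name `dot_product` is just an illustrative choice.

```python
def dot_product(a, b):
    """Sum of element-wise products of two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(x * y for x, y in zip(a, b))

d = [1, 0, 0, 0, 1, 4]
q = [2, 1, 0, 1, 1, 1]
score = dot_product(d, q)
print(score)  # 1*2 + 0*1 + 0*0 + 0*1 + 1*1 + 4*1 = 7
```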
Exercise 6.3: Let d be a document in a corpus. Suppose we add another copy of d to the collection. How
does this affect the IDF of all words in the corpus?
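A small experiment can help build intuition for Exercise 6.3. The sketch below assumes the IDF form log((M + 1) / df(w)), where M is the number of documents and df(w) is the number of documents containing w. Other IDF variants exist, and the toy collection is made up for illustration.

```python
import math

def idf(collection):
    """IDF per word, assuming IDF(w) = log((M + 1) / df(w))."""
    M = len(collection)
    df = {}
    for doc in collection:
        for w in set(doc.split()):  # count each word once per document
            df[w] = df.get(w, 0) + 1
    return {w: math.log((M + 1) / k) for w, k in df.items()}

collection = ["apple banana", "banana cherry", "cherry date"]
d = collection[0]

before = idf(collection)
after = idf(collection + [d])  # add a second copy of d

for w in sorted(before):
    change = "down" if after[w] < before[w] else "up"
    print(w, change)
```

In this toy run, words that occur in d see both M and their document frequency grow, while words outside d only see M grow, so the two groups move in opposite directions.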
Exercise 6.6: If you perform stemming on words in V to create V' then |V'| > |V|. True or false?
Why?
Ch-7
Exercise 7.1: How should you set the Rocchio parameters α, β, and γ depending on what type of
feedback you are using? That is, should the parameters be set differently if you are using
pseudo feedback compared to user-supplied relevance judgements? What about implicit
feedback through clickthrough data?
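For reference when thinking about Exercise 7.1, here is a minimal sketch of the standard Rocchio update: the new query is α times the original query, plus β times the centroid of the relevant documents, minus γ times the centroid of the non-relevant documents. The function name, the default weights, and the toy vectors are illustrative assumptions, not values recommended by the book.

```python
def rocchio(query, relevant, non_relevant, alpha=1.0, beta=0.75, gamma=0.15):
    """Return alpha*q + beta*centroid(relevant) - gamma*centroid(non_relevant)."""
    def centroid(vectors):
        if not vectors:
            return [0.0] * len(query)  # no examples: contributes nothing
        return [sum(col) / len(vectors) for col in zip(*vectors)]

    pos = centroid(relevant)
    neg = centroid(non_relevant)
    return [alpha * q + beta * p - gamma * n for q, p, n in zip(query, pos, neg)]

# Toy example with no negative examples, as in pseudo feedback,
# where only the top-ranked documents are assumed relevant.
q = [1.0, 0.0, 1.0]
rel = [[1.0, 1.0, 0.0], [1.0, 1.0, 2.0]]
new_q = rocchio(q, rel, [], gamma=0.0)
print(new_q)
```

How much weight to give β and γ depends on how much you trust the judgments, which is the trade-off the exercise asks you to reason about.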
Exercise 7.9: Design a heuristic to automatically determine the best _ for mixture model feedback
on a query-by-query basis. You could look at the query itself, the number of matching
documents, or the distribution of ranking scores in the original results. Test your heuristic
by doing experiments.