Question

Task 2b - More understanding of Semantic Matching (15 points).
Which terms does LSI find similar?
To understand why the LSI-expanded vectors get the results they do, we're going to look at what the operator $U$ does to text. In particular, the term-term matrix $U U^T$ tells us the term expansion behavior of this LSI model. Think of the term-term matrix like an operator that first maps a term to the latent space $L_k$ (using $U$), then back again from $L_k$ to term space (using $U^T$). The $(i, j)$ entry of $U U^T$ is a kind of association weight between term $i$ and term $j$.
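The round trip described above can be checked on a small example. The following is a minimal sketch, assuming U is stored as a NumPy array with one row per term and one column per latent dimension, and that a term is represented as a one-hot row vector. The matrix values are made up for illustration and do not come from the assignment's data.

```python
import numpy as np

# Hypothetical toy U: 4 terms x 2 latent dimensions (values are invented).
U = np.array([
    [0.9, 0.1],
    [0.8, 0.2],
    [0.1, 0.9],
    [0.0, 1.0],
])

# Term-term matrix: one row and one column per term.
term_term = U @ U.T

# One-hot row vector that selects term 0.
e0 = np.zeros(U.shape[0])
e0[0] = 1.0

latent = e0 @ U        # term space -> latent space (length 2), using U
back = latent @ U.T    # latent space -> term space (length 4), using U transpose

# The round trip reproduces row 0 of the term-term matrix.
print(np.allclose(back, term_term[0]))

# Entry (i, j) is the dot product of the latent vectors of terms i and j.
print(np.isclose(term_term[0, 1], U[0] @ U[1]))
```

Because each entry is a dot product of two rows of U, the matrix is symmetric, and a term whose latent vector has a large norm can receive a large weight against many other terms.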
Write a function to get the most related terms (according to LSI) for the word "economy". To do this:
1. Compute the term-term matrix from the matrix U (the reduced_term_matrix variable).
2. Use the term-term matrix to get the association weights of all words related to the term "economy".
3. Sort by descending weight value.
Your function should return the top 5 words and their weights as a list of (string, float) tuples.
Do the related terms match your subjective similarity judgment?
In [19]:
# Please tell me more!
task_id = "2b"
In [20]:
def answer_semantic_similarity_b():
    # tfidf_vectorizer, tfidf_feature_names and reduced_term_matrix are
    # defined in earlier notebook cells.
    term = "economy"
    term_index = tfidf_vectorizer.vocabulary_[term]
    # Term-term matrix U U^T (terms x terms)
    term_term_matrix = reduced_term_matrix @ reduced_term_matrix.T
    # Association weights between "economy" and every term
    related_terms_weights = term_term_matrix[term_index]
    # Sort by descending weight and skip position 0, which is assumed
    # to be "economy" itself
    top_indices = related_terms_weights.argsort()[::-1][1:6]
    top_terms = [(tfidf_feature_names[i], related_terms_weights[i]) for i in top_indices]
    return top_terms
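One caveat about the slice [1:6] in the answer above: it drops whatever term ranks first, on the assumption that this is "economy" itself. The self-weight of a term is the squared norm of its latent vector, and another term with a larger norm pointing in the same direction can outrank it. The sketch below illustrates this with invented data and shows one possible alternative that removes the query term by index. The helper name top_related and the toy vocabulary are hypothetical, and whether the hidden tests expect the query term to be excluded is not stated in the source.

```python
import numpy as np

def top_related(U, feature_names, term_index, n=5):
    """Return the n terms with the largest U U^T weight, excluding the query term."""
    weights = U[term_index] @ U.T            # one row of U U^T, without building the full matrix
    order = np.argsort(weights)[::-1]        # term indices by descending weight
    order = order[order != term_index][:n]   # drop the query term by index, not by rank
    return [(feature_names[i], float(weights[i])) for i in order]

# Hypothetical toy data: "market" has a larger norm than the query term "economy".
U = np.array([
    [1.0, 0.0],   # "economy"
    [2.0, 0.0],   # "market"
    [0.0, 1.0],   # "football"
])
names = ["economy", "market", "football"]

weights = U[0] @ U.T   # [1.0, 2.0, 0.0]: the self-weight is not the largest entry

# Rank-based skip: discards "market" and keeps "economy".
naive = [names[i] for i in np.argsort(weights)[::-1][1:3]]
# Index-based skip: discards "economy" only.
robust = [name for name, _ in top_related(U, names, term_index=0, n=2)]

print(naive)    # ['economy', 'football']
print(robust)   # ['market', 'football']
```

Computing a single row with U[term_index] @ U.T gives the same weights as indexing the full term-term matrix, and avoids allocating a vocabulary-by-vocabulary array when the vocabulary is large.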
In [21]:
# use this cell to explore your solution
# remember to comment the function call before submitting the notebook
# answer_semantic_similarity_b()
In [22]:
print(f"Task {task_id} - AG tests")
stu_ans = answer_semantic_similarity_b()
print(f"Task {task_id} - your answer:\n{stu_ans}")
assert isinstance(stu_ans, list), f"Task {task_id}: Your function should return a list. "
assert len(stu_ans) == 5, f"Task {task_id}: Your list should contain five elements (the term, score tuples)."
for i, item in enumerate(stu_ans):
    assert isinstance(item, tuple), f"Task {task_id}: Your answer at index {i} should be a tuple. "
    assert isinstance(
        item[0], str
    ), f"Task {task_id}: The first element of your tuple at index {i} should be a string. "
    assert isinstance(
        item[1], (float, np.floating)
    ), f"Task {task_id}: The second element of your tuple at index {i} should be a float. "
# Some hidden tests
del stu_ans
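The tests above accept either a Python float or any np.floating value as the weight. The short sketch below shows why both are listed: weights taken directly from a NumPy array are NumPy scalars, and not every NumPy float type is a Python float. The dtype of reduced_term_matrix is not stated in the source, so both cases are shown.

```python
import numpy as np

w64 = np.float64(0.5)
w32 = np.float32(0.5)

print(isinstance(w64, float))         # True: np.float64 subclasses Python float
print(isinstance(w32, float))         # False: np.float32 does not
print(isinstance(w32, np.floating))   # True: covered by the second type in the check
print(isinstance(float(w32), float))  # True: explicit conversion always yields a Python float
```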
