Question
This problem gets us started using a dictionary to hold a document index. Remember the keys are search terms and the values are a list of documents containing that term. If we have a corpus, we can normalize and tokenize a document to get the tokens/search terms it contains. If we know a document's id, the logic of building an index is something like:
initialize index
for each document in corpus:
    get a list of normalized tokens from the document
    for each token:
        add current document id to token's index entry
For this problem, a corpus of documents is stored in a list. Each element is a string containing a document's text, and a document's id is its index in the list. The next two cells contain code that will populate the list in your Jupyter notebook when you evaluate them.
import pickle
corpus = pickle.load(open("/usr/local/share/i427_dictionary_hw_corpus.p","rb"))
corpus
['I427 Search Informatics', 'I308 Information Representation', 'I101 Introduction to Informatics', 'Information Systems']
The only issue with the pseudocode above is that a new token needs to be added to the index if it's not already there. Let's update the pseudocode to handle that case:
initialize index
for each document in corpus:
    get a list of normalized tokens from the document
    for each token:
        if token is not in the index:
            initialize token's entry in the index to an empty collection
        add current document id to token's index entry
For this problem, create a dictionary document_index that has the vocabulary of corpus as keys; for each vocabulary word, the value is the list of ids of the documents containing that word. The final answer (the contents of document_index) is shown at the end to help you visualize the data structure and check whether your code worked.
Some other hints/tips:
- split on whitespace, as we've seen for tokenization
- convert to lowercase, as we've seen for normalization
- use a document's index in the corpus as its id: 0 for the first document, 1 for the second, and so on
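If you want to factor the first two hints into a helper, a minimal sketch (the name tokenize is my own choice, matching the skeleton below; it is not given by the assignment):

```python
def tokenize(document):
    """Lowercase a document and split it on whitespace (per the hints above)."""
    return document.lower().split()

tokenize("I427 Search Informatics")  # → ['i427', 'search', 'informatics']
```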
document_index = {}
for doc_id, document in enumerate(corpus):
    normalized_tokens = document.lower().split()  # normalize and tokenize; gives a list
    for token in normalized_tokens:
        if token not in document_index:
            document_index[token] = []            # empty list for the value
        document_index[token].append(doc_id)      # add current document id to token's entry
print(document_index)
{'i427': [0], 'search': [0], 'informatics': [0, 2], 'i308': [1], 'information': [1, 3], 'representation': [1], 'i101': [2], 'introduction': [2], 'to': [2], 'systems': [3]}
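As an aside, the membership check can be avoided with collections.defaultdict, which creates the empty list automatically the first time a key is touched. A sketch using the small corpus from this problem:

```python
from collections import defaultdict

corpus = ['I427 Search Informatics', 'I308 Information Representation',
          'I101 Introduction to Informatics', 'Information Systems']

document_index = defaultdict(list)              # missing keys start as empty lists
for doc_id, document in enumerate(corpus):
    for token in document.lower().split():      # normalize, then tokenize
        document_index[token].append(doc_id)

print(dict(document_index))                     # same contents as shown above
```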