Question:
In the above question, what if you are given the complete works of Oscar Wilde, one of the most popular playwrights of the early 1890s?
a. We do not know how many books there are, so let us assume the collection is too large to fit in memory. First, we need a streaming library so that each document can be read section by section. Then we need a tokenizer that splits the text into words for our program. Finally, we need some kind of dictionary; let us say we use a hash table.
b. What you need: 1. a streaming library, 2. a tokenizer, 3. a hash map. Method: 1. Use the streaming library to read each document as a stream of text. 2. Tokenize the input text into words. 3. Stem each word; if the stemmed word is already in the hash map, increment its frequency count, otherwise add it to the hash map with a frequency of 1.
c. We can improve performance with parallel computing, for example by using MapReduce. In the map phase, multiple nodes each read and process a subset of the documents and build their own frequency table. Once they have finished, the reduce phase merges these partial tables by summing the counts for each word.
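The map and reduce steps can be illustrated on a single machine, with threads standing in for nodes. The sketch below is only an assumption about how this might look, not a real MapReduce framework: `mapDocument`, `reduceCounts`, and `countInParallel` are hypothetical names, each document is passed as an in-memory string for brevity (a real worker would stream its own file), and tokens are split on whitespace only, with no lowercasing or stemming.

```cpp
#include <cstddef>
#include <future>
#include <sstream>
#include <string>
#include <unordered_map>
#include <vector>

using Counts = std::unordered_map<std::string, std::size_t>;

// Map step: one worker builds a partial frequency table for one document.
Counts mapDocument(const std::string& text) {
    Counts partial;
    std::istringstream in(text);
    std::string word;
    while (in >> word) {  // whitespace-separated tokens only
        ++partial[word];
    }
    return partial;
}

// Reduce step: merge a partial table into the running total.
void reduceCounts(Counts& total, const Counts& partial) {
    for (const auto& entry : partial) {
        total[entry.first] += entry.second;
    }
}

// Runs one asynchronous task per document, then merges the results in order.
Counts countInParallel(const std::vector<std::string>& docs) {
    std::vector<std::future<Counts>> futures;
    futures.reserve(docs.size());
    for (const auto& doc : docs) {
        // docs outlives every task because get() is called on each future below.
        futures.push_back(std::async(std::launch::async,
                                     [&doc] { return mapDocument(doc); }));
    }
    Counts total;
    for (auto& f : futures) {
        reduceCounts(total, f.get());  // get() waits for that worker to finish
    }
    return total;
}
```

Because each worker writes only to its own partial table, no locking is needed during the map step; the merge happens on one thread after the workers finish. On some toolchains the program must be linked with thread support (for example `-pthread` with GCC).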
Problems Solving In Data Structures And Algorithms Using C++
ISBN: 9789356273177
2nd Edition
Authors: Hemant Jain