Question: In the above question, what if you are given the complete works of Oscar Wilde, one of the most popular playwrights of the early 1890s?
a. We do not know how many books there are, so let us assume there are many and that the whole corpus cannot fit in memory. First, we need a streaming library so that we can read each document section by section. Then we need a tokenizer that feeds words to our program. Finally, we need some kind of dictionary; let us say we use a hash table.
b. What you need is:
1. A streaming library
2. A tokenizer
3. A hash map
Method:
1. Use the streaming library to read the input text as a stream.
2. Tokenize the input text into words.
3. If the stemmed word is already in the hash map, increment its frequency count; otherwise add the word to the hash map with frequency 1.
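The method in parts a and b can be sketched in Python roughly as follows. This is a minimal illustration under stated assumptions, not a definitive implementation: the function names (`stream_lines`, `tokenize`, `normalize`, `count_words`) are hypothetical, reading a file line by line stands in for the streaming library, a regular expression stands in for the tokenizer, and `normalize` is only a placeholder where a real stemmer would go.

```python
import re

# Assumption: a "word" is a run of letters and apostrophes.
TOKEN_RE = re.compile(r"[A-Za-z']+")


def stream_lines(path):
    """Yield one line at a time so the whole file never sits in memory."""
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            yield line


def tokenize(line):
    """Split a line into lowercase word tokens."""
    return [token.lower() for token in TOKEN_RE.findall(line)]


def normalize(token):
    """Placeholder for a stemmer: only strips surrounding apostrophes."""
    return token.strip("'")


def count_words(lines):
    """Count word frequencies over any iterable of text lines."""
    counts = {}  # the hash map: word -> frequency
    for line in lines:
        for token in tokenize(line):
            word = normalize(token)
            if not word:
                continue
            if word in counts:
                counts[word] += 1  # seen before: increment its count
            else:
                counts[word] = 1  # first occurrence: add with frequency 1
    return counts
```

For a file on disk, `count_words(stream_lines("some_book.txt"))` would process it one line at a time; the file name here is only an example. Memory use then grows with the number of distinct words rather than with the size of the text.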
c. We can improve performance with parallel computing, for example by using MapReduce to solve this problem.
Multiple nodes read and process different documents in parallel. Once they have finished, the reduce step merges their partial word counts into a single result.
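The map and reduce steps described above might look like the following sketch. It is an illustration only: threads on one machine stand in for the separate nodes of a real MapReduce cluster, whitespace splitting stands in for a proper tokenizer, and the names `map_document`, `merge_counts`, and `map_reduce_word_count` are hypothetical.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def map_document(text):
    """Map step: count the words of one document independently."""
    return Counter(text.lower().split())


def merge_counts(left, right):
    """Reduce step: add two partial counts together."""
    return left + right


def map_reduce_word_count(documents, workers=4):
    """Count words across documents, mapping in parallel, then merging."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Each worker handles whole documents, as each node would.
        partials = list(pool.map(map_document, documents))
    return reduce(merge_counts, partials, Counter())
```

Because each map call touches only its own document, the partial counts can be computed in any order and merged afterwards, which is what makes the problem a good fit for this model.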
