Question
EEM 480 Homework 4

You are asked to implement the functions of a hash-based document tracker. The classes you are required to write concern document indexing, which speeds up content searches over documents. As all search engines do, every document on the internet is indexed and inserted into a database. Thus, whenever you search for documents containing certain word(s), the search engine can return the matching documents in a matter of milliseconds. Obviously, it does not perform the actual search at that moment, i.e. when you search for the word. Instead, the search work is done while the documents are being indexed. The engine therefore already knows which documents include which words or phrases, and the time-consuming search operation is shifted to the offline stage.

Indexing can be done in several ways. In real life, search engines use huge matrices that keep the relationships between documents and words. In this assignment, we will keep the information in a hash table. You are required to implement a hash table that uses open addressing to resolve collisions. Double hashing is preferred so that entries are spread evenly across the table instead of forming clusters. For each word, also keep a frequency variable in order to track the number of occurrences of the word in the text.

[Figure: example hash table. An index column starts at 0, and each occupied slot holds a word together with its frequency, e.g. "except" 3, "was" 3, "unintelligible" 1.]

This hash structure will be used to trace a text file. The program will get the path of a text file written in English. The program will trace each word and keep the number of occurrences of each word. All punctuation marks will be removed. (Example: in "The boy, who has green hair is walking down the street.", the words "boy" and "street" have to be separated from the comma and the period.)

Try hash table sizes of 1000, 5000, and 10000, and check the number of collisions for each (I will definitely check). Explain how you obtain the key from a word.
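One possible way to separate words from punctuation and to obtain an integer key from each word is sketched below in Java. The class name `WordKeys`, the method names `tokenize` and `keyOf`, the separator rule (anything other than an ASCII letter, digit, or apostrophe splits words), and the base-31 polynomial are all illustrative assumptions; the assignment does not prescribe any of them.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper, not part of the assignment's interface.
final class WordKeys {

    // Splits a line into lowercase words with punctuation removed.
    // Assumption: English text, so only a-z, 0-9 and apostrophes belong to a word.
    static List<String> tokenize(String line) {
        List<String> words = new ArrayList<>();
        for (String raw : line.toLowerCase(Locale.ROOT).split("[^a-z0-9']+")) {
            // Drop apostrophes used as quotes at either end of a word.
            String word = raw.replaceAll("^'+|'+$", "");
            if (!word.isEmpty()) {
                words.add(word);
            }
        }
        return words;
    }

    // Turns a word into a non-negative integer key with a base-31 polynomial
    // over its characters (the same recurrence String.hashCode uses).
    static int keyOf(String word) {
        int key = 0;
        for (int i = 0; i < word.length(); i++) {
            key = 31 * key + word.charAt(i); // int overflow wraps around, which is fine here
        }
        return key & 0x7fffffff; // clear the sign bit so that key % tableSize is never negative
    }
}
```

With these assumptions, `WordKeys.tokenize("The boy, who has green hair is walking down the street.")` yields eleven words, with `boy` and `street` free of the comma and the period. Note that hyphenated words such as "window-pane" are split into two words under this rule; a different rule may suit the assignment better.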
Explain your hash function and how it uses double hashing to resolve collisions.

The interface for your HW is given as a public interface with the following methods:

Integer GetHash(String mystring);
void ReadFileandGenerateHash(String filename, int size);
void DisplayResult(String Outputfile);
void DisplayResult();
void DisplayResultOrdered(String Outputfile);
int showFrequency(String myword);
String showMaxRepeatedWord();
boolean checkWord(String myword);
float TestEfficiency();

Here are the functions and their explanations:

Integer GetHash(String mystring); // Generates an integer value (hash index) related to the input word. If a collision occurs, it has to be resolved by the double hashing method.
void ReadFileandGenerateHash(String filename, int size); // Creates the open-addressing hash structure with the size given by the user. The file, which contains a very long text, will be parsed, and during parsing the hash table must be updated with the words.
void DisplayResult(String Outputfile); // All the words in the text and their frequencies have to be written to a text file.
void DisplayResultOrdered(String Outputfile); // All the words in the text and their frequencies have to be written to a text file in order. The most repeated words will be listed at the beginning and the least repeated words at the end.
void DisplayResult(); // All the words in the text and their frequencies have to be displayed on the screen.
int showFrequency(String myword); // The frequency of myword in the text file will be returned. If myword does not occur in the text, -1 must be returned.
String showMaxRepeatedWord(); // The most repeated word has to be returned.
boolean checkWord(String myword); // Checks whether myword is found in the text.
Integer TestEfficiency(); // Returns the number of collisions that occurred while parsing the file.

Obviously, by using a hash table instead of a matrix, we save memory.
However, we lose a little performance, since we have to compute the hash value and may have to go over more than one word on a linked list while looking for just one word. That's the trade-off!

Another important part of your assignment is the implementation of your hash function. The main duty of a hash function is to generate a digest of the input data. In this project, the digest becomes an index into your table, and the input data is the word you are indexing. If your hash function is a weak one, the collision rate of the words increases. That causes a bad distribution of the indexed words and increases the search time. So we would like a hash function that distributes the input words as evenly as possible. In short, you had better do a little research on hash functions in order to make yours a good one.

Example (in text file):

Outside, even through the shut window-pane, the world looked cold. Down in the street little eddies of wind were whirling dust and torn paper into spirals, and though the sun was shining and the sky a harsh blue, there seemed to be no colour in anything, except the posters that were plastered everywhere. The black-moustachio'd face gazed down from every commanding corner. There was one on the house front immediately opposite. BIG BROTHER IS WATCHING YOU, the caption said, while the dark eyes looked deep into Winston's own. Down at street level another poster, torn at one corner, flapped fitfully in the wind, alternately covering and uncovering the single word INGSOC. In the far distance a helicopter skimmed down between the roofs, hovered for an instant like a bluebottle, and darted away again with a curving flight. It was the police patrol, snooping into people's windows. The patrols did not matter, however.
Only the Thought Police mattered. Behind Winston's back the voice from the telescreen was still babbling away about pig-iron and the overfulfilment of the Ninth Three-Year Plan. The telescreen received and transmitted simultaneously. Any sound that Winston made, above the level of a very low whisper, would be picked up by it; moreover, so long as he remained within the field of vision which the metal plaque commanded, he could be seen as well as heard. There was of course no way of knowing whether you were being watched at any given moment. How often, or on what system, the Thought Police plugged in on any individual wire was guesswork. It was even conceivable that they watched everybody all the time. But at any rate they could plug in your wire whenever they wanted to. You had to live -- did live, from habit that became instinct -- in the assumption that every sound you made was overheard, and, except in darkness, every movement scrutinized. Winston kept his back turned to the telescreen. It was safer; though, as he well knew, even a back can be revealing. A kilometre away the Ministry of Truth, his place of work, towered vast and white above the grimy landscape. This, he thought with a sort of vague distaste -- this was London, chief city of Airstrip One, itself the third most populous of the provinces of Oceania. He tried to squeeze out some childhood memory that should tell him whether London had always been quite like this. Were there always these vistas of rotting nineteenth-century houses, their sides shored up with baulks of timber, their windows patched with cardboard and their roofs with corrugated iron, their crazy garden walls sagging in all directions? And the bombed sites where the plaster dust swirled in the air and the willow-herb straggled over the heaps of rubble; and the places where the bombs had cleared a larger patch and there had sprung up sordid colonies of wooden dwellings like chicken-houses?
But it was no use, he could not remember: nothing remained of his childhood except a series of bright-lit tableaux occurring against no background and mostly unintelligible.

Example output:

"Lutfullah" is not found in the text.
"except" is found and the number of occurrences is 3.
103 collisions occurred.
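The open-addressing table with double hashing that the assignment asks for could be sketched roughly as follows in Java. This is a minimal illustration, not the expected solution: the class `DoubleHashTable` and its methods `add`, `frequency`, `collisions`, and `capacity` are hypothetical names, the capacity is rounded up to a prime so that every probe step visits all slots (a design choice, not a stated requirement), and a "collision" is assumed to mean each probe that lands on a slot occupied by a different word during `add`.

```java
// Hypothetical sketch; none of these names come from the assignment's interface.
final class DoubleHashTable {
    private final String[] words; // null marks an empty slot
    private final int[] counts;   // counts[i] is the frequency of words[i]
    private int collisions = 0;

    DoubleHashTable(int requestedSize) {
        // Assumption: a prime capacity guarantees that any step in 1..m-1 reaches every slot.
        int m = Math.max(requestedSize, 3);
        while (!isPrime(m)) {
            m++;
        }
        words = new String[m];
        counts = new int[m];
    }

    private static boolean isPrime(int n) {
        if (n < 2) return false;
        for (int d = 2; (long) d * d <= n; d++) {
            if (n % d == 0) return false;
        }
        return true;
    }

    // Non-negative polynomial key of the word (base 31, an assumed choice).
    private static int keyOf(String word) {
        int key = 0;
        for (int i = 0; i < word.length(); i++) {
            key = 31 * key + word.charAt(i);
        }
        return key & 0x7fffffff;
    }

    // Returns the slot that holds the word, or the empty slot where it would go,
    // or -1 if the table is full and the word is absent.
    private int probe(String word, boolean countCollisions) {
        int m = words.length;
        int key = keyOf(word);
        int index = key % m;          // first hash: starting slot
        int step = 1 + key % (m - 1); // second hash: step in 1..m-1, never zero
        for (int i = 0; i < m; i++) {
            if (words[index] == null || words[index].equals(word)) {
                return index;
            }
            if (countCollisions) {
                collisions++;         // slot taken by a different word
            }
            index = (index + step) % m;
        }
        return -1;
    }

    // Inserts the word or increments its frequency if it is already present.
    void add(String word) {
        int index = probe(word, true);
        if (index < 0) {
            throw new IllegalStateException("hash table is full");
        }
        if (words[index] == null) {
            words[index] = word;
        }
        counts[index]++;
    }

    // Frequency of the word, or -1 if it does not occur, as the assignment specifies.
    int frequency(String word) {
        int index = probe(word, false);
        return (index >= 0 && words[index] != null) ? counts[index] : -1;
    }

    int collisions() { return collisions; }

    int capacity() { return words.length; }
}
```

Under these assumptions, adding "except" three times and then calling `frequency("except")` returns 3, while `frequency("lutfullah")` returns -1. The collision total depends on the hash functions and the table size, which is why comparing sizes of 1000, 5000, and 10000 is informative.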