Answered step by step
Verified Expert Solution
Link Copied!

Question

1 Approved Answer

We are asking you to implement a Lexicon structure to store arbitrarily long strings of ASCII chars (i.e. words). Lexicon L uses a Hash Table

image text in transcribedimage text in transcribedimage text in transcribedimage text in transcribed
We are asking you to implement a Lexicon structure to store arbitrarily long strings of ASCII chars (i.e. words). Lexicon L uses a Hash Table T structure along with an Array A of NUL separated words. In our case words are going to be English character words only (upper-case or lower case). Table T will be organized as a hash-table using collision-resolution by open-addressing as specied in class. You are going to use quadratic probing for h(k,i) and keep the choice of the quadratic function simple: 1'2 so that h(k, i) = (h' (k) + 2) mod N. The keys that you will hash are going to be English words. Thus function h' (k) is also going to be kept simple: the sum of the ASCIIfUnicode values of the characters minus 4 mod N, where N is the number of slots of T. Thus 'alex' (the string is between the quotation marks) is mapped to 97 + 108 + 101 + 120 4 mod N whatever N is. In the example below, for N = 11, h(alex,0) = 4. Table T however won't store key I: in it. This is because the keys are of arbitrary length. Instead, T will store pointerslreferences in the form of an index to another array A. The second table, array A will be a character array and will store the words maintained in T separated by NUL values \\0. This is not 2 characters a backslash followed by a zero. It is 1B (ASCII), 2B (UNICODE) whose all bits are set to 0, the NUL value. If you don't know what B is, it is a byte; read Handout 3. An insertion operation affects T and A. A word w is hashed, an available slot in T is computed and let that slot be t. In T[t] we store an index to table A. This index is the rst location that stores the rst character of w. The ending location is the \\0 following win A. New words that do not exist (never inserted, or inserted but subsequently deleted) are appended in A. Thus originally you need to be wise enough in choosing the appropriate size of A. If at some point you run-out of space, you increase T and thus this increases A as well. Doubling T i.e. N, is an option. This causes problems that you also need to attend to. A deletion will modify T as needed but will not erase w from A. Let it be there. So A might get dirty (i.e. it contains garbage) after several deletions. If several operations later you end up inserting w after deleting it previously, you do it the insertion way and you reinsert w, even if a dirty copy of it might still be around. You DO NOT DO a linear search to nd out if it exists arleady in A; it is inefcient. There is not much to say for a search. You need to support few more operations: Print , Create, Empty/FulllBatch with the last of those checking for an empty or full tablelarray and a mechanism to perform multiple operations in batch form. Print prints nicely T and its contents i.e. index values to A. In addition it prints nicely (linear-wise in one line) the contents of A. (For a \\0 you will do the SEMI obvious: print a backslash but not its zero). The intent of Print is to assist the grader. Print however does not print the words of A for deleted words. It prints stars for every character of a deleted word instead. (An alternative is that during deletion each such character has already been turned into a star.) Function Create creates T with N slots, and A with 15N chars and initializes A to spaces. We call a class that supports and realizes A and T a lexicon: L is one instance of a lexicon. Your code should thus implement as functions minimally the functions mentioned above: Create, Print, Empty, Full, Batch, Insert, Delete, Search. Testing utilizes a Batch function with argument a lename that is going to be provided as a command line argument. That le consists of multiple lines containing one command per line. An example le is provided in Section B of the course web-page related to the example below. Each command is a numeric equivalent of the function named earlier plus one more (for comment). Command 10 is Insert, Command 1] is Deletion, and Command 12 is Search. Command 13 is Print, Command 14 is Create. Command 15 is Comment: the rest of the line marked by a 15 is ignored- Command 14 for create has an argument Which is the value ofN. Each one of 10, 11, and 12 has an argument which is a string. 2 java mplexicon filearbitrary.txt % . /mplexicon file . txt 14 11 10 alex 10 tom 10 jerry 15 ready-to-print CAUTION: 15 is a comment string (chars, numbers, -) 13 operation 15 is skipped/ignored The six-line batch file above will print the following. The T entries for 0, 5, 9 are the indexes (first position) for alex, tom, jerry respectively. Note that the ASCII values has for 'alex' as prescribed give a 4, but for 'tom' and 'jerry' give 2, i.e. a collision occurs. A minimal output for Print is available below. T A: alex\\tom \\jerry\\ 0: CAUTION: \\ means 10 1 : It is not a tab character ! ! ! 2 : 5 3: 9 4: 0 5 : 6 : 7 : 8 : 9 : 10: If the following lines were added to the file 12 alex 12 tom 12 jerry 12 mary 11 tom 13 they will generate in addition on screen alex found at slot 4 tom found at slot 2 jerry found at slot 3 mary not found tom deleted from slot 2 T A: alex \\***\\jerry\\ U: 1 : 2 : 3: 9 4: 0 5 : 6 : 7: 8 : 9 : 10 : Deliverables. An archive per Handout 2 guidelines.1 Introduction Item-0. NJIT-ID (last 4 digits) and myUCID. Retrieve the last four digits of your NJIT-ID (NOT YOUR SSN): Handout 0, Section 2 talked about this. If you don't know your NJIT-ID, login to my . njit . edu to retrieve it. For the sake of an example, a homework will be denoted by hw, and the last four digits WXYZ. In the remainder, we will be using the underscore symbol rather than the dash (minus) symbol, where needed. The symbol - is the underscore symbol, not the minus/dash - symbol. (Java detests dashes and prefers underscores in identifiers, so we use the latter for consistency.) The PrP is called prp. 2 Canvas: Rules and filenames or filetypes. RULE-0. Accidental submit and resubmit. The last of multiple submissions will be graded. The limit however would be kept low (two or three) Do not send files by email, they will be discarded. RULE-1. Canvas and Homeworks. You are expected to submit a homework through Canvas. No file submission. Just use a textbox to write your answer(s). RULE-2. Canvas and PrP. You are expected to upload and submit the PrP through Canvas. Only a single tar or .zip file. Canvas will reject submissions greater than (or equal to) 5MiB. RULE-3: File-name of a PrP submission. The first three characters is prp, followed by a WXYZ before the filetype suffix; what WXYZ is, it was explained in Item-0 above. The suffix is .zip or .tar as intended. Example. If your ID ends with 1234, a prp should be named prp 1234. tar or prp_1234. zip. For code that generates output files (eg classes in Java) read carefully all of this handout to observe naming conventions. Observe testing guidelines involving the dearchiving of the archive you plan to sub- mit, compilation and testing on afs machines using testing instances created by you on those afs machines manipulating text files on Windows or MAC-OSX can cause newline, linefeed etc problems) and subse- quently evaluating your PrP (eg Sections 2, 3, and 4). The only permissible archive formats are zip or tar for PrP. The zip/tar archive contains ALL the source or text files of ALL options submitted. Make sure it DOES NOT include directories/folders or subdirectories hidden or not, or .class/.jar files, or other ex- ecutable/binary files. You can check for hidden directories with an Is -al in Linux. Abide to the naming conventions of Section 3. A single text file named prp_WXYZ. txt within the archive contains either a "HANDOUT 2 and PrP ADHERED; NO BUGS TO REPORT" in capital case as shown (no quotation marks needed), or it contains compilation and execution instructions, a bug report and other useful information. The first line of this file must be as in Section 3 below. RULE-4: Submissions that deviate from RULE-1 through RULE-3 get 0 points. 3 File and Class names; File Identification Write your name and last four digits of you ID in every file you submit. Every filename or (source code) class name must end with the last four digits of your ID. So use MyclassWXYZ or Myclass_WXYZ instead of Myclass stored in MyclassWXYZ. java or Myclass_WXYZ. java respectively.4 Testing and Grading (of the PrP) 4.1 Read requirements carefully. All options require command line processing and file-based I/O. If you are not familiar with them figure this out early in the semester and preferably by the midterm. Submissions that cannot handle command line processing or file-based input/output correctly risk of getting 0 points as ordinary testing will not work 4.2 Testing and Debugging. Make sure your code dearchives (no sub/directories/folders created, hidden or not), compiles and runs on an AFS machine such as afsconnect1 . njit . edu, afsconnect2. njit . edu or os122. njit . edu. This also means you should edit text files, if needed, on AFS using AFS editing tools; avoid remote editing. Do agcc -v or g++ -v or javac -version or java -version to confirm and report in the .txt file the version of compiler used. Versions available on afs are found with a module avail and loaded with say a module load gcc/4.9. 2. If compilation is to proceed in a certain file sequence add a note in the prp_WXYZ. txt file. DO NOT DO testing 'remotely' using 'software of the remote platform' to edit files on 'AFS'. If you do not know what a newline becomes in Linux/Unix, Windows, MAC-OSX, be prepared for nasty surprises. 4.3 File types to submit. C or C++ files with .c or .h or . cc or . cpp or Java files with . java are acceptable. A single text file prp_WXYZ. txt must be included per RULE-3. The more info you have in it the less chances you have to get a 0. 4.4 Presence of directory structure is not allowed; file types as in 4.3. Inclusion of binary files (.jar, .class, etc) or other file types than those allowed in 4.3 above will penalize you 80 points. Beware of MAC OSX: it has a tendency of generating Opt PrP submissions because it creates hidden directories. Do not skip step 4.2 above. 4.5 Grading For a HW, grading is more or less straightorward. For PrP, the grader will first decide and create testing instance(s) and grade your submission based on whether it passes successfully or not those testing instances using the specified interface (eg command-line processing). If your code does not pass any of these testing instances, it will get 0 points, unless there is a detailed bug report that YOU HAVE PROVIDED in prp_WXYZ. txt. Any information you provide there, it will help the grader to test your code on some reasonable inputs consistent with the default ones used for everybody else. If no information is provided by you, the grader will NOT read your code. 4.6 Grade and Canvas Grading. The PrP or HW grade will be made available in Canvas. Ignore canvas grade accumulations; canvas has no clue about the course grading scale or scheme. The only deadline is before (12-o'clock) noon of the day specified in the calendar of Handout 1 (Syllabus). ] 4.7 How to get a 0. If a submission is in .rar, or uses HashMaps or sorting of any type, or performs linear- time operations where a constant-time solution is possible (eg using Vector operations inefficiently) will get you 0 pts: Do not complain then

Step by Step Solution

There are 3 Steps involved in it

Step: 1

blur-text-image

Get Instant Access to Expert-Tailored Solutions

See step-by-step solutions with expert insights and AI powered tools for academic success

Step: 2

blur-text-image

Step: 3

blur-text-image

Ace Your Homework with AI

Get the answers you need in no time with our AI-driven, step-by-step assistance

Get Started

Recommended Textbook for

Professional Android 4 Application Development

Authors: Reto Meier

3rd Edition

1118223853, 9781118223857

More Books

Students also viewed these Programming questions