Answered step by step
Verified Expert Solution
Link Copied!

Question

1 Approved Answer

Your Assignment - SpellChecker.java You are to write a program that implements a rudimentary spellchecker. It should use a inner class named Lexicon that is

image text in transcribedimage text in transcribedimage text in transcribedimage text in transcribed

Your Assignment - SpellChecker.java You are to write a program that implements a rudimentary spellchecker. It should use a inner class named Lexicon that is implemented as a Trie. The Lexicon Class A Trie is a tree where every node can have up to 26 children one for each letter of the alphabet. Below is a figure of a Trie containing just a few short words beginning with c, d, and e. The nodes that represent the end of words have concentric circles In your Trie, each node should have an array of 26 children, as well as a boolean value, named something like isWord. isWord is set to true only if this node represents a word. Lexicon: car cars cat cats do dog dogs done -O ears eat eats -O The Lexicon class has a constructor and one public method. You are, of course, free to use private helper methods. These will not be tested The constructor should read all the words from enablelaugmented.txt and build a Trie containing them enablelaugmented.txt is a list of thousands of English words, that has been augmented with some contractions at the end without the apostrophes. public boolean containsword (String word) which retuns true if the word is in the lexicon, false otherwise. Once the Trie is constucted, searching it is straightforward. Say we are searching for the word "cat". We start at the root, go to the 'c' subtree, go to its 'a' subtree and go to its t subtree. At this point we look and see that the iswordis true, so we return rue. That is "cat is a word. are two ways that false may be retuned If we come to the t' node andisword is false, then we retuun false. We might also come to the point where there is no child node to move to. In the example above, if we search for "does" we will visit the 'd' subtree and its 'o subtree, but there is no e' subtree firom there. In this case, we also return false. Looking up words in a Tie is a Ok operation where k is the number of letters in the word. The Spellchecker Program Your program should create the lexicon, then prompt the user for an input filename. You should process the input file line at a time. After reading in the line of text, you should make it into all lowercase and remove all punctuation. Then, each word in the line is looked up in the lexicon. Ifit is not found, the program writes the line mumber (stat counting at1), the word that was misspelled, and a sorted list of suggestions. For example, given this imput file This is a 1nef text that has a missspelling in it generates the following output using the default wordlist file, enablelaugmented.txt: Line 1: "1ne is not spelled correctly! Suggestions: ane, lane, lee, lie, line, lone, lune, lye, ne, one] Line 1: "missspelling" is not spelled correctly Suggestions: miss spelling, misspelling Generating the Suggestions There are two popular text-mode spell checkers on UnixLinux systems. One is called ispell, the other is a GNU "firee software program called aspell. Both use similar techniques for generating suggestions for misspelled words. Here are five typical techniques: Swapping each adjacent pair of characters in the word. Example: "hte" should generate the suggestion "the" . In between each adjacent pair of characters in the word (also before the first character and after the last character) each letter firom a' through 'z is inserted. Example: "lne should generate the suggestions "lne" and "line" Deleting each character from the word. Example: "missspelling" should generate the suggestion "misspelling Replacing each character in the word with each letter from 'a through z. Example: .Iqne" should generate the suggestions "Lane" and "line" .Splitting the word into a pair of words by adding a space in between each adjacent pair of characters in the word. It should be noted that this will only generate a suggestion if both words in the pair are found in the lexicon Example: "missspelling" should generate the suggestion "miss spelling" Your SpellChecker class should generate suggestions using all of these techniques. It should be noted that there are other ways to generate suggestions, includng usng algorithms that pay attention not only to the letters, but what the letters actually sound like. One such well-known algorithm is called the Soundex algorithm. You are not required to implement such algorithms for this project. You may hardcode the word list file name. Before you turn in the program, make sure the filename refers to enablelaugmented.txt. Do NOT make a copy of this file into your lab account. It is 1.7M is size. You may, of course, make a copy onto your own computer, but if you do so, make certain that your submitted project refers either to the one in the course directory or a local copy. E.g. Do not include a path starting with C:l or the equivalent. Your program Your program should ignore all punctuation. For example "Holy cow!" should be treated as two words, "holy" and "cow". Leading and trailing punctuation should be eliminated. Dashes and hyphens occur inside of words and should be replaced by spaces. Apostrophes may appear inside of words. These should just be eliminated. Your instructor has included the following files for you test your code 1. should ignore capitalization. 3. 4. enablelaugmented.txt which is described above The following files to test your code when using enable laugmented.txt a. b. OCaptainMyCaptain.txt which contains Walt Whitman's famous eulogy to Abraham Lincoln. i. ii. WhereTheSideWalkEnds.txt which contains Shel Silverstein's famous poem. tinyLexicon.txt which contains the words in the example above The following file to test your code when using tinyLexicon.txt c. d. i. nonsense.txt scrabble.pdf structure called a Dawg to play Scrabble e. which is a research paper from the 1980s that uses a Trie and a related optimized data In this project you will develop a data structure known as a Trie A Trie fills the same data structures niche as a HashSet String> . You are required to use a Trie. Using a HashSet String or other daa structure instead of the Trie will result in a grade of 0 Your Assignment - SpellChecker.java You are to write a program that implements a rudimentary spellchecker. It should use a inner class named Lexicon that is implemented as a Trie. The Lexicon Class A Trie is a tree where every node can have up to 26 children one for each letter of the alphabet. Below is a figure of a Trie containing just a few short words beginning with c, d, and e. The nodes that represent the end of words have concentric circles In your Trie, each node should have an array of 26 children, as well as a boolean value, named something like isWord. isWord is set to true only if this node represents a word. Lexicon: car cars cat cats do dog dogs done -O ears eat eats -O The Lexicon class has a constructor and one public method. You are, of course, free to use private helper methods. These will not be tested The constructor should read all the words from enablelaugmented.txt and build a Trie containing them enablelaugmented.txt is a list of thousands of English words, that has been augmented with some contractions at the end without the apostrophes. public boolean containsword (String word) which retuns true if the word is in the lexicon, false otherwise. Once the Trie is constucted, searching it is straightforward. Say we are searching for the word "cat". We start at the root, go to the 'c' subtree, go to its 'a' subtree and go to its t subtree. At this point we look and see that the iswordis true, so we return rue. That is "cat is a word. are two ways that false may be retuned If we come to the t' node andisword is false, then we retuun false. We might also come to the point where there is no child node to move to. In the example above, if we search for "does" we will visit the 'd' subtree and its 'o subtree, but there is no e' subtree firom there. In this case, we also return false. Looking up words in a Tie is a Ok operation where k is the number of letters in the word. The Spellchecker Program Your program should create the lexicon, then prompt the user for an input filename. You should process the input file line at a time. After reading in the line of text, you should make it into all lowercase and remove all punctuation. Then, each word in the line is looked up in the lexicon. Ifit is not found, the program writes the line mumber (stat counting at1), the word that was misspelled, and a sorted list of suggestions. For example, given this imput file This is a 1nef text that has a missspelling in it generates the following output using the default wordlist file, enablelaugmented.txt: Line 1: "1ne is not spelled correctly! Suggestions: ane, lane, lee, lie, line, lone, lune, lye, ne, one] Line 1: "missspelling" is not spelled correctly Suggestions: miss spelling, misspelling Generating the Suggestions There are two popular text-mode spell checkers on UnixLinux systems. One is called ispell, the other is a GNU "firee software program called aspell. Both use similar techniques for generating suggestions for misspelled words. Here are five typical techniques: Swapping each adjacent pair of characters in the word. Example: "hte" should generate the suggestion "the" . In between each adjacent pair of characters in the word (also before the first character and after the last character) each letter firom a' through 'z is inserted. Example: "lne should generate the suggestions "lne" and "line" Deleting each character from the word. Example: "missspelling" should generate the suggestion "misspelling Replacing each character in the word with each letter from 'a through z. Example: .Iqne" should generate the suggestions "Lane" and "line" .Splitting the word into a pair of words by adding a space in between each adjacent pair of characters in the word. It should be noted that this will only generate a suggestion if both words in the pair are found in the lexicon Example: "missspelling" should generate the suggestion "miss spelling" Your SpellChecker class should generate suggestions using all of these techniques. It should be noted that there are other ways to generate suggestions, includng usng algorithms that pay attention not only to the letters, but what the letters actually sound like. One such well-known algorithm is called the Soundex algorithm. You are not required to implement such algorithms for this project. You may hardcode the word list file name. Before you turn in the program, make sure the filename refers to enablelaugmented.txt. Do NOT make a copy of this file into your lab account. It is 1.7M is size. You may, of course, make a copy onto your own computer, but if you do so, make certain that your submitted project refers either to the one in the course directory or a local copy. E.g. Do not include a path starting with C:l or the equivalent. Your program Your program should ignore all punctuation. For example "Holy cow!" should be treated as two words, "holy" and "cow". Leading and trailing punctuation should be eliminated. Dashes and hyphens occur inside of words and should be replaced by spaces. Apostrophes may appear inside of words. These should just be eliminated. Your instructor has included the following files for you test your code 1. should ignore capitalization. 3. 4. enablelaugmented.txt which is described above The following files to test your code when using enable laugmented.txt a. b. OCaptainMyCaptain.txt which contains Walt Whitman's famous eulogy to Abraham Lincoln. i. ii. WhereTheSideWalkEnds.txt which contains Shel Silverstein's famous poem. tinyLexicon.txt which contains the words in the example above The following file to test your code when using tinyLexicon.txt c. d. i. nonsense.txt scrabble.pdf structure called a Dawg to play Scrabble e. which is a research paper from the 1980s that uses a Trie and a related optimized data In this project you will develop a data structure known as a Trie A Trie fills the same data structures niche as a HashSet String> . You are required to use a Trie. Using a HashSet String or other daa structure instead of the Trie will result in a grade of 0

Step by Step Solution

There are 3 Steps involved in it

Step: 1

blur-text-image

Get Instant Access with AI-Powered Solutions

See step-by-step solutions with expert insights and AI powered tools for academic success

Step: 2

blur-text-image

Step: 3

blur-text-image

Ace Your Homework with AI

Get the answers you need in no time with our AI-driven, step-by-step assistance

Get Started

Students also viewed these Databases questions