Project Description

This project involves creating a spell-checker program that accepts a word from the user, looks up that word in an available corpus, and performs spell-correction on the word if it is not present in the corpus.

The word corpus has been loaded and is available in a string named word corpus.

You will need to do the following:

  • Download both this program file and the associated google-10000-english.txt file to your computer.
  • Write a program using the WHILE loop that continuously asks the user to enter a word. If the user enters QUIT, then quit from the while loop and terminate the program. (20 points)
  • Once the user has entered the word, you will:
      • Compare the word with the word corpus. If there is a match, let the user know that the word is valid. Note that the comparison must be case insensitive. (20 points)
      • If there is no match, look up the corpus for the word that best matches the word that the user entered and display that word to the user. (40 points)
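The loop described above could be sketched as follows. This is only one possible shape, not the expected solution: `check_word` and `run_spell_checker` are hypothetical names, `suggest` stands for any function that takes a word and the corpus and returns the best match (for instance the notebook's `similarity` function once it is completed), and the `read`/`write` parameters exist only so the loop can be exercised without a keyboard. Accepting QUIT in any letter case is an assumption.

```python
def check_word(word, corpus_words):
    """Return (is_valid, normalized_word) using a case-insensitive comparison."""
    normalized = word.strip().lower()
    return normalized in corpus_words, normalized


def run_spell_checker(corpus, suggest, read=input, write=print):
    """Keep asking for words until the user types QUIT (any letter case, an assumption)."""
    corpus_words = {entry.lower() for entry in corpus}  # a set gives fast lookups
    while True:
        entry = read("Enter a word (QUIT to exit): ").strip()
        if entry.upper() == "QUIT":
            write("Goodbye!")
            break
        if not entry:
            write("Please enter a word.")
            continue
        is_valid, normalized = check_word(entry, corpus_words)
        if is_valid:
            write(f"'{entry}' is a valid word.")
        else:
            write(f"'{entry}' was not found. Did you mean '{suggest(normalized, corpus)}'?")


# Scripted demonstration (no keyboard input needed); the stand-in suggester is hypothetical.
scripted = iter(["The", "wen", "QUIT"])
run_spell_checker(["the", "went"], lambda word, corpus: "went",
                  read=lambda prompt: next(scripted))
```

In the notebook itself the call would presumably look like `run_spell_checker(dictionary, similarity)`, leaving `read` and `write` at their defaults.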

Extra credit(20 points)

  • Allow the user to enter a paragraph and perform an automated spell correction of the paragraph. For example, if the user enters "Jack and Jill wen up the hills", your program would return something like "Jack and Jill went up the hill"
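One way to approach the extra credit is to correct the paragraph word by word while leaving spaces and punctuation alone. The sketch below assumes a hypothetical `suggest(word, corpus)` callable that returns the best match (or nothing), and uses a tiny stand-in corpus and suggester purely for illustration; `correct_paragraph` is not part of the provided notebook.

```python
import re


def correct_paragraph(paragraph, corpus, suggest):
    """Replace each word missing from corpus with suggest(word, corpus)."""
    corpus_words = {entry.lower() for entry in corpus}

    def fix(match):
        word = match.group(0)
        if word.lower() in corpus_words:
            return word  # already valid, keep the original spelling and case
        replacement = suggest(word.lower(), corpus)
        if not replacement:
            return word  # no suggestion available, leave the word unchanged
        # keep a leading capital so names and sentence starts still look right
        return replacement.capitalize() if word[0].isupper() else replacement

    # [A-Za-z]+ matches runs of letters; spaces and punctuation are left untouched
    return re.sub(r"[A-Za-z]+", fix, paragraph)


# Stand-in corpus and suggester, both hypothetical and only for this demonstration.
demo_corpus = ["jack", "and", "jill", "went", "up", "the", "hill"]
demo_suggest = lambda word, corpus: {"wen": "went", "hills": "hill"}.get(word)
print(correct_paragraph("Jack and Jill wen up the hills", demo_corpus, demo_suggest))
# Jack and Jill went up the hill
```

Note that names such as "Jack" are only preserved here because the stand-in corpus contains them; with a real corpus, words that are absent would be replaced by their nearest match.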

Other Points

  • 10 points will be awarded for the overall quality of the user interaction.
  • 10 points will be awarded for the proper use of Python, including making sure that the code is optimal.

Hints

Typically, this is implemented by looking at each word in the list and determining the number of adds, updates, and deletes that are needed in order to get from the candidate word to the input word. Each operation has a score associated with it, for example:

  • Update - 2 points
  • Add - 1 point
  • Delete - 2 points

For example,

input word: wen

candidate word: win

  • To get from wen to win requires 1 update
  • Total score for win is 2 points

candidate word: went

  • To get from wen to went requires 1 add
  • Total score for went is 1 point

candidate word: hello

  • To get from hello to wen requires 3 updates and 2 deletes
  • Total score for hello is 10 points

At the end, after looking at all words in the list, you would pick the word with the lowest score as the match. If you arrive at a good match sooner, you might want to stop early and display the result, for performance reasons.
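The scoring idea in the hints can be sketched as a weighted edit distance. The version below is an illustration under stated assumptions, not the required solution: the function names are hypothetical, the operations are counted when turning the input word into the candidate (the direction used in the wen/win and wen/went examples), and a score of 1 is assumed to be "good enough" to stop early. Because the dynamic program always finds the cheapest sequence of operations, its score for distant words such as hello can be lower than a hand count.

```python
# Costs taken from the hints; the direction (input word -> candidate) is an assumption.
ADD_COST = 1
DELETE_COST = 2
UPDATE_COST = 2


def weighted_distance(source, target):
    """Cheapest cost of turning source into target with adds, deletes and updates."""
    # previous[j] = cost of turning the letters of source seen so far into target[:j]
    previous = [j * ADD_COST for j in range(len(target) + 1)]
    for i, s_char in enumerate(source, start=1):
        current = [i * DELETE_COST]  # turning i letters into "" needs i deletes
        for j, t_char in enumerate(target, start=1):
            keep_or_update = previous[j - 1] + (0 if s_char == t_char else UPDATE_COST)
            delete = previous[j] + DELETE_COST
            add = current[j - 1] + ADD_COST
            current.append(min(keep_or_update, delete, add))
        previous = current
    return previous[-1]


def best_match(input_word, candidates, good_enough=ADD_COST):
    """Return (word, score) for the lowest-scoring candidate, stopping early on a good match."""
    best_word, best_score = None, None
    for candidate in candidates:
        score = weighted_distance(input_word, candidate)
        if best_score is None or score < best_score:
            best_word, best_score = candidate, score
            if best_score <= good_enough:
                break  # assumed threshold: one add away is close enough
    return best_word, best_score


print(weighted_distance("wen", "win"))   # 2
print(weighted_distance("wen", "went"))  # 1
print(best_match("wen", ["win", "hello", "went"]))  # ('went', 1)
```

Keeping only two rows of the table at a time keeps memory proportional to the candidate's length, which matters little for short words but costs nothing.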

Imports

In[1]:

import string 

Run this cell to load the word corpus. The variable dictionary has the list of all words in your corpus

In[2]:

with open('google-10000-english.txt', encoding='utf-8') as corpus_file:
    # split() with no argument splits on any whitespace, including newlines
    dictionary = corpus_file.read().split()
dictionary[:5]

Out[2]:

['the', 'of', 'and', 'to', 'a'] 

Spellchecker

In[6]:

def similarity(inputWord,wordCorpus):
    # get list of words that share the relatively same size +/- 1 letter
    if len(inputWord) == 0:
        print("Please provide a valid word")
        return "INVALID_ENTRY"

    lowerWordLen = len(inputWord) - 1
    upperWordLen = len(inputWord) + 1

    # get the list of candidate words
    candidateWords = []
    for entry in wordCorpus:
        # determine the set of words within one character distance of the input word
        # and place it in the list candidateWords
        if len(entry) >= lowerWordLen and len(entry) <= upperWordLen:
            candidateWords.append(entry)

    # perform similarity comparison
    # You will need to look for words in the candidateWords list that best match
    # the input word. For example, if the user input was "wen", a possible match is "went"
    # or if the input word is "rabbi", a possible match is "rabbit"
    # All candidate words are from the text "Alice in Wonderland"
    bestMatchWord = None

    ########################################
    ### Write your code here
    ########################################

    # display the best match
    print("Best Match Is:",bestMatchWord)
    return bestMatchWord

In[7]:

import time

startTime = time.time()

# take all words from alice and store them in memory
wordCorpusFile = open('google-10000-english.txt',encoding='utf-8')
wordCorpus = []
for line in wordCorpusFile:
    # remove newlines
    line = line.strip().lower()
    # get words
    words = line.split(" ")
    for word in words:
        if word.isalnum():
            if word not in wordCorpus:
                wordCorpus.append(word)

similarity("wen",wordCorpus)
elapsedTime = time.time() - startTime
print("Time taken:%s seconds" % elapsedTime)
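Since part of the grade is for efficient code, it may be worth noting that testing `word not in wordCorpus` against a list rescans the list for every word. A possible alternative, sketched here with a hypothetical helper named `build_corpus`, collects the words in a dict, which removes duplicates while keeping first-seen order.

```python
def build_corpus(lines):
    """Return the unique alphanumeric words from lines, in first-seen order."""
    seen = {}  # dicts keep insertion order (Python 3.7+) and have fast membership tests
    for line in lines:
        for word in line.strip().lower().split():
            if word.isalnum():
                seen.setdefault(word, None)
    return list(seen)


# Small in-memory example; a real run would pass the opened corpus file instead.
print(build_corpus(["The of\n", "and THE\n", "to a-b\n"]))  # ['the', 'of', 'and', 'to']
```

With the corpus file this would presumably be used as `wordCorpus = build_corpus(wordCorpusFile)`.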
