Question

For this part, you will write a Python program that solves the co-existence problem, which is stated as follows. We have a file containing English sentences, one sentence per line. Given a list of query words, your program should output the line numbers of the lines that contain all of those words. While there are many ways to do this, the most efficient way is to use sets and dictionaries.

Here is one example. Assume that the following is the content of the file. Line numbers are included for clarity; the actual file doesn't have the line numbers.

1. Try not to become a man of success, but rather try to become a man of value.
2. Look deep into nature, and then you will understand everything better.
3. The true sign of intelligence is not knowledge but imagination.
4. We cannot solve our problems with the same thinking we used when we created them.
5. Weakness of attitude becomes weakness of character.
6. You can't blame gravity for falling in love.
7. The difference between stupidity and genius is that genius has its limits.

(These are attributed to Albert Einstein.)

If we are asked to find all the lines that contain this set of words: {true, knowledge, imagination}, the answer will be line 3, because all three words appear in line 3. If the words appear together in more than one line, your program should report all of those lines. For example, co-existence of {the, is} will be lines 3 and 7.

IMPORTANT: You should download a text file version of the book War and Peace from Project Gutenberg. You can find it under Plain Text UTF-8 here: http://www.gutenberg.org/ebooks/2600 Download the text version of the book and save it in the same directory as your program. Your solution should be fast on that book: your program should produce the required dictionary in 1 or 2 seconds, and it should answer any co-existence query instantaneously.
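To see why sets and dictionaries fit this problem, here is a minimal sketch using a hand-built dictionary for a few words from the seven example lines. The variable names are illustrative assumptions, and the dictionary is typed in by hand rather than read from a file.

```python
# Hand-built index for a few words from the seven example lines:
# each word maps to the set of line numbers where it appears.
index = {
    "true": {3},
    "knowledge": {3},
    "imagination": {3},
    "the": {3, 4, 7},
    "is": {3, 7},
}

# A line contains every query word exactly when its number is in
# every per-word set, so the sets are intersected.
lines_all_three = index["true"] & index["knowledge"] & index["imagination"]
lines_the_is = index["the"] & index["is"]

print(sorted(lines_all_three))  # [3]
print(sorted(lines_the_is))     # [3, 7]
```

Once the dictionary exists, each query costs only a few set operations, which is why lookups can stay fast even on a whole book.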
Alternatively, you can find the book here, since I already downloaded it: https://www.dropbox.com/s/pg4p9snzv60rp5v/WarAndPiece.txt?dl=0

Python Implementation: You need to implement the following functions:

1) open_file()
The open_file function will prompt the user for a file name and try to open that file. If the file exists, it will return the file object; otherwise it will re-prompt until it can successfully open the file. This feature must be implemented using a while loop and a try-except clause.

2) read_file(fp)
This function has one parameter: a file object (such as the one returned by the open_file() function). This function will read the contents of that file line by line, process them and store them in a dictionary. The dictionary is returned. Consider the following string pre-processing:

1. Make everything lowercase
2. Split the line into words
3. Remove all punctuation, such as "," "." "!" etc.
4. Remove apostrophes and hyphens, e.g. transform can't into cant and first-born into firstborn
5. Remove the words that are not all alphabetic characters (do not remove can't, because you have transformed it to cant; similarly for firstborn).
6. Remove the words with fewer than 2 characters, like a

Hint for the string pre-processing mentioned above: To find punctuation for removal, you can import the string module and use string.punctuation, which has all the punctuation. To check for words with only alphabetic characters, use the isalpha() method.

Furthermore, after pre-processing, you add the words to a dictionary in which the key is the word and the value is the set of line numbers where this word has appeared. For example, after processing the first line, your dictionary should look like:

{'try': {1}, 'not': {1}, 'to': {1}, 'become': {1}, 'man': {1}, 'of': {1}, 'success': {1}, 'but': {1}, 'rather': {1}, 'value': {1}}

This should be repeated for all the lines; new keys are added to the dictionary, and if a key already exists, its value is updated.
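The two functions described above could be sketched as follows. This is one possible reading of the specification, not a reference solution: the prompt wording, the UTF-8 encoding, the choice to catch OSError, and the decision to delete punctuation characters anywhere inside a word are all assumptions.

```python
import io
import string


def open_file():
    """Prompt for a file name until a file can be opened; return the file object."""
    while True:
        name = input("Enter a file name: ")  # prompt wording is an assumption
        try:
            return open(name, encoding="utf-8")  # encoding is an assumption
        except OSError:  # includes FileNotFoundError and PermissionError
            print("Could not open that file, please try again.")


def read_file(fp):
    """Build {word: set of 1-based line numbers} from an open text file."""
    data_dict = {}
    for line_number, line in enumerate(fp, start=1):
        for word in line.lower().split():
            # string.punctuation contains the ASCII apostrophe and hyphen, so
            # "can't" becomes "cant" and "first-born" becomes "firstborn".
            word = "".join(ch for ch in word if ch not in string.punctuation)
            # Drop words that are too short or not purely alphabetic.
            if len(word) < 2 or not word.isalpha():
                continue
            if word in data_dict:
                data_dict[word].add(line_number)
            else:
                data_dict[word] = {line_number}
    return data_dict


# Small demonstration on the first example sentence, using an in-memory
# file so that no prompt is needed.
first_line = io.StringIO(
    "Try not to become a man of success, but rather try to become a man of value.\n"
)
first_dict = read_file(first_line)
print(first_dict)
```

One caveat with this sketch: string.punctuation covers ASCII punctuation only, so a word containing a typographic (curly) apostrophe would fail the isalpha() check and be dropped rather than cleaned.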
At the end of processing all these 7 lines, the value in the dictionary associated with the key 'the' will be the set {3, 4, 7}. (Note: the line numbers start from 1.)

3) find_coexistance(D, query)
The first parameter is the dictionary returned by read_file; the second one is a string called query. This query contains zero or more words separated by white space. You need to split them into a list of words and find the line numbers for each word. To do that, use the intersection or union operation on the sets from D (you need to figure out which operation is appropriate). Then convert the resulting set to a sorted list, and return the sorted list. (Hint: for the first word simply grab the set from D; for subsequent words you need to use the appropriate set operation: intersection or union.)

4) #main
The main part of the program should call the three functions above. Loop, prompting the user to enter space-separated words. Use that input to find the co-existence and print the results. Continue prompting for input until 'q' or 'Q' is entered.

Very important considerations: Every time you want to look up a key in a dictionary, you first need to make sure that the key exists. Otherwise the lookup will result in an error. So, always use an if statement before looking up a key:

    if key in data_dict:
        # the key exists in the dictionary, so it is safe to use data_dict[key]
        value = data_dict[key]

After you have completed the program, see how it works on the two files provided: einstein.txt and gettysburg.txt.
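A minimal sketch of find_coexistance and the query loop, under stated assumptions: query words are lowercased to match the dictionary keys, an empty query returns an empty list, and the helper name query_loop and its prompt text are hypothetical. Intersection is used here because a line must contain every query word.

```python
def find_coexistance(D, query):
    """Return a sorted list of line numbers that contain every word in query."""
    result = None
    for word in query.lower().split():  # lowercasing is an assumption
        if word not in D:
            # Check the key before indexing; a word that never appears
            # cannot co-exist with anything.
            return []
        if result is None:
            result = set(D[word])  # copy, so the sets stored in D are not changed
        else:
            result &= D[word]      # keep only lines shared with this word
    return sorted(result) if result is not None else []


def query_loop(D):
    """Prompt for queries until the user enters q or Q (hypothetical helper)."""
    while True:
        query = input("Enter space-separated words (q to quit): ")
        if query.strip().lower() == "q":
            break
        print(find_coexistance(D, query))


# Demonstration with a hand-built dictionary taken from the example lines.
D = {"the": {3, 4, 7}, "is": {3, 7}, "true": {3}}
print(find_coexistance(D, "the is"))       # [3, 7]
print(find_coexistance(D, "The true"))     # [3]
print(find_coexistance(D, "the gravity"))  # []
```

In a full program, the main part would pass the dictionary built by read_file to query_loop; the loop is left uncalled here so the sketch runs without waiting for input.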
