Question

Posted on Sep 30, 2024

This assignment deals with loading a simple text file into a Python structure, lists, arrays, and dataframes.

a. Locate a movie script, play script, poem, or book of your choice in .txt format*. Project Gutenberg is a great resource for this if you're not sure where to start.

b. Load the words of this text, one by one, into a one-dimensional, sequential Python list (i.e. the first word should be the first element in the list, while the last word should be the last element). It's up to you how to deal with special characters -- you can remove them manually, ignore them during the loading process, or even count them as words, for example.
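One possible way to approach part b, as a minimal sketch rather than a required solution. It assumes punctuation is dropped and words are lowercased (one of the options the task allows). The names `tokenize` and `load_words` are made up for illustration, and the demo runs on an in-memory string so it works without a file on disk.

```python
import re

def tokenize(text):
    """Split text into lowercase words in reading order, dropping punctuation."""
    # Letters, digits and apostrophes form a word; everything else is a separator.
    raw = re.findall(r"[a-z0-9']+", text.lower())
    # Strip stray quote marks such as 'hello' and drop tokens that end up empty.
    return [w.strip("'") for w in raw if w.strip("'")]

def load_words(path, encoding="utf-8"):
    """Read a .txt file and return its words as a one-dimensional list."""
    with open(path, encoding=encoding) as handle:
        return tokenize(handle.read())

# Demo on a short string; for a real file call load_words("your_file.txt").
words = tokenize("But soft! What light through yonder window breaks?")
print(words)
```

If you would rather count special characters as words, the regular expression is the only part that needs to change.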

c. Use your list to create and print a two-column pandas data-frame with the following properties:
i. Each index should mark the first occurrence of a unique word (independent of case) in the text.
ii. The first column for each index should represent the word in question at that index.
iii. The second column should represent the number of times that particular word appears in the text.

Ex: if the first word in your text is "the" which occurs 500 times and the second is "balcony" which only appears twice, your data-frame should begin like the following:

	Word	Count
1	"the"	500
2	"balcony"	2
...	...	...
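A hedged sketch of part c. It reads "index" as the order in which each unique word first appears, numbered from 1 to match the example table; if your instructor means the position of the word in the text, the index would need to be built differently. The function name `word_count_frame` and the sample words are assumptions for illustration.

```python
from collections import Counter

import pandas as pd

def word_count_frame(words):
    """Build a Word/Count data-frame ordered by first appearance of each word."""
    # Lowercase first so counting is independent of case.
    lowered = [w.lower() for w in words]
    # Counter is a dict subclass, so keys keep first-insertion order (Python 3.7+).
    counts = Counter(lowered)
    frame = pd.DataFrame({"Word": list(counts.keys()),
                          "Count": list(counts.values())})
    # Number the rows from 1, as in the example table.
    frame.index = range(1, len(frame) + 1)
    return frame

words = "The balcony the door the cellar door".split()
df = word_count_frame(words)
print(df)
```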

d. The co-occurrence of two events represents the likelihood of the two occurring together. A simple example of co-occurrence in texts is a predecessor-successor relationship -- that is, the frequency with which one word immediately follows another. The word "cellar," for example, is commonly followed by "door."

For this task, you are to construct a 2-dimensional predecessor-successor co-occurrence array as follows**:
i. The row index corresponds to the word from the same index in part c.'s data-frame.
ii. The column index likewise corresponds to the word in the same index in the data-frame.
iii. The value in each array location represents the count of the number of times the word corresponding to the row index immediately precedes the word corresponding to the column index in the text.
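A minimal sketch of the co-occurrence array in part d, assuming the word list is already lowercased and that `vocab` holds the unique words in order of first appearance (the same order as the data-frame rows). The function name `cooccurrence_matrix` is hypothetical. Note that NumPy arrays are 0-based, so row `i` here corresponds to data-frame index `i + 1` if the frame is numbered from 1.

```python
import numpy as np

def cooccurrence_matrix(words, vocab):
    """matrix[r, c] = number of times vocab[r] immediately precedes vocab[c]."""
    position = {word: i for i, word in enumerate(vocab)}
    matrix = np.zeros((len(vocab), len(vocab)), dtype=int)
    # Walk over every adjacent (predecessor, successor) pair in the text.
    for prev, nxt in zip(words, words[1:]):
        matrix[position[prev], position[nxt]] += 1
    return matrix

words = "the cellar door and the cellar door".split()
vocab = list(dict.fromkeys(words))  # unique words, first-appearance order
matrix = cooccurrence_matrix(words, vocab)
print(vocab)
print(matrix)
```

The array has one row and one column per unique word, so memory grows with the square of the vocabulary size. This is one practical reason to stick to relatively short documents.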

e. Based on the data-frame derived in part c. and array derived in part d., determine and print the following information:
i. The first occurring word in the text.
ii. The unique word that first occurs last within the text.
iii. The most common word.
iv. The least common word.
v. Words A and B such that B follows A more than any other combination of words.
vi. The word that most commonly follows the least common word.
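The queries in part e could be sketched as follows. The sample text, the variable names, and the tie-breaking rule are assumptions: when several words share a maximum or minimum, `idxmax`, `idxmin` and `argmax` all return the earliest one, which may or may not be what the assignment intends. The data-frame and array are rebuilt inline so the example runs on its own.

```python
import numpy as np
import pandas as pd

words = "the cellar door and the cellar door creaks".split()
vocab = list(dict.fromkeys(words))  # unique words, first-appearance order

# Data-frame as in part c (index numbered from 1).
df = pd.DataFrame({"Word": vocab, "Count": [words.count(w) for w in vocab]},
                  index=range(1, len(vocab) + 1))

# Predecessor-successor array as in part d (0-based rows and columns).
position = {w: i for i, w in enumerate(vocab)}
matrix = np.zeros((len(vocab), len(vocab)), dtype=int)
for prev, nxt in zip(words, words[1:]):
    matrix[position[prev], position[nxt]] += 1

first_word = df["Word"].iloc[0]                      # i.
last_new_word = df["Word"].iloc[-1]                  # ii.
most_common = df.loc[df["Count"].idxmax(), "Word"]   # iii.
least_common = df.loc[df["Count"].idxmin(), "Word"]  # iv.

# v. Location of the largest cell gives the most frequent (A, B) pair.
row, col = np.unravel_index(matrix.argmax(), matrix.shape)
pair = (vocab[row], vocab[col])

# vi. Largest entry in the least common word's row, if it has any successor.
least_row = matrix[position[least_common]]
follower = vocab[least_row.argmax()] if least_row.sum() > 0 else None

print(first_word, last_new_word, most_common, least_common, pair, follower)
```

The `None` fallback covers the case where the least common word is the final word of the text and therefore has no successor.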

* If you have experience with and prefer another format, feel free to use it. Also, I recommend sticking to relatively short documents (avoid extremely long novels).

Use Python.

Reference code:

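file_name = input("Enter file name:")
d = dict()  # maps each lowercase word to its number of occurrences
print(" File Contents are: ")
# The with-statement closes the file even if an error occurs while reading.
with open(file_name, "r") as file1:
    for line in file1:
        print(line, end='')
        # Normalize case so "The" and "the" count as the same word.
        line = line.strip().lower()
        # split() with no argument splits on any whitespace and skips
        # empty strings, so blank lines do not add "" as a word.
        words = line.split()
        for word in words:
            if word in d:
                d[word] = d[word] + 1
            else:
                d[word] = 1
print(" Number of occurrences of each word in given text file is:")
print(" =============== ")
for key in d:
    print(key, ":", d[key])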