
Question

This assignment deals with loading a simple text file into a Python structure, lists, arrays, and dataframes.

a. Locate a movie script, play script, poem, or book of your choice in .txt format*. Project Gutenberg is a great resource for this if you're not sure where to start.

b. Load the words of this text, one by one, into a one-dimensional, sequential Python list (i.e. the first word should be the first element in the list, while the last word should be the last element). It's up to you how to deal with special characters -- you can remove them manually, ignore them during the loading process, or even count them as words, for example.
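One possible way to approach part b is sketched below. It assumes special characters are simply ignored and that words are lowercased while loading; the function name `load_words` and the sample sentence are illustrative only and are not part of the assignment.

```python
import re

def load_words(text):
    """Return the words of `text` in reading order, lowercased, punctuation dropped."""
    # Assumption: a "word" is a run of letters, digits, or apostrophes;
    # every other character is treated as a separator.
    return re.findall(r"[a-z0-9']+", text.lower())

# With a real file the text would be read first, for example:
#     with open("poem.txt", encoding="utf-8") as f:   # hypothetical file name
#         words = load_words(f.read())
sample = "The balcony! The cellar door, the balcony."
words = load_words(sample)
print(words)
```

Lowercasing at load time is a design choice that makes the later case-independent counting trivial; keeping the original case in the list and lowercasing only when counting would work as well.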

c. Use your list to create and print a two-column pandas data-frame with the following properties:
i. Each index should mark the first occurrence of a unique word (independent of case) in the text.
ii. The first column for each index should represent the word in question at that index.
iii. The second column should represent the number of times that particular word appears in the text.

Ex: if the first word in your text is "the" which occurs 500 times and the second is "balcony" which only appears twice, your data-frame should begin like the following:

Word Count
1 "the" 500
2 "balcony" 2
... ... ...
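A minimal sketch of part c follows. It reads "index" as the order in which each unique word first appears, numbered from 1 as in the example above; that reading is an assumption, as are the helper name `word_count_frame` and the sample list.

```python
from collections import Counter

import pandas as pd

def word_count_frame(words):
    """Build a Word/Count data-frame ordered by first occurrence of each word."""
    # Counter keeps keys in insertion order, so the first time a word is
    # seen fixes its row position. Lowercasing makes the count case-independent.
    counts = Counter(w.lower() for w in words)
    df = pd.DataFrame({"Word": list(counts.keys()),
                       "Count": list(counts.values())})
    df.index = range(1, len(df) + 1)  # 1-based index, matching the example
    return df

words = ["The", "balcony", "the", "cellar", "door", "the", "balcony"]
df = word_count_frame(words)
print(df)
```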

d. The co-occurrence of two events represents the likelihood of the two occurring together. A simple example of co-occurrence in texts is a predecessor-successor relationship -- that is, the frequency with which one word immediately follows another. The word "cellar," for example, is commonly followed by "door."

For this task, you are to construct a 2-dimensional predecessor-successor co-occurrence array as follows**:
i. The row index corresponds to the word from the same index in part c.'s data-frame.
ii. The column index likewise corresponds to the word in the same index in the data-frame.
iii. The value in each array location represents the count of the number of times the word corresponding to the row index immediately precedes the word corresponding to the column index in the text.
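The construction in part d could be sketched as below, assuming the word list is already lowercased and that a NumPy array is an acceptable "2-dimensional array". The function name `cooccurrence` is hypothetical. Note that array positions are 0-based, so row 0 corresponds to the first row of the data-frame even if that data-frame is labeled from 1.

```python
import numpy as np

def cooccurrence(words):
    """Return (vocab, matrix) where matrix[i, j] counts vocab[i] directly before vocab[j]."""
    # dict.fromkeys keeps the first occurrence of each word, in order,
    # which is the same ordering as the rows of the part c data-frame.
    vocab = list(dict.fromkeys(words))
    pos = {w: i for i, w in enumerate(vocab)}
    matrix = np.zeros((len(vocab), len(vocab)), dtype=int)
    # Walk over every adjacent (predecessor, successor) pair in the text.
    for prev, nxt in zip(words, words[1:]):
        matrix[pos[prev], pos[nxt]] += 1
    return vocab, matrix

words = ["the", "balcony", "the", "cellar", "door", "the", "balcony"]
vocab, matrix = cooccurrence(words)
print(vocab)
print(matrix)
```

A dense array grows with the square of the vocabulary size, which is one more reason to follow the advice about choosing a relatively short document.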

e. Based on the data-frame derived in part c. and array derived in part d., determine and print the following information:
i. The first occurring word in the text.
ii. The unique word that first occurs last within the text.
iii. The most common word.
iv. The least common word.
v. Words A and B such that B follows A more than any other combination of words.
vi. The word that most commonly follows the least common word.
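The six lookups in part e might be sketched as follows. To stay self-contained, the sketch rebuilds the counts and the array from a small sample list and uses a plain `Counter` in place of the data-frame columns; all variable names are illustrative. Ties are resolved in favor of the word that appears first, which is an assumption the assignment does not specify.

```python
from collections import Counter

import numpy as np

words = ["the", "balcony", "the", "cellar", "door", "the", "balcony"]

# Stand-ins for the part c data-frame and part d array.
counts = Counter(words)                 # keys are in first-occurrence order
vocab = list(counts)
pos = {w: i for i, w in enumerate(vocab)}
matrix = np.zeros((len(vocab), len(vocab)), dtype=int)
for prev, nxt in zip(words, words[1:]):
    matrix[pos[prev], pos[nxt]] += 1

first_word = vocab[0]                        # i.  first word in the text
last_new_word = vocab[-1]                    # ii. unique word that first occurs last
most_common = max(vocab, key=counts.get)     # iii. ties go to the earliest word
least_common = min(vocab, key=counts.get)    # iv.  ties go to the earliest word

# v. the largest cell of the array gives the most frequent (A, B) pair
row, col = np.unravel_index(np.argmax(matrix), matrix.shape)
top_pair = (vocab[row], vocab[col])

# vi. the largest entry in the least common word's row. If that word is
# never followed by anything (all zeros), argmax falls back to position 0.
follower = vocab[int(np.argmax(matrix[pos[least_common]]))]

print(first_word, last_new_word, most_common, least_common, top_pair, follower)
```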

* If you have experience with and prefer another format, feel free to use it. Also, I recommend sticking to relatively short documents (avoid extremely long novels).

Use Python.

Reference code:

file_name = input("Enter file name:")
file1 = open(file_name, "r")
d = dict()

print(" File Contents are: ")
for line in file1:
    print(line, end='')
    line = line.strip()
    line = line.lower()
    words = line.split(" ")
    for word in words:
        if word in d:
            d[word] = d[word] + 1
        else:
            d[word] = 1

print(" Number of occurrences of each word in given text file is:")
print(" =============== ")
for key in list(d.keys()):
    print(key, ":", d[key])

file1.close()
