Question

Posted on Sep 25, 2024

For this assignment you will construct a single Python file called a4.py, containing several text-processing functions. You are free to use your own solutions to these problems as building blocks for your other solutions. You may use any built-in methods you wish unless a question explicitly specifies otherwise. You may create any additional 'helper' functions you need; however, please ensure that the required functions exist, are correctly named, and are well labeled. Please include the question number in the comments for each function.

You can assume that each space (' ') separates one word from the next. Note, however, that not all words are delimited by spaces on both sides. E.g. the first and last words are delimited by the start and end of the file, and the last word on each line is delimited by a newline ('\n') instead of a space. Fortunately, the default behaviour of the string.split() method is to split on ANY whitespace.
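A short illustration of the whitespace behaviour described above may help. The sample string is made up for this example.

```python
# Splitting with no arguments treats spaces, tabs, and newlines alike,
# and ignores leading/trailing whitespace.
sample = "the quick brown\nfox jumps\n"
words = sample.split()
print(words)       # ['the', 'quick', 'brown', 'fox', 'jumps']
print(len(words))  # 5

# Splitting on ' ' only would leave the newline attached to a word.
print(sample.split(' '))  # ['the', 'quick', 'brown\nfox', 'jumps\n']
```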

When considering unique words, you should not use case sensitivity or punctuation to distinguish words. That is, the words "Hello", "hello.", and "Hello!" should all be considered the same word (simply "hello").

Punctuation characters refer to the following set of characters: . , ? ! ; : ' " . You may assume no other punctuation symbols exist in the provided text files.
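One possible way to apply the case and punctuation rules to a single word is sketched below. The helper name normalize_word and the constant PUNCTUATION are hypothetical, not part of the assignment. Note that this sketch removes punctuation anywhere in the word, so "don't" becomes "dont"; whether that is the intended behaviour is an assumption.

```python
PUNCTUATION = ".,?!;:'\""  # the eight characters listed in the assignment

def normalize_word(word):
    """Hypothetical helper: drop punctuation characters and lowercase the word."""
    return "".join(ch for ch in word if ch not in PUNCTUATION).lower()

print(normalize_word("Hello"), normalize_word("hello."), normalize_word("Hello!"))
# hello hello hello
```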

A sentence is any string ending with either a '.', '?', or a '!'. You may assume that only one such character appears per sentence, and no other characters are used to terminate sentences.
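Under the stated assumption that each sentence contains exactly one terminating character, counting sentences reduces to counting those characters. A minimal sketch on a plain string follows; the helper name count_sentences_in_text is hypothetical.

```python
def count_sentences_in_text(text):
    """Hypothetical helper: count the sentence-ending characters in a string."""
    return sum(text.count(mark) for mark in ".?!")

print(count_sentences_in_text("Is it late? Yes. Go home!"))  # 3
```

This would miscount text containing abbreviations such as "e.g." or repeated marks such as "?!", which the assignment's assumption rules out.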

Problems:

Write a function called loadTextFile() that takes a filename (string) as argument and returns the text of that file. If the file does not exist, your function should use exception handling to return an empty string, and print a simple "File not found" message to the user.
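A minimal sketch of this function, assuming the file can be read with the platform's default text encoding:

```python
def loadTextFile(filename):
    """Return the text of the file, or an empty string if it does not exist."""
    try:
        with open(filename, "r") as f:
            return f.read()
    except FileNotFoundError:
        # Tell the user and fall back to an empty string, as the problem asks.
        print("File not found")
        return ""
```

Catching FileNotFoundError specifically, rather than every exception, keeps other problems (such as a permissions error) visible instead of silently reporting them as a missing file.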

Write a function called countWords() that takes a filename (string) as an argument and returns the total number of words in the text of that file.

Write a function called countSentences() that takes a filename (string) as an argument and returns the number of sentences in the text of that file.

Write a function called removePunctuation() that takes a string of text as an argument and returns that same text with all of the punctuation characters removed from it.

Write a function called wordFrequency() that takes a filename (string) as an argument and returns a dictionary containing each unique word along with the number of times that word occurred in the text.
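One way this could be approached is sketched below. It is self-contained for illustration; in a4.py the file reading and punctuation removal would more naturally reuse loadTextFile() and removePunctuation(). The constant PUNCTUATION is a hypothetical name.

```python
PUNCTUATION = ".,?!;:'\""  # the characters the assignment lists as punctuation

def wordFrequency(filename):
    """Map each normalized word in the file to its number of occurrences."""
    try:
        with open(filename) as f:
            text = f.read()
    except FileNotFoundError:
        print("File not found")
        text = ""
    counts = {}
    for raw in text.split():
        # Ignore case and punctuation when deciding which word this is.
        word = "".join(ch for ch in raw if ch not in PUNCTUATION).lower()
        if word:  # skip tokens that consisted only of punctuation
            counts[word] = counts.get(word, 0) + 1
    return counts
```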

Write a function called countUniqueWords() that takes a filename (string) as an argument and returns the number of unique words in that file. That is, how many words not counting any duplicates.

Write a function called kWords() that takes a filename (string) and a letter (string) as arguments and returns a list of unique words that start with the given letter. E.g. kWords('data.txt', 'b') might return ['baron', 'baboon', 'bolster', 'burgle', 'bake', 'bill'].
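A possible sketch follows. It assumes the file exists, that the letter should be matched case-insensitively, and that words should be returned in order of first appearance; none of these is stated in the problem.

```python
PUNCTUATION = ".,?!;:'\""  # hypothetical constant name

def kWords(filename, letter):
    """Return the unique normalized words in the file that start with letter."""
    with open(filename) as f:
        text = f.read()
    found = []
    for raw in text.split():
        word = "".join(ch for ch in raw if ch not in PUNCTUATION).lower()
        # Keep the word only the first time it is seen.
        if word.startswith(letter.lower()) and word not in found:
            found.append(word)
    return found
```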

Write a function called longestWord() that takes a filename (string) as an argument and returns the longest word in that file. Punctuation characters should not count towards the length of a word.
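A minimal sketch, assuming the file exists and that the word should be returned with its punctuation removed but its original capitalization kept. If several words share the maximum length, this version returns the first one; the problem does not say how ties should be handled.

```python
PUNCTUATION = ".,?!;:'\""  # hypothetical constant name

def longestWord(filename):
    """Return the longest word in the file, ignoring punctuation characters."""
    with open(filename) as f:
        raw_words = f.read().split()
    cleaned = ["".join(ch for ch in w if ch not in PUNCTUATION) for w in raw_words]
    # default="" avoids a ValueError when the file contains no words.
    return max(cleaned, key=len, default="")
```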

Write a function called writeLines() that takes a filename (string) and a list of strings as arguments and prints the given list to the given file. The function should replace the file if it already exists, and should create a new file if it does not already exist.
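A sketch of one reading of this problem: each string in the list is written on its own line. It assumes the strings do not already end with a newline.

```python
def writeLines(filename, lines):
    """Write each string in lines to the file, one per line."""
    # Mode "w" creates the file if needed and replaces any existing contents.
    with open(filename, "w") as f:
        for line in lines:
            f.write(line + "\n")
```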

Write a function called reverseFile() that takes a filename (string) and creates a new file called reverse_filename, in which the lines of the input file are written in reverse order. That is, the first line of the output file should be the last line of the input file, the second line of output should be the second-to-last line of input, and so on. Each individual line should read the same (left-to-right) as it did in the input file.
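A possible sketch, assuming the input file exists and that "reverse_filename" means the original file name prefixed with "reverse_" (so data.txt becomes reverse_data.txt, in the same directory):

```python
import os

def reverseFile(filename):
    """Write the lines of filename, in reverse order, to reverse_<filename>."""
    with open(filename) as f:
        lines = f.read().splitlines()
    # Apply the prefix to the file name only, not to any directory part.
    directory, base = os.path.split(filename)
    out_name = os.path.join(directory, "reverse_" + base)
    with open(out_name, "w") as f:
        for line in reversed(lines):
            f.write(line + "\n")
```

Using splitlines() rather than readlines() avoids a subtle problem: if the last input line has no trailing newline, writing it first would otherwise glue it to the line that follows.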

Bonus: Write a function called followsWord() that takes a filename (string) and a keyword (string) as arguments and returns a list of all unique words that follow the given keyword in the file. E.g. if the text file contained "She sells sea shells by the sea shore", followsWord(textfile, "sea") would return ['shells', 'shore']. Both the keyword and the text of the file should be considered case insensitive. Note: your code should not crash in the case that the keyword is the last word in the file.
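One way the bonus could be sketched, assuming the file exists and that punctuation is stripped before comparing words. Pairing each word with its successor via zip() means the last word is never used as a starting point, so a keyword at the end of the file cannot cause an index error.

```python
PUNCTUATION = ".,?!;:'\""  # hypothetical constant name

def followsWord(filename, keyword):
    """Return the unique words that directly follow keyword in the file."""
    with open(filename) as f:
        raw_words = f.read().split()
    words = ["".join(ch for ch in w if ch not in PUNCTUATION).lower() for w in raw_words]
    words = [w for w in words if w]  # drop tokens that were only punctuation
    key = keyword.lower()
    followers = []
    # zip stops at the shorter sequence, so the final word has no successor.
    for current, following in zip(words, words[1:]):
        if current == key and following not in followers:
            followers.append(following)
    return followers
```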