
Question


Posted on Sep 22, 2024


In my code below, I replaced the actual file path with "file path".

You can use a paragraph of your own as a test text file.

I need help with input and output.

total # of words

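# Read the whole file; the with block closes it automatically
with open("C:file path", "r") as myfile:
    data = myfile.read()

# split() with no argument splits on any run of whitespace
words = data.split()

print("Number of words is:", len(words))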

********************************************************************************************

total # of occurrences for each word.

# Open the file in read mode
myfile = open("C:file path", "r")

# Create an empty dictionary
dic = dict()

# Loop through each line of the file
for line in myfile:
    # Remove the leading spaces and newline character
    line = line.strip()
    # Convert the characters in line to lowercase to avoid case mismatch
    line = line.lower()

    # Split the line into words; split() with no argument ignores
    # repeated spaces and returns nothing for a blank line
    words = line.split()

    # Iterate over each word in line
    for word in words:
        # Check if the word is already in dictionary
        if word in dic:
            # Increment count of word by 1
            dic[word] = dic[word] + 1
        else:
            # Add the word to dictionary with count 1
            dic[word] = 1

myfile.close()

# Print the contents of dictionary
for key in list(dic.keys()):
    print(key, ":", dic[key])
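A more compact way to get the same per-word counts is `collections.Counter` from the standard library. This is only a sketch: the function name `countWordOccurrences` and the sample sentence are made up for illustration, and it takes a string instead of a file so it can be tried without a path. The names use lower camel case because the assignment asks for that style.

```python
from collections import Counter


def countWordOccurrences(text):
    """Return a Counter mapping each lowercased word to its number of occurrences."""
    # split() with no argument splits on any run of whitespace,
    # so blank lines and double spaces do not produce empty "words"
    return Counter(text.lower().split())


# Hypothetical sample text, used only to demonstrate the function
sampleText = "It was the best of times, it was the worst of times"
wordCounts = countWordOccurrences(sampleText)

# most_common() lists the words from most to least frequent
for word, count in wordCounts.most_common():
    print(word, ":", count)
```

Punctuation stays attached to the words here, so "times," and "times" are counted separately. Whether to strip punctuation first is a separate decision that the requirements do not settle.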

***************************************************************************************************

total # of characters in the text document.

num_char = 0

with open("C:file path", "r") as f:
    for line in f:
        # len(line) includes spaces and the trailing newline
        num_char += len(line)

print('Number of characters is:')
print(num_char)

***************************************************************************************************

total # of blank spaces in the text document

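myfile = open("C:file path", "r")

count = 0

while True:
    # Read one character at a time and store it in char
    char = myfile.read(1)

    # An empty string means the end of the file was reached
    if not char:
        break
    # isspace() is also True for tabs and newlines
    if char.isspace():
        count += 1

myfile.close()

print('Number of spaces is')
print(count)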

***************************************************************************************************

total # of blank spaces divided by total # of characters, multiplied by 100.

myfile = open("C:file path", "r")

count = 0

while True:
    # Read one character at a time and store it in char
    char = myfile.read(1)

    # An empty string means the end of the file was reached
    if not char:
        break
    if char.isspace():
        count += 1

myfile.close()

num_char = 0

with open("C:file path", "r") as f:
    for line in f:
        num_char += len(line)

# % of number of spaces
per_space = count / num_char * 100

two_decimal = "{:.2f}%".format(per_space)
print('The % of spaces equals', two_decimal)
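The two passes above can also be folded into a single read of the file contents. The sketch below is one possible arrangement, not the required design: `textStatistics` is a hypothetical helper name, it works on a string that has already been read from the file, and it returns 0.0 for empty input so the division cannot fail. It uses `isspace()` like the code above, so tabs and newlines are counted as blanks too.

```python
def textStatistics(text):
    """Return character, blank, and word totals plus the percentage of blanks."""
    totalChars = len(text)
    # isspace() is True for spaces, tabs and newlines
    blankChars = sum(1 for ch in text if ch.isspace())
    totalWords = len(text.split())
    # Guard against an empty file, where the division would raise an error
    percentBlank = blankChars / totalChars * 100 if totalChars else 0.0
    return {
        "totalChars": totalChars,
        "blankChars": blankChars,
        "totalWords": totalWords,
        "percentBlank": percentBlank,
    }


# Hypothetical sample input, used only to demonstrate the function
stats = textStatistics("one two\nthree")
print("Number of characters:", stats["totalChars"])
print("Number of spaces:", stats["blankChars"])
print("The % of spaces equals {:.2f}%".format(stats["percentBlank"]))
```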

Text analysis and the statistical distribution of words can tell us a lot about who wrote a text. Such an analysis of corporate emails was a key factor in the prosecution of corporation executives for insider trading. Specifically, the increasingly frequent use of pronouns in email was cited as a metric for detecting deceptive communication in that case.

The client would eventually like to work along these lines, but for now wants you to expose some concepts about processing text so that they can determine how to better analyze corporate documents as a first step. Since their internal documents are sensitive, they have asked you to analyze text copies of three novels: Charles Dickens' A Tale of Two Cities, Leo Tolstoy's War and Peace, and Victor Hugo's Les Miserables. Each of these files is sourced from the Gutenberg Project (www.gutenberg.org), e.g. the A Tale of Two Cities text file. The files can be downloaded along with these instructions from Project 1 on Brightspace.

You will submit the following files compressed into a zip archive:
1. All python files for your program implementation.
2. Preprocessing files (6).
3. Design and Test documents (4).
4. The output text files (3).

Submit your file on Brightspace under Project 1 Text Analysis. The zip archive name must be formatted as follows: -2101-ProjectPartA.zip
Example: mmako0048-2101-ProjectPartA.zip

Marking Guide

The assignment will be marked out of 28 pts using the following guide:
- The submission follows the project instructions. (2 marks)
- The submission provides a functioning user interface (must provide feedback/notifications to users where appropriate). (4 marks)
- The submission demonstrates the correct use of classes/methods (must use at least two classes). (4 marks)
- The submission demonstrates the correct use of loops, containers (e.g. lists) and conditional logic. (4 marks)
- The program does not contain any logic or runtime errors. (3 marks)
- Proper naming conventions for variables, classes etc. (2 marks)
- The output text files are generated as per the specification. (3 marks)
- Sufficient header and inline documentation. (2 marks)
- Program Design document flow is easy to read and reflects your general logic. (2 marks)
- Test plan covers basic functionality and exception handling. (2 marks)

I will assess the submission by trying to run one valid and one invalid file through your program as well as a review of all submission documents. Each criterion will be marked as follows to the maximum allowed by the category:
- Missing or improper use affecting a minor element of the objective: -0.5
- Missing or improper use affecting a major element of the objective: -1
- Run time or logic errors: -1

Project Area / Requirement

General programming
- Must use Python 3, with meaningful naming and lower camel case style.
- Must contain header documentation that describes the purpose of the program, the author and the date.
- Must contain sufficient inline documentation for others to understand logic.
- Must properly use classes, loops and lists in the program.
- Must be written as Python program (py files) and shall not use Jupyter Notebook.

Preprocessing to Clean Data (using text editor - not in program)
- Must remove licensing terms and table of contents.
- Must save removed text as "-PartA--novel>Removed.txt".
- Must save clean text as "-PartA-Clean.txt".

Design
- Must convey the general logic of the program including classes/methods.
- You may use any combination of pseudocode and UML diagrams that you wish as long as the logic is clear.
- Must be named -PartA-Program.
- You may use Microsoft Word or Jupyter Notebook as your design document.

Program Development and Testing
- A small paragraph (from any of the mentioned books) must be used for development and testing purposes. The file must be named -PartA-Sample.txt
- Must create a test plan named -PartA-TestPlan. extension>
- You may use Microsoft Word or Excel for your test plan.
- Must create a test output named -PartA-SampleAnalysis.txt

Input
- Must accept text files for input.
- May allow the user to choose the text file for processing. Do not hardcode the file location or any part of the file path.
- Must allow the user to choose to start processing or exit without processing any text files.

Processing
- Must be able to process a text file as input.
- Must be able to count the:
  o total # of words
  o total # of occurrences for each word
  o total # of characters in the text document
  o total # of blank spaces in the text document
- Must be able to calculate the percentage of blank spaces as:
  o total # of blank spaces divided by total # of characters, multiplied by 100
- Must implement exception handling in two areas:
  o when opening the text file for reading
  o when writing the output file

Output
- Must create an output text file for the chosen text formatted to the client's requirements named "-PartA--novel>Analysis.txt".
- Must advise the user when:
  o processing has completed AND
  o the name/location of the output file
- Must advise the user when:
  o an exception has occurred AND
  o the type of exception

Client Text Format Requirement
- Text analysis output must be clear and easy to read.
- Must include appropriate headers and data for:
  o Name of text:
  o Total Non-blank Character Count:
  o Total Blank Character Count:
  o Percentage Blank Character:
  o Total Word Count:
  o Word, Count: (each word and count is on a separate line)
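Since the question is about input and output, here is a minimal sketch of how the input, exception handling, and output requirements could fit together with two classes. It is one possible layout under stated assumptions, not the expected solution: the names `TextAnalyzer`, `ReportWriter`, `runAnalysis` and `main`, the prompt wording, and the UTF-8 encoding are all my own choices. The output path is typed in by the user because the required file name pattern is cut off in the text above.

```python
import os
from collections import Counter


class TextAnalyzer:
    """Computes the counts named in the processing requirements."""

    def __init__(self, text):
        self.text = text

    def analyze(self):
        totalChars = len(self.text)
        # isspace() treats spaces, tabs and newlines as blank characters
        blankCount = sum(1 for ch in self.text if ch.isspace())
        words = self.text.lower().split()
        return {
            "nonBlankCount": totalChars - blankCount,
            "blankCount": blankCount,
            # Guard against an empty file before dividing
            "percentBlank": blankCount / totalChars * 100 if totalChars else 0.0,
            "wordCount": len(words),
            "wordCounts": Counter(words),
        }


class ReportWriter:
    """Formats the results with the requested headers and writes them out."""

    def __init__(self, textName, results):
        self.textName = textName
        self.results = results

    def buildLines(self):
        results = self.results
        lines = [
            "Name of text: " + self.textName,
            "Total Non-blank Character Count: {}".format(results["nonBlankCount"]),
            "Total Blank Character Count: {}".format(results["blankCount"]),
            "Percentage Blank Character: {:.2f}%".format(results["percentBlank"]),
            "Total Word Count: {}".format(results["wordCount"]),
            "Word, Count:",
        ]
        # One "word, count" pair per line, in alphabetical order
        for word, count in sorted(results["wordCounts"].items()):
            lines.append("{}, {}".format(word, count))
        return lines

    def write(self, outputPath):
        with open(outputPath, "w", encoding="utf-8") as outFile:
            outFile.write("\n".join(self.buildLines()) + "\n")


def runAnalysis(inputPath, outputPath):
    """Analyze inputPath and write the report; return True on success."""
    # Exception handling area 1: opening and reading the text file
    try:
        with open(inputPath, "r", encoding="utf-8") as inFile:
            text = inFile.read()
    except (OSError, UnicodeDecodeError) as error:
        print("Could not read the file. Exception type:", type(error).__name__)
        return False

    results = TextAnalyzer(text).analyze()

    # Exception handling area 2: writing the output file
    try:
        ReportWriter(os.path.basename(inputPath), results).write(outputPath)
    except OSError as error:
        print("Could not write the output. Exception type:", type(error).__name__)
        return False

    print("Processing complete. Output file:", outputPath)
    return True


def main():
    """Let the user start processing or exit, then ask for the file paths."""
    choice = input("Enter S to start processing or X to exit: ").strip().lower()
    if choice != "s":
        print("Exiting without processing any text files.")
        return
    inputPath = input("Text file to analyze: ").strip()
    outputPath = input("Output file to create: ").strip()
    runAnalysis(inputPath, outputPath)
```

To run it interactively, call `main()` at the bottom of the file. Keeping the prompts in `main()` and the file handling in `runAnalysis()` means the analysis can be exercised with a valid and an invalid path without typing anything, which may help when writing the test plan.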