Question

1 Approved Answer

Posted on Sep 06, 2024

FOR PYTHON

a. Write a program analyzeMessages(filename, minWordLengthToConsider = 1) that analyzes word frequencies in real-world text messages. Each line of the file represents one SMS/text message. The first item on every line is a label - 'ham' or 'spam' - indicating whether that line's SMS is considered spam or not. The rest of the line contains the text of the SMS/message. For example:

    spam Congrats! 1 year special cinema pass for 2 is yours. call 09061209465 now! Call ...
    ham Sorry, I'll call later in meeting.

At the end, your program must print summary information, including at least:

- the number of ham and number of spam messages
- the total number of words found in ham messages and in spam messages
- the number of unique words found in ham messages and in spam messages
- information, for both ham and for spam, about the twelve (at least) most frequently occurring words that are at least minWordLengthToConsider characters long. This information must include both the count of the number of occurrences and the relative frequency of a word's occurrence as a percentage (how many times that word appears out of the total number of words in the relevant message set. For example, if "you" appeared 80 times in ham, out of 1250 total ham word occurrences, the frequency would be 6.4%).
- the average length (in words, not characters) of ham messages and of spam messages

Feel free to compute and print out additional information as well.

To accomplish this, your analyzeMessages function should:

- read all of the data from the input file
- extract individual words from the messages. This should include an effort to get rid of "extras" such as periods, commas, question and exclamation marks, and other characters that aren't part of a word. You should probably also ignore capitalization. Thus in the sample spam message above, you probably want to treat "Congrats!" as "congrats" in your frequency analysis. Note: the string strip() method is very useful for this. I recommend you do not use the replace() method.
- build two dictionaries (Note: using dictionaries is required for full credit on this assignment), one for frequencies of words appearing in spam messages, one for frequencies of words from ham messages.
- extract the most frequently occurring words (of length at least minWordLengthToConsider)
- compute and print summary information
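The two details the assignment spells out, cleaning a word with strip() and computing the relative frequency, can be sketched as follows. The helper names cleanWord and relativeFrequency are hypothetical, and stripping every character in string.punctuation is an assumption about which "extras" to remove.

```python
import string

def cleanWord(rawWord):
    # Hypothetical helper: remove punctuation from both ends, then lowercase.
    # strip() only touches leading/trailing characters, so the apostrophe
    # inside "I'll" is kept.
    return rawWord.strip(string.punctuation).lower()

def relativeFrequency(count, totalWords):
    # Percentage of all word occurrences in the relevant message set.
    return 100 * count / totalWords if totalWords else 0.0

print(cleanWord("Congrats!"))       # congrats
print(relativeFrequency(80, 1250))  # 6.4
```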

This is what I have so far, but I really don't know what I'm doing and don't know how to move forward.

def analyzeMessages(filename, minWordLengthToConsider =1):
    newfile = open(filename,encoding = "utf-8")
    linecount=0
    for line in newfile:
        lineAsList = line.split()
        linecount = linecount + 1
        word = lineAsList.strip("'.,/?:;!")
        word = word.lower
    hamlist =[]
    spamlist=[]
    numofspam=0
    numofham=0
    wordsinham=0
    wordsinspam=0
    uniqueham=0
    uniquespam=0
    for char in word:
        if char[0]== "spam":
            spamlist= spamlist.append(char)
            numofspam = numofspam+1
        else:
            hamlist=hamlist.append(char)
            numofham = numofham+1
    print(hamlist)
    print(spamlist)
    hamdict={}
    spamdict={}
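A few things stop the attempt above from running: strip() is a string method but is called on the list returned by split(); word.lower is missing its parentheses, so it never gets called; list.append() returns None, so spamlist = spamlist.append(...) replaces the list with None; and the ham/spam label has to be checked once per line (the first item of the split line), not once per character. One possible way forward is sketched below. It is not the only valid solution. The topN parameter, the nested stats dictionary, and the return value are assumptions added for illustration and are not required by the assignment.

```python
import string

def analyzeMessages(filename, minWordLengthToConsider=1, topN=12):
    # One entry per label; "freq" is the word-frequency dictionary the
    # assignment asks for. (topN and the returned dict are illustrative extras.)
    stats = {label: {"messages": 0, "words": 0, "freq": {}}
             for label in ("ham", "spam")}

    with open(filename, encoding="utf-8") as infile:
        for line in infile:
            parts = line.split()
            if not parts or parts[0] not in stats:
                continue  # skip blank or unlabeled lines
            group = stats[parts[0]]
            group["messages"] += 1
            for raw in parts[1:]:
                word = raw.strip(string.punctuation).lower()
                if word:  # tokens such as "..." are empty after stripping
                    group["words"] += 1
                    group["freq"][word] = group["freq"].get(word, 0) + 1

    for label, group in stats.items():
        total, messages, freq = group["words"], group["messages"], group["freq"]
        print(f"{label}: {messages} messages, {total} words, {len(freq)} unique words")
        average = total / messages if messages else 0.0
        print(f"  average length: {average:.2f} words per message")
        # Keep only words that are long enough, most frequent first.
        candidates = [(count, word) for word, count in freq.items()
                      if len(word) >= minWordLengthToConsider]
        candidates.sort(key=lambda pair: (-pair[0], pair[1]))
        for count, word in candidates[:topN]:
            print(f"  {word}: {count} ({100 * count / total:.1f}%)")
    return stats
```

It could be called as analyzeMessages("messages.txt", 4), where messages.txt is a placeholder name for the data file. Keeping the per-label counters in one dictionary avoids duplicating every line of logic for ham and spam.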