Question

FOR PYTHON

a. Write a program analyzeMessages(filename, minWordLengthToConsider = 1) that analyzes word frequencies in real-world text messages. Each line of the file represents one SMS/text message. The first item on every line is a label, 'ham' or 'spam', indicating whether that line's SMS is considered spam or not. The rest of the line contains the text of the message. For example:

# spam Congrats! 1 year special cinema pass for 2 is yours. call 09061209465 now! Call ...
# ham Sorry, I'll call later in meeting.

At the end, your program must print summary information, including at least:

# the number of ham messages and the number of spam messages
# the total number of words found in ham messages and in spam messages
# the number of unique words found in ham messages and in spam messages
# information, for both ham and spam, about the twelve (at least) most frequently occurring words that are at least minWordLengthToConsider characters long. This information must include both the count of occurrences and the relative frequency of the word as a percentage (how many times that word appears out of the total number of words in the relevant message set). For example, if "you" appeared 80 times in ham, out of 1250 total ham word occurrences, the frequency would be 6.4%.
# the average length (in words, not characters) of ham messages and of spam messages

Feel free to compute and print out additional information as well. To accomplish this, your analyzeMessages function should:

# read all of the data from the input file
# extract individual words from the messages. This should include an effort to get rid of "extras" such as periods, commas, question and exclamation marks, and other characters that aren't part of a word. You should probably also ignore capitalization. Thus, in the sample spam message above, you probably want to treat "Congrats!" as "congrats" in your frequency analysis. Note: the string strip() method is very useful for this. I recommend you do not use the replace() method.
# build two dictionaries (Note: using dictionaries is required for full credit on this assignment), one for frequencies of words appearing in spam messages and one for frequencies of words from ham messages
# extract the most frequently occurring words (of length at least minWordLengthToConsider)
# compute and print summary information

This is what I have so far, but I really don't know what I'm doing and don't know how to move forward.

def analyzeMessages(filename, minWordLengthToConsider =1):
    newfile = open(filename,encoding = "utf-8")
    linecount=0
    for line in newfile:
        lineAsList = line.split()
        linecount = linecount + 1
        word = lineAsList.strip("'.,/?:;!")
        word = word.lower
    hamlist =[]
    spamlist=[]
    numofspam=0
    numofham=0
    wordsinham=0
    wordsinspam=0
    uniqueham=0
    uniquespam=0
    for char in word:
        if char[0]== "spam":
            spamlist= spamlist.append(char)
            numofspam = numofspam+1
        else:
            hamlist=hamlist.append(char)
            numofham = numofham+1
    print(hamlist)
    print(spamlist)
    hamdict={}
    spamdict={}
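
A few things in the attempt above would stop it from running: strip() is a string method, so it cannot be called on the list returned by split(); word.lower is missing its parentheses, so it never actually lowercases anything; and list.append() returns None, so assigning its result back to spamlist or hamlist throws the list away. One possible way forward is sketched below. It is only a sketch under some assumptions: the helper names clean_word and summarize are made up for illustration, string.punctuation is used as the set of characters to strip, lines without a 'ham' or 'spam' label are skipped, and the function returns its results in a dictionary (in addition to printing them) purely so they are easy to check.

```python
import string


def clean_word(token):
    """Strip punctuation from both ends of a token and lowercase it."""
    return token.strip(string.punctuation).lower()


def summarize(freq, min_len, top_n=12):
    """Return (total words, unique words, top words) for one frequency dict."""
    total = sum(freq.values())
    # Only words that are long enough are candidates for the "top" list.
    candidates = [(w, c) for w, c in freq.items() if len(w) >= min_len]
    # Most frequent first; ties are broken alphabetically.
    candidates.sort(key=lambda pair: (-pair[1], pair[0]))
    return total, len(freq), candidates[:top_n]


def analyzeMessages(filename, minWordLengthToConsider=1):
    counts = {"ham": 0, "spam": 0}    # number of messages per label
    freqs = {"ham": {}, "spam": {}}   # word -> occurrences, per label

    with open(filename, encoding="utf-8") as infile:
        for line in infile:
            tokens = line.split()
            if not tokens or tokens[0] not in counts:
                continue  # assumption: skip blank or unlabeled lines
            label = tokens[0]
            counts[label] += 1
            for token in tokens[1:]:
                word = clean_word(token)
                if word:  # ignore tokens that were only punctuation
                    freqs[label][word] = freqs[label].get(word, 0) + 1

    results = {}
    for label in ("ham", "spam"):
        total, unique, top = summarize(freqs[label], minWordLengthToConsider)
        average = total / counts[label] if counts[label] else 0.0
        print(f"{label}: {counts[label]} messages, {total} words, "
              f"{unique} unique words, {average:.2f} words per message")
        for word, count in top:
            print(f"  {word:<15} {count:>6} {100 * count / total:6.2f}%")
        results[label] = {
            "messages": counts[label],
            "total_words": total,
            "unique_words": unique,
            "average_length": average,
            "top": top,
        }
    return results
```

Calling analyzeMessages("messages.txt", 4), where messages.txt stands in for whatever your data file is called, would print the summary for both labels and list only words of four or more characters in the top-twelve tables. Note that the percentages are still computed against the total number of words for that label, as the assignment's 6.4% example suggests, not against the filtered subset.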
