
Question


In Python. hm1_dir is simply a directory of text files. There are 10 text files like the two examples shown below.


Man Who Drinks 5 Diet Cokes Per Day Hoping Doctors Working On Cure For Whatever Hes Getting. Nov 15, 2013 BINGHAMTON, NY. After finishing his second can of Diet Coke of the morning, local man Derek Cowan, who reportedly drinks five of the artificially sweetened soft drinks a day, expressed his sincere hope that researchers are currently working on a cure for whatever terrible disease he's getting right now. I'm just going to optimistically anticipate that by the time the chronic ailment I'm currently developing fully progresses, a team of dedicated researchers working around the clock in a lab somewhere will have found a cure, Cowan said, noting that he's counting on scientists to invent a pill, vaccine, patch, or other medical solution in the coming years to prevent people from contracting whatever horrific, life-threatening disease you eventually get from drinking 60 or more ounces of Diet Coke each day. It makes sense because medicine is already so advanced that in 15 to 20 years, when I finally experience the full onset of whatever the hell freaky illness is slowly gestating inside of me with each sugar-free can of this shit, there's bound to be at least one cure. And I hope they start working on it soon, too, because I'm not feeling so great. Cowan added that, until that day comes, he could really go for another Diet Coke.

example 2:

Parents Finally Cave And Buy 33-Year-Old Son PlayStation 1. Nov 21, 2013. Having refused to purchase the video game console since its introduction in 1994, local parents John and Melissa Gionda confirmed Thursday that they had finally caved in and bought a Sony PlayStation 1 for their 33-year-old son, Daniel. It has some violent games I still don't approve of, but I know it's something Daniel really wanted, so we finally figured, Why not?' Melissa Gionda said shortly after purchasing a bundle package containing the PlayStation console, a 1-megabyte memory card, and copies of Crash Bandicoot, Tony Hawk's Pro Skater 2, and Spyro The Dragon for the 2002 college graduate and digital marketing analyst. We've always felt that video games would have been a huge distraction from his schoolwork and first four jobs after college, but Daniel has been patient and waited long enough to get a PlayStation. As long as he doesn't sit around all day in front of the TV, it'll be fine. And we got him an extra controller, too, so he can play it with his friends or his son Mark. Despite buying the video game system, the Giondas confirmed that they still refuse to buy the 33-year-old a copy of the 1992 Megadeth album Countdown To Extinction.

1. Download the hm1_dir from Piazza and save it in the same folder as your Python program.

2. Within your main() function, get the directory name from a system argument. Print an appropriate error message if the sys.argv argument is missing and end the program. In a for loop:
   (a) read in each file,
   (b) use the string.replace() method to replace newlines with spaces,
   (c) use the string.lower() method to lowercase the text, then use an NLTK tokenizer to extract tokens,
   (d) make a FreqDist from the tokens,
   (e) print the filename and the 5 most common words,
   (f) on each iteration through the files, add the FreqDist to a cumulative FreqDist for later.

3. Repeat the same loop as above but add the following:
   (a) remove punctuation symbols before tokenizing,
   (b) remove stop words.

4. For your cumulative FreqDist for steps 2 and 3, create a cumulative frequency graph of the 50 most common words. Note that you may have to install matplotlib, and the first time matplotlib runs it takes a while.
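
Steps 2 to 4 could be sketched roughly as follows. This is one possible reading of the assignment, not a reference solution. The names tokenize, process_dir, and the script name hw1.py are made up for illustration. The assignment does not say which NLTK tokenizer to use; wordpunct_tokenize is assumed here because it needs no downloaded models, and nltk.word_tokenize could be swapped in. Punctuation is removed by deleting ASCII punctuation characters, so "he's" becomes "hes"; replacing punctuation with spaces would be an equally valid interpretation.

```python
import os
import string
import sys

from nltk import FreqDist
from nltk.tokenize import wordpunct_tokenize  # assumed choice of tokenizer


def tokenize(text, strip_punct=False, stop_words=frozenset()):
    """Normalize and tokenize one file's text (steps 2b-2c, optionally 3a-3b)."""
    # Steps 2(b) and 2(c): newlines become spaces, then lowercase.
    text = text.replace("\n", " ").lower()
    if strip_punct:
        # Step 3(a): delete ASCII punctuation before tokenizing.
        text = text.translate(str.maketrans("", "", string.punctuation))
    tokens = wordpunct_tokenize(text)
    # Step 3(b): drop stop words (a no-op when the set is empty).
    return [tok for tok in tokens if tok not in stop_words]


def process_dir(dir_name, strip_punct=False, stop_words=frozenset()):
    """Print the 5 most common tokens per file; return the cumulative FreqDist."""
    cumulative = FreqDist()
    for name in sorted(os.listdir(dir_name)):
        path = os.path.join(dir_name, name)
        if not os.path.isfile(path):
            continue  # skip subdirectories
        with open(path, encoding="utf-8", errors="replace") as handle:
            text = handle.read()
        fd = FreqDist(tokenize(text, strip_punct, stop_words))  # step 2(d)
        print(name, fd.most_common(5))                          # step 2(e)
        cumulative.update(fd)                                   # step 2(f)
    return cumulative


def main(argv):
    """Return a process exit code: 1 if the directory argument is missing."""
    if len(argv) < 2:
        # hw1.py is a hypothetical script name.
        print("usage: python hw1.py <directory>", file=sys.stderr)
        return 1
    # Imported here because it needs nltk.download("stopwords") to have run once.
    from nltk.corpus import stopwords
    stops = frozenset(stopwords.words("english"))

    raw = process_dir(argv[1])                                          # step 2
    cleaned = process_dir(argv[1], strip_punct=True, stop_words=stops)  # step 3

    # Step 4: cumulative frequency plots of the 50 most common tokens
    # (FreqDist.plot requires matplotlib).
    raw.plot(50, cumulative=True)
    cleaned.plot(50, cumulative=True)
    return 0
```

To run this as a script, end the file with sys.exit(main(sys.argv)) under an if __name__ == "__main__": guard and invoke it as python hw1.py hm1_dir. Returning an exit code from main(), rather than calling sys.exit() inside it, is a design choice that keeps the function easy to call from tests.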

