Question

Write a program that reports the five most frequent two-word sequences in a text file downloaded from Project Gutenberg. The program shall:

  1. Find the beginning and the end of the text (look for the markers "*** START OF THE PROJECT GUTENBERG EBOOK..." and "*** END OF THE PROJECT GUTENBERG EBOOK...") and discard everything before the beginning and after the end, including the markers.
  2. Break the text into words using spaces as separators.
  3. Convert each word to lowercase and remove any punctuation. If a "word" consists only of punctuation, discard it entirely. Thus, "Huck Finn is drawn from life ; Tom Sawyer also, but" shall become "huck finn is drawn from life tom sawyer also but".
  4. Count all combinations of two consecutive words (known as bigrams, e.g., "huck finn," "finn is," "is drawn," "drawn from") and report the five most frequent.
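
The four steps above can be sketched in Python as follows. This is a minimal sketch, not a reference solution: it assumes the markers appear exactly as quoted in step 1, and it splits on any whitespace (spaces, tabs, line breaks), since the book's lines end in newlines rather than spaces. "Punctuation" is taken to mean ASCII `string.punctuation` plus any Unicode character whose category starts with "P", so curly quotes and dashes in the Gutenberg text are removed too. The function names are illustrative, not part of the assignment.

```python
import string
import unicodedata
from collections import Counter

START_MARKER = "*** START OF THE PROJECT GUTENBERG EBOOK"
END_MARKER = "*** END OF THE PROJECT GUTENBERG EBOOK"


def extract_body(raw: str) -> str:
    """Step 1: keep only the text between the START and END marker lines."""
    start = raw.find(START_MARKER)
    if start == -1:
        raise ValueError("START marker not found")
    # The START line also carries the title and a closing "***",
    # so the body begins right after that line's newline.
    body_start = raw.find("\n", start)
    end = raw.find(END_MARKER, start)
    if body_start == -1 or end == -1 or end < body_start:
        raise ValueError("END marker not found after the START line")
    # Slicing up to `end` drops the END marker line and everything after it.
    return raw[body_start + 1:end]


def normalize(word: str) -> str:
    """Step 3: lower-case a token and delete every punctuation character."""
    return "".join(
        ch
        for ch in word.lower()
        if ch not in string.punctuation
        and not unicodedata.category(ch).startswith("P")
    )


def tokenize(body: str) -> list[str]:
    """Steps 2-3: split on whitespace, normalize, drop punctuation-only tokens."""
    words = (normalize(token) for token in body.split())
    return [w for w in words if w]


def top_bigrams(words: list[str], n: int = 5) -> list[tuple[tuple[str, str], int]]:
    """Step 4: count each pair of consecutive words, return the n most common."""
    return Counter(zip(words, words[1:])).most_common(n)


if __name__ == "__main__":
    sample = (
        "license header\n"
        "*** START OF THE PROJECT GUTENBERG EBOOK SAMPLE ***\n"
        "Huck Finn is drawn from life ; Tom Sawyer also, but\n"
        "*** END OF THE PROJECT GUTENBERG EBOOK SAMPLE ***\n"
        "license footer\n"
    )
    words = tokenize(extract_body(sample))
    print(" ".join(words))  # huck finn is drawn from life tom sawyer also but
    print(top_bigrams(words))
```

`Counter.most_common` orders bigrams with equal counts by when they were first seen. Because step 2 splits only on whitespace, two words joined by a dash with no surrounding spaces merge into one token once the dash is removed; the sketch follows the specification literally on this point.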

Test your program by counting bigrams in The Adventures of Tom Sawyer, by Mark Twain. Do not write code for downloading the file.
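
Since the book is downloaded by hand, the program only needs to read a local file. A minimal sketch, assuming the path comes from the first command-line argument and falls back to a hypothetical file name `tom_sawyer.txt`. It also assumes the file is UTF-8 and may start with a byte-order mark, which the `utf-8-sig` codec strips if present.

```python
from pathlib import Path

# Hypothetical name of the copy of the book saved by hand from Project Gutenberg.
DEFAULT_PATH = "tom_sawyer.txt"


def book_path(argv: list[str]) -> str:
    """Take the file path from the first command-line argument, if given."""
    return argv[1] if len(argv) > 1 else DEFAULT_PATH


def read_book(path: str) -> str:
    """Read an already-downloaded text file; no network access happens here.

    The 'utf-8-sig' codec decodes UTF-8 and drops a leading byte-order mark
    when one is present, so it also works for files saved without one.
    """
    return Path(path).read_text(encoding="utf-8-sig")
```

A script would call `read_book(book_path(sys.argv))` and hand the resulting string to the text-processing steps.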

Deliverables: the Python file and the program's output as a text file listing the bigrams and their counts, one result per line, in decreasing order of count (the most frequent bigram at the top).
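
The output file can be produced directly from a `Counter` of bigrams, because `most_common(n)` already returns pairs in decreasing order of count. This sketch assumes a tab between the bigram and its count and a hypothetical output file name `bigrams_output.txt`; the assignment fixes neither.

```python
from collections import Counter
from pathlib import Path

OUTPUT_PATH = "bigrams_output.txt"  # hypothetical output file name


def format_report(counts: Counter, n: int = 5) -> str:
    """Render the n most frequent bigrams, one per line, highest count first.

    Each line is 'word1 word2<TAB>count'; the tab separator is an assumption,
    since the assignment only asks for the bigram and its count.
    """
    return "".join(
        f"{w1} {w2}\t{count}\n" for (w1, w2), count in counts.most_common(n)
    )


def write_report(counts: Counter, path: str = OUTPUT_PATH, n: int = 5) -> None:
    """Save the report as a UTF-8 text file (the second deliverable)."""
    Path(path).write_text(format_report(counts, n), encoding="utf-8")
```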

I will give you an upvote only if you deliver exactly what is asked above. *If not, you get a downvote, so please read the problem carefully.*
