
Question


In Python 3.6: Counting and ranking word frequency in a text

The goal of this project is to familiarize ourselves with the processing of text files using Python (without importing specialized libraries for text parsing) and to practice working with Python data types, such as strings, lists, dictionaries, and files.

Download the text of the ebook The Hound of the Baskervilles by A. Conan Doyle from the Project Gutenberg website: https://www.gutenberg.org/files/2852/2852-0.txt (select the Plain Text UTF-8 version).

Write a Python script that opens this text file and parses it, extracting every distinct word (no duplicates) and counting how often each word occurs in the text. At the end of this parsing process, the script prints a listing to the screen showing the 50 most frequent words along with their counts. For example:

word1: 10000
word2: 9291
...
word50: 30
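The counting and ranking part of the task could be sketched roughly as follows. This is a minimal sketch, not a reference solution: the function names (tokenize, count_words, top_words) are made up for illustration, and the choice to lowercase everything, split on whitespace, and strip punctuation from the ends of each token is an assumption about what counts as a "word". Only the standard string module is used.

```python
import string

# Punctuation to strip from the ends of each token. The escapes are curly
# quotes and the em dash, assumed here to occur in the UTF-8 ebook text.
STRIP_CHARS = string.punctuation + "\u201c\u201d\u2018\u2019\u2014"

def tokenize(text):
    """Return the lowercase words in text, in order of appearance."""
    # Treat dashes between words as separators (assumption).
    cleaned = text.lower().replace("\u2014", " ").replace("--", " ")
    words = []
    for token in cleaned.split():
        word = token.strip(STRIP_CHARS)
        if word:  # skip tokens that were pure punctuation
            words.append(word)
    return words

def count_words(words):
    """Map each distinct word to the number of times it occurs."""
    counts = {}
    for word in words:
        counts[word] = counts.get(word, 0) + 1
    return counts

def top_words(counts, n=50):
    """Return the n most frequent (word, count) pairs, ties in alphabetical order."""
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))[:n]
```

A script would read the downloaded file with open(..., encoding="utf-8"), pass its contents through tokenize and count_words, and print each pair from top_words as "word: count". The dictionary keys give the list of distinct words for free, which is why no separate de-duplication step appears.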

Make sure the listing does not include any of the words in stopwords.txt (below):

a about above across after afterwards again against all almost alone along already also although always am among amongst amoungst amount an and another any anyhow anyone anything anyway anywhere are around as at back be became because become becomes becoming been before beforehand behind being below beside besides between beyond bill both bottom but by call can cannot cant co computer con could couldnt cry de describe detail do done down due during each eg eight either eleven else elsewhere empty enough etc even ever every everyone everything everywhere except few fifteen fify fill find fire first five for former formerly forty found four from front full further get give go had has hasnt have he hence her here hereafter hereby herein hereupon hers herself him himself his how however hundred i ie if in inc indeed interest into is it its itself keep last latter latterly least less ltd made many may me meanwhile might mill mine more moreover most mostly move much must my myself name namely neither never nevertheless next nine no nobody none noone nor not nothing now nowhere of off often on once one only onto or other others otherwise our ours ourselves out over own part per perhaps please put rather re same see seem seemed seeming seems serious several she should show side since sincere six sixty so some somehow someone something sometime sometimes somewhere still such system take ten than that the their them themselves then thence there thereafter thereby therefore therein thereupon these they thick thin third this those though three through throughout thru thus to together too top toward towards twelve twenty two un under until up upon us very via was we well were what whatever when whence whenever where whereafter whereas whereby wherein whereupon wherever whether which while whither who whoever whole whom whose why will with within without would yet you your yours yourself yourselves
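One way to apply the stopword requirement is sketched below. It assumes the list above has been saved as a whitespace-separated file named stopwords.txt next to the script, and that word counts are already held in a dictionary mapping each word to its count. The helper names load_stopwords and remove_stopwords are hypothetical.

```python
def load_stopwords(path="stopwords.txt"):
    """Read whitespace-separated stopwords from a file into a set."""
    # A set gives fast membership tests; lowercasing matches lowercase counts.
    with open(path, encoding="utf-8") as handle:
        return set(handle.read().lower().split())

def remove_stopwords(counts, stopwords):
    """Return a new dict of word counts with every stopword entry dropped."""
    return {word: n for word, n in counts.items() if word not in stopwords}
```

Filtering before ranking matters here: if the top 50 were selected first and the stopwords removed afterwards, the printed listing would contain fewer than 50 words.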

