Question

Requirements

  • Submit only the files requested
  • Print all real numbers to 2 decimal points unless stated otherwise

Restrictions

  • No global variables may be used
  • The only code that may appear outside of a function is function definitions and function calls

Description

Write a program that displays the top N most frequently occurring words in a file, along with the number of times each word appeared.

Additional Details

  • Words should be displayed from most commonly occurring to least commonly occurring
  • Case does not matter when counting words
    • HELLO and hello are to be considered the same word
    • When displaying the most commonly occurring words, they should all be displayed in lowercase
  • When counting a word, all leading and trailing non-alphabetical, non-numeric characters should be removed for a more accurate count
    • For example
      • hello
      • hello,
      • hello.
      • hello;
      • !!$#%hello<>?/
    • Are all considered to be the same word
    • The complete list of special characters is: ,.:;"|\!@#$%^&*()_+-=[]{}<>?/~`'
  • If multiple words tie for most commonly occurring, they should all be displayed
    • These words should be displayed in alphabetical order
  • You should ignore the following words when counting the most commonly occurring words because they are so frequent and aren't interesting
    • a, an, and, in, is, the
  • If there are fewer than N unique words in the file, all words should be displayed
    • For example, if there were 5 unique words in a file but the user asked to display the top 10 words, then only the top 5 will be displayed, as there are only 5 words in the file
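
The counting rules above can be sketched roughly as follows. This is only a minimal sketch, not the expected solution: the function names normalize and count_words are made up for illustration, and the choice to skip tokens that become empty after stripping is an assumption the assignment does not state. The constants live inside the functions to respect the "no global variables" restriction.

```python
def normalize(token):
    """Lowercase a token and strip the listed special characters from both ends."""
    # The complete list of special characters from the assignment text.
    special = ",.:;\"|\\!@#$%^&*()_+-=[]{}<>?/~`'"
    return token.lower().strip(special)


def count_words(text):
    """Count normalized words in text, skipping the ignored words."""
    ignored = {"a", "an", "and", "in", "is", "the"}
    counts = {}
    # split() with no arguments breaks on any run of whitespace.
    for token in text.split():
        word = normalize(token)
        # Assumption: a token made only of special characters is not a word.
        if word and word not in ignored:
            counts[word] = counts.get(word, 0) + 1
    return counts
```

With this sketch, hello, hello; and !!$#%hello<>?/ all normalize to hello, so they are counted as one word.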

Input

  • All input will be valid
  • You will be given the name of the file to count the words in
  • You will also be given the number of top occurring words you want to see
  • A word is considered to be 1 or more consecutive non-whitespace characters

Hints

  • Don't forget that str.strip can be used to remove more than just whitespace surrounding a string
  • Dictionaries are super helpful for this problem
  • Don't forget that the sort, max, and min functions have an optional parameter called key that you can assign a function to. The return value of this function is used in the comparison
  • Don't forget that sort has an optional parameter called reverse that, if set to True, causes sort to sort from highest to lowest instead of the default of lowest to highest
  • There is also a function called sorted that takes an iterable and gives you back a new sorted list, so you can iterate over a sorted copy without changing the original iterable
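
A short sketch of how these hints behave in practice. The function name demo_hints and the sample dictionary are made up for illustration; the counts are borrowed from the examples in this question.

```python
def demo_hints():
    """Show str.strip with an argument and sorted with key and reverse."""
    # str.strip with an argument removes any of those characters from both ends.
    stripped = "!!hello??".strip("!?")

    counts = {"shake": 78, "i": 70, "it": 44, "off": 44}

    # sorted returns a new list; key picks the value to compare,
    # and reverse=True orders from highest to lowest.
    by_count = sorted(counts.items(), key=lambda pair: pair[1], reverse=True)

    # A tuple key sorts by count (highest first), then alphabetically for ties.
    by_count_then_word = sorted(counts, key=lambda word: (-counts[word], word))
    return stripped, by_count, by_count_then_word
```

Negating the count inside the tuple key is one way to get descending counts and ascending words in a single sort; sorting twice or grouping words by count first would work as well.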

Examples

User input has been underlined to help you differentiate what is user input and what is program output. You do not need to underline anything.

Assume that shake_it_off.txt contains the lyrics to Taylor Swift's song "Shake it Off" which can be found here: shake_it_off.txt

Example 1

Enter the name of the file: shake_it_off.txt
Enter how many top words you want to see: 2
The following words appeared 78 times each: shake
The following words appeared 70 times each: i

Example 2

Enter the name of the file: shake_it_off.txt
Enter how many top words you want to see: 5
The following words appeared 78 times each: shake
The following words appeared 70 times each: i
The following words appeared 44 times each: it, off
The following words appeared 21 times each: gonna
The following words appeared 15 times each: break, fake, hate, play
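
One possible way to produce output lines in this shape from a dictionary of word counts is sketched below. It assumes that N counts distinct frequencies rather than individual words, which is how Example 2 reads (5 lines covering 9 words); the assignment text does not say this outright. The function names top_groups and format_groups are hypothetical.

```python
def top_groups(counts, n):
    """Return up to n (count, words) groups, highest count first."""
    by_count = {}
    for word, count in counts.items():
        # Collect every word that shares the same count.
        by_count.setdefault(count, []).append(word)
    # Assumption: n limits the number of distinct counts, not the number of words.
    top_counts = sorted(by_count, reverse=True)[:n]
    # Words that tie are listed in alphabetical order.
    return [(count, sorted(by_count[count])) for count in top_counts]


def format_groups(groups):
    """Build one output line per (count, words) group."""
    return [
        f"The following words appeared {count} times each: {', '.join(words)}"
        for count, words in groups
    ]
```

Slicing with [:n] also covers the case where the file has fewer distinct counts than requested, since a slice past the end of a list simply returns what is there.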
