Question

In this project, you will create a Python program that counts word frequencies in a list of New York Times (NYT) articles, both per category and overall.

First, make sure to download the following files to your working folder (where you save your program):

Stopwords -

https://drive.google.com/file/d/1V9rAioz980HuIigNV5tZlOmAC9qeB1BK/view?usp=sharing

NYT articles (csv) -

https://drive.google.com/file/d/1s-c75Uzzme8irdYuZW9z9kH-j9gp_m0X/view?usp=sharing

NYT articles (text file, UTF-8) -

https://drive.google.com/file/d/1rwFzwcSP3L2B8VSTkOEXFUjuQU-aR7Vq/view?usp=sharing

NYT articles (text file, ANSI) -

https://drive.google.com/file/d/1ry598L-YdtXV8DgntLd8DE8UVJ7lvLlP/view?usp=sharing

Part I - word count

In the first part you will read the external files and count the word frequencies.

The program should:

  1. Display a message stating its goal
  2. Read the StopWords.txt file (make sure to use the correct encoding)
  3. Output how many stop words are in the file
  4. Read ONE of the NYT article files. You can use EITHER one of the text files or the csv file, whichever is more convenient; they contain the same data. Each file has the field names in the first line and the articles in the following lines. All values are separated by " | ".
  5. For each article, read through ArticleTitle, ArticleSubtitle and ArticleKeywords and extract all the unique words and their frequencies (ignore upper and lower case).
  6. For each unique word, count its total frequency in each ArticleCategory, as well as its overall total frequency (the sum of all the category counts).
  7. Hint: Use dictionaries + nested dictionaries!
  8. Output the following:
     1. For the whole list:
        • How many articles are in the file?
        • How many different categories are in the file?
        • How many unique words are in the file? (Remember to only count the words from the ArticleTitle, ArticleSubtitle and ArticleKeywords fields.)
        • What is the total number of words (the sum of the unique word frequencies)?
        • The top ten most frequent words that ARE NOT stop words + their frequency
     2. For each category:
        • Total number of unique words in the specific category
        • Total number of words in the category (sum of frequencies)
        • The top ten most frequent words in the category that ARE NOT stop words + their frequency
        • Again, don't forget: for each article, only count the words from the ArticleTitle, ArticleSubtitle and ArticleKeywords fields
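
The Part I steps could be sketched roughly as follows. This is only one possible approach, not a reference solution. The helper names (load_stop_words, parse_articles, count_words, top_words) are made up for illustration, the stop-word file is assumed to hold one word per line in UTF-8, and the rule for splitting text into words (runs of letters, digits and apostrophes) is an assumption the assignment does not specify. A small in-memory sample stands in for the real article file.

```python
import re
from collections import Counter, defaultdict

# Only these fields contribute words, as the assignment requires.
TEXT_FIELDS = ("ArticleTitle", "ArticleSubtitle", "ArticleKeywords")


def load_stop_words(path, encoding="utf-8"):
    """Assumed format: one stop word per line. Adjust encoding if needed."""
    with open(path, encoding=encoding) as f:
        return {line.strip().lower() for line in f if line.strip()}


def parse_articles(lines, sep=" | "):
    """Turn a header line plus data lines into a list of dicts."""
    header = [name.strip() for name in lines[0].split(sep)]
    articles = []
    for line in lines[1:]:
        if line.strip():
            values = [value.strip() for value in line.split(sep)]
            articles.append(dict(zip(header, values)))
    return articles


def count_words(articles):
    """Build the nested dictionary {word: {category: count}}."""
    counts = defaultdict(Counter)
    for article in articles:
        category = article.get("ArticleCategory", "")
        for field in TEXT_FIELDS:
            text = article.get(field, "").lower()  # ignore case
            # Assumed word rule: runs of letters, digits and apostrophes.
            for word in re.findall(r"[a-z0-9']+", text):
                counts[word][category] += 1
    return counts


def top_words(counts, stop_words, category=None, n=10):
    """Top n non-stop words overall, or within a single category."""
    totals = Counter()
    for word, per_category in counts.items():
        if word in stop_words:
            continue
        if category is None:
            freq = sum(per_category.values())
        else:
            freq = per_category[category]
        if freq:
            totals[word] = freq
    return totals.most_common(n)


# Made-up sample in the " | " layout described above.
sample = [
    "ArticleTitle | ArticleSubtitle | ArticleKeywords | ArticleCategory",
    "The Market Rises | Stocks rally | market, stocks | Business",
    "The Team Wins | A late goal | team, goal | Sports",
]
articles = parse_articles(sample)
counts = count_words(articles)
categories = sorted({a["ArticleCategory"] for a in articles})
total_words = sum(sum(c.values()) for c in counts.values())

print("Articles:", len(articles))
print("Categories:", len(categories))
print("Unique words:", len(counts))
print("Total words:", total_words)
print("Top words:", top_words(counts, {"the", "a"}))
```

For the real data, the sample list would be replaced by the lines read from the chosen article file, and the stop-word set by the result of load_stop_words("StopWords.txt"), opened with whichever encoding the file actually uses.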

Part II - save list to file

In this part you will save the word frequency list to a new CSV file.

The program should:

  1. Display a message stating its goal
  2. Create a new csv file named after the student
  3. In the file, create fields for the word list (where you'll store the unique words), all the different categories (where you'll store the word count for each specific ArticleCategory) and the total count (the overall frequency sum for the unique word)
  4. Based on the lists / dictionaries created in the previous part, fill your csv file with values: all the unique words + their frequency count in each category + their total frequency count (the sum of their frequencies in all the categories)
  5. Save and close the file.
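
One way the Part II output might look is sketched below, assuming the counts are held in a nested dictionary of the form {word: {category: count}}. The function names, the column headers "Word" and "Total", and the file name student_name.csv are placeholders chosen for illustration, not requirements from the assignment. The demo writes to an in-memory buffer so the layout can be inspected without creating a file.

```python
import csv
import io


def write_counts(counts, categories, out):
    """Write one row per word: the word, one column per category, the total."""
    writer = csv.writer(out)
    writer.writerow(["Word"] + list(categories) + ["Total"])
    for word in sorted(counts):
        per_category = [counts[word].get(c, 0) for c in categories]
        writer.writerow([word] + per_category + [sum(per_category)])


def save_counts(counts, categories, path):
    """Open the csv file, fill it, and close it when the block ends."""
    # newline="" stops the csv module from adding blank rows on Windows.
    with open(path, "w", newline="", encoding="utf-8") as f:
        write_counts(counts, categories, f)


# Made-up counts, used only to show the resulting layout.
demo_counts = {
    "market": {"Business": 2},
    "the": {"Business": 1, "Sports": 1},
}
demo_categories = ["Business", "Sports"]

buffer = io.StringIO()
write_counts(demo_counts, demo_categories, buffer)
rows = buffer.getvalue().splitlines()
print("\n".join(rows))
```

Calling save_counts(demo_counts, demo_categories, "student_name.csv") would write the same rows to disk. The with statement takes care of the "save and close" step.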

Part III - word search

In the last part, your program should take words from the user as input and output their respective frequency counts.

The program should:

  1. Display a message stating its goal
  2. Ask the user to input a word
  3. Check if the word is in the database (ignore upper and lower case)
  4. Notify the user if the word cannot be found
  5. If the word is in the database, output its frequency in each category, as well as total frequency
  6. Ask the user to input "1" to try another word or "0" to exit the program
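
The search loop might be sketched like this, again assuming a nested dictionary {word: {category: count}} with lowercase keys. The names lookup and search_loop, the prompt wording, and the ask/show parameters are illustrative choices. The parameters default to input and print and exist only so the loop can be driven by scripted answers when trying it out.

```python
def lookup(counts, word):
    """Return (per-category counts, total), or None if the word is absent."""
    per_category = counts.get(word.strip().lower())  # ignore case
    if per_category is None:
        return None
    return dict(per_category), sum(per_category.values())


def search_loop(counts, ask=input, show=print):
    """Keep asking for words until the user enters "0"."""
    show("This part reports how often a word appears in each category.")
    while True:
        result = lookup(counts, ask("Enter a word: "))
        if result is None:
            show("That word was not found.")
        else:
            per_category, total = result
            for category, freq in sorted(per_category.items()):
                show(f"{category}: {freq}")
            show(f"Total: {total}")
        choice = ask('Enter "1" to try another word or "0" to exit: ').strip()
        while choice not in ("0", "1"):
            choice = ask('Please enter "1" or "0": ').strip()
        if choice == "0":
            break


# Made-up counts for trying the lookup without any user input.
demo_counts = {
    "market": {"Business": 2},
    "the": {"Business": 1, "Sports": 1},
}
print(lookup(demo_counts, "The"))
print(lookup(demo_counts, "zebra"))
```

Calling search_loop(demo_counts) with no other arguments starts the interactive version. Note that this sketch only reports categories in which the word actually occurs; listing zero counts for the remaining categories would need the full category list passed in as well.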

Remember:

  • Make sure to include comments that explain all your steps (a comment starts with #). Also use a comment to sign your name at the beginning of the program!
  • Work individually and only submit original work
  • Run the program a few times to make sure it executes and meets all the requirements
  • Submit one .py file!
