Question
In this project, you will use computational thinking to develop an algorithm to solve the problem of counting the number of occurrences of a word and its synonyms in a corpus of text documents.
This project consists of three parts:
- Apply the four pillars of CT and describe the results of each
- Express the algorithm used in the solution using a flowchart
- Express the algorithm using a structured notation known as pseudocode
Description
With the dawn of the Information Age, the amount of data available on the World Wide Web has grown rapidly in recent years. The ability to extract useful knowledge from that data, whether for personal, social, or business reasons, is a problem that can be addressed using computational thinking.
In many cases, we start with individual documents (emails, social media posts, product reviews, etc.) and collect them into a corpus. We then want to search the documents in the corpus for a particular keyword, or search term, and its synonyms, which are words that have the same or similar meanings.
For example:
- You may want to look at the words you are using in your own social media content (e.g., Facebook posts, Tweets, etc.) in order to get an understanding of your own wellbeing and mental health
- You might want to analyze your emails to determine which topics you are frequently discussing
- You may want to look at reviews written by other people, such as customer feedback on products your company produces, to get an idea of whether the general sentiment is positive or negative
- Medical practitioners might want to see what words are being used in social media content to understand the spread of a disease
- Researchers in linguistics may want to understand how the use of a particular word or phrase evolves over time
Although the outcomes of these scenarios differ, all of them share a common element: determining how many times a word and its synonyms appear in a collection of documents.
Setup
There are three inputs to this problem:
- Keyword: The word for which you want to conduct the search
- Thesaurus: A set of words, each of which has associated synonyms
- Corpus: A set of documents, each of which contains some number of words
The output of the solution to this problem should be the number of occurrences of the keyword and its synonyms in all the documents in the corpus.
For simplicity, you do not need to worry about capitalization, partial word matching, punctuation, etc., nor about alternate spellings or word variations.
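As one illustration of the Setup, the three inputs could be represented in Python as follows. This is a sketch under assumptions the assignment does not state: the thesaurus is a dictionary mapping each word to a set of its synonyms, each document is a list of words that are already lowercased and free of punctuation, and the keyword is looked up directly as a thesaurus key. The function `build_search_terms` and the sample words are hypothetical.

```python
from typing import Dict, List, Set

# Assumed representations (one reasonable choice, not the only one):
# - thesaurus: dictionary mapping each word to the set of its synonyms
# - document: list of words (already lowercased, punctuation removed)
# - corpus: list of documents
Thesaurus = Dict[str, Set[str]]
Document = List[str]
Corpus = List[Document]


def build_search_terms(keyword: str, thesaurus: Thesaurus) -> Set[str]:
    """Return the keyword plus its synonyms.

    A keyword that is not a key in the thesaurus simply has no synonyms.
    """
    return {keyword} | thesaurus.get(keyword, set())


# Made-up example inputs, for illustration only.
example_thesaurus: Thesaurus = {
    "happy": {"glad", "joyful", "cheerful"},
    "sad": {"unhappy", "gloomy"},
}
example_corpus: Corpus = [
    "i am happy and glad today".split(),
    "a gloomy day makes me sad".split(),
]

print(sorted(build_search_terms("happy", example_thesaurus)))
# prints ['cheerful', 'glad', 'happy', 'joyful']
```

Storing the search terms in a set means each membership check during counting takes constant time on average, no matter how many synonyms the keyword has.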
Step-By-Step Assignment Instructions
In this part of the project, you will apply the four pillars of CT to this problem by answering the following questions:
- Using decomposition, what are the primary sub-problems that need to be solved in solving the overall problem?
- Using pattern recognition, what patterns do you see in the solution, i.e., what processes need to be repeated?
- Using data abstraction and representation, how would you represent the thesaurus, the corpus, and each of the documents in the corpus?
- Using the results of the first three pillars, what is the algorithm that you would use to solve this problem? Describe it in as much detail as possible.
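One possible way to turn answers to these four questions into code is sketched here in Python. It assumes the thesaurus is a dictionary from each word to a set of synonyms and that each document is a list of already-normalized words; the function names and sample data are hypothetical. Decomposition appears as three small functions (build the search terms, count within one document, sum over the corpus), and pattern recognition appears as the two repeated loops: one over documents and one over the words in each document.

```python
from typing import Dict, List, Set

Thesaurus = Dict[str, Set[str]]
Corpus = List[List[str]]


def build_search_terms(keyword: str, thesaurus: Thesaurus) -> Set[str]:
    # Sub-problem 1: combine the keyword with its synonyms (if any).
    return {keyword} | thesaurus.get(keyword, set())


def count_in_document(document: List[str], terms: Set[str]) -> int:
    # Sub-problem 2: count the words in one document that are search terms.
    count = 0
    for word in document:        # repeated pattern: check every word
        if word in terms:
            count += 1
    return count


def count_occurrences(keyword: str, thesaurus: Thesaurus, corpus: Corpus) -> int:
    # Sub-problem 3: add up the per-document counts across the corpus.
    terms = build_search_terms(keyword, thesaurus)
    total = 0
    for document in corpus:      # repeated pattern: process every document
        total += count_in_document(document, terms)
    return total


# Made-up example inputs, for illustration only.
thesaurus: Thesaurus = {"happy": {"glad", "joyful", "cheerful"}}
corpus: Corpus = [
    "i am happy and glad today".split(),
    "a gloomy day makes me sad".split(),
    "joyful news made everyone happy".split(),
]

print(count_occurrences("happy", thesaurus, corpus))  # prints 4
```

Here the first and third documents each contain two matching words ("happy" and "glad", then "joyful" and "happy"), and the second contains none, giving a total of 4.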