Question
Using Matlab
In linguistics, stemming is the process of reducing inflected words to their word stem, base, or root form. In this assignment, you are to write a simple word stemmer for English. The input is a string, text, that may contain punctuation or other non-alphabetical characters. Your program should stem the words in the text and return these words as a cell array. Here are the steps your program should perform to derive and filter the word stems:
1. Convert any upper-case letter to lower case.
2. Replace each character that is neither a letter nor a space with a space character (one space per replaced character). For example, "My 1st NLP program!!!" should become "my  st nlp program   ".
3. Extract the words from the string. For example, "my  st nlp program   " results in the list "my", "st", "nlp", and "program".
4. Strip the following suffixes from the words that have them: -ly, -ed, -ing, -es, -s. Each suffix should be considered once and in that order (first strip -ly, then -ed, then -ing, and so on). For example, the word "excitedly" becomes "excit"; the word "feeding" becomes "feed".
5. Remove any word from the list that has two or fewer characters.
6. Remove the following common words from the list: the, and, that, have, for, not.
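The steps above can be sketched in Python as follows. This is a hedged illustration of the logic, not the requested MATLAB solution: the function name simple_stemmer and the constants SUFFIXES and STOP_WORDS are hypothetical, step 2 assumes that only the ASCII letters a-z count as alphabetical, and the result is a Python list where MATLAB would return a cell array.

```python
import re

# Suffixes from step 4, each tested at most once, in this fixed order.
SUFFIXES = ("ly", "ed", "ing", "es", "s")
# Common words removed in step 6.
STOP_WORDS = {"the", "and", "that", "have", "for", "not"}


def simple_stemmer(text):
    """Return the filtered word stems of text, in order of appearance."""
    # Step 1: convert upper-case letters to lower case.
    text = text.lower()
    # Step 2: replace every character that is not a-z or a space with a space.
    text = re.sub(r"[^a-z ]", " ", text)
    # Step 3: split on runs of spaces; split() discards empty strings.
    words = text.split()
    stems = []
    for word in words:
        # Step 4: strip each suffix if present, checking them once in order.
        for suffix in SUFFIXES:
            if word.endswith(suffix):
                word = word[: -len(suffix)]
        # Steps 5 and 6: keep only words longer than 2 characters
        # that are not in the common-word list.
        if len(word) > 2 and word not in STOP_WORDS:
            stems.append(word)
    return stems


print(simple_stemmer("Learning never exhausts the mind."))
# ['learn', 'never', 'exhaust', 'mind']
print(simple_stemmer("Simplicity is the ultimate sophistication."))
# ['simplicity', 'ultimate', 'sophistication']
```

Note that steps 5 and 6 are applied after stripping, so a word like "is" first becomes "i" and is then dropped by the length filter. In MATLAB, the same steps would map onto built-ins such as lower, regexprep, strsplit, endsWith, and ismember, with the surviving stems collected in a cell array.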
Note that the stemming strategies used in this program are overly simplistic and may not give sensible results.
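Isolating the suffix rule of step 4 shows concretely why the results can be odd. In this minimal Python sketch (the helper name strip_suffixes is hypothetical), the fixed-order rule reproduces the assignment's own examples but also turns "classes" into "clas", "this" into "thi", and "reply" into "rep".

```python
# Suffixes from step 4, each tested at most once, in this fixed order.
SUFFIXES = ("ly", "ed", "ing", "es", "s")


def strip_suffixes(word):
    """Apply step 4: test each suffix once, in order, stripping it if present."""
    for suffix in SUFFIXES:
        if word.endswith(suffix):
            word = word[: -len(suffix)]
    return word


for word in ["excitedly", "feeding", "classes", "this", "reply"]:
    print(word, "->", strip_suffixes(word))
# excitedly -> excit
# feeding -> feed
# classes -> clas
# this -> thi
# reply -> rep
```

"classes" loses both -es and then -s because each suffix is checked in turn, and "this" and "reply" are not inflected at all but still match a suffix. Since "thi" and "rep" are longer than two characters and are not in the common-word list, they would survive into the final output.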
>> simplestemmer( 'Learning never exhausts the mind.' )
ans = { 'learn' 'never' 'exhaust' 'mind' }
>> simplestemmer( 'Simplicity is the ultimate sophistication.' )
ans = { 'simplicity' 'ultimate' 'sophistication' }