Question

As you may suspect, the length-4 common substring heuristic is just a simple, approximate technique for finding words with common roots, and it is liable to fail in many situations. One such failure is unrelated words that happen to share a substring. For example, consider the following words:

 Ionization, Ionic, Actualization, Actual 

A string is composed of words, defined as contiguous runs of alphanumeric characters delimited by separators (spaces, commas, periods, semicolons, exclamation marks, or any other punctuation symbol except apostrophes).

So a string might look like this:

The hungry scanner keeps a suspicious watch on doctors and their unsuspecting patients

The scanner counts words, grouping together those that share a common substring of length 4 or greater.

For example, in the given sentence, "suspicious" and "unsuspecting" share the common substring "susp", which has length 4, so the scanner would group these two words together in its output.
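A minimal Python sketch of such a scanner may help make the heuristic concrete. The function names `split_words`, `longest_common_substring`, and `related_pairs` are illustrative and not part of the original question. The sketch assumes that apostrophes stay inside words and that comparison is case-insensitive, since the question does not settle either point.

```python
import re


def split_words(text):
    # Words are runs of letters and digits. Apostrophes are kept inside
    # words (assumption: they are the one punctuation mark that does
    # not act as a separator).
    return re.findall(r"[A-Za-z0-9']+", text)


def longest_common_substring(a, b):
    # Classic dynamic programme: curr[j] is the length of the common run
    # ending at a[i-1] and b[j-1]. Comparison ignores case (assumption).
    a, b = a.lower(), b.lower()
    best = ""
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                curr[j] = prev[j - 1] + 1
                if curr[j] > len(best):
                    best = a[i - curr[j]:i]
        prev = curr
    return best


def related_pairs(text, min_len=4):
    # Return (word, word, shared substring) for every pair of words whose
    # longest common substring has at least min_len characters.
    words = split_words(text)
    pairs = []
    for i in range(len(words)):
        for j in range(i + 1, len(words)):
            common = longest_common_substring(words[i], words[j])
            if len(common) >= min_len:
                pairs.append((words[i], words[j], common))
    return pairs


sentence = ("The hungry scanner keeps a suspicious watch on doctors "
            "and their unsuspecting patients")
print(related_pairs(sentence))
# [('suspicious', 'unsuspecting', 'susp')]
```

Comparing every pair of words is quadratic in the number of words, which is fine for a sentence but would need rethinking for long texts.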

In the list above (Ionization, Ionic, Actualization, Actual), the first two words clearly have a common root, as do the last two. Unfortunately, the simple approach also identifies Ionization and Actualization as having a common root due to the presence of the string "tion". We therefore have a situation where Actualization could be counted in two slots.

Find an approach to "break ties" in these cases. What logic can you apply to declare that [Actual, Actualization] is a better match than [Actualization, Ionization]?

Describe and implement your logic as a separate subroutine called from the function implemented in Q1.
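One possible tie-breaking rule, offered as a sketch rather than as the expected answer: roots usually sit at the start of a word, while shared endings such as "-ization" are common suffixes, so a match that begins at the start of both words can be ranked above one that does not. Ranking by substring length alone would not work here, because Actualization and Ionization share the 7-character run "ization" while Actual and Actualization share only 6 characters. The hypothetical subroutine `break_tie` below scores each candidate by the pair (common prefix length, longest common substring length) and picks the highest. The helper names and the scoring tuple are assumptions made for illustration.

```python
from difflib import SequenceMatcher


def common_prefix_len(a, b):
    # Number of leading characters the two strings share.
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def longest_common_len(a, b):
    # Length of the longest common substring, via the standard library.
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    return matcher.find_longest_match(0, len(a), 0, len(b)).size


def match_score(word, candidate):
    # A shared prefix outranks a long shared run elsewhere in the word
    # (assumed heuristic: roots come first, suffixes last).
    a, b = word.lower(), candidate.lower()
    return (common_prefix_len(a, b), longest_common_len(a, b))


def break_tie(word, candidates, min_len=4):
    # Keep only candidates that pass the original length-4 test, then
    # return the best-scoring one, or None if nothing qualifies.
    eligible = [c for c in candidates
                if longest_common_len(word.lower(), c.lower()) >= min_len]
    if not eligible:
        return None
    return max(eligible, key=lambda c: match_score(word, c))


print(break_tie("Actualization", ["Ionization", "Actual"]))  # Actual
print(break_tie("Ionization", ["Actualization", "Ionic"]))   # Ionic
```

The grouping function from Q1 could call `break_tie` whenever a word matches more than one existing group. The rule still treats prefixed forms such as "unsuspecting" as weaker matches, so it only ranks competing candidates and never rejects a match that passes the length-4 test.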