Question

Problem Definition: You are given a string T[1,n] of n characters from a constant-sized alphabet. We want to find the longest substring of T that appears at least twice. Note that this is similar to the "Longest Common Substring" problem.
This problem can be easily solved in quadratic time using dynamic programming as follows. Let T[i,n] denote the suffix of T starting at position i. Define the Longest Common Extension (LCE) function, where LCE(i,j) is the length of the longest common prefix of T[i,n] and T[j,n]. For example, if T[1,n] = MISSISSIPPI, then T[3,n] = SSISSIPPI, T[6,n] = SSIPPI, and LCE(3,6) = 3 (the longest common prefix here is "SSI").
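As a minimal sketch of the LCE definition above, the following Python function matches characters one by one. The function name `lce` and the choice to accept 1-based positions (to mirror the T[i,n] notation) are assumptions made for illustration.

```python
def lce(text: str, i: int, j: int) -> int:
    """Length of the longest common prefix of text[i..n] and text[j..n] (1-based i, j)."""
    n = len(text)
    a, b = i - 1, j - 1  # convert to Python's 0-based indexing
    length = 0
    # Extend the match until a mismatch or the end of either suffix.
    while a + length < n and b + length < n and text[a + length] == text[b + length]:
        length += 1
    return length


print(lce("MISSISSIPPI", 3, 6))  # 3, the common prefix is "SSI"
```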
The answer to our problem is the maximum of LCE(i,j) over all pairs with i and j distinct. The next question is how to compute the LCE(i,j) values. For a fixed pair (i,j), we can compute LCE(i,j) in time LCE(i,j)+1 by matching the characters of T[i,n] and T[j,n] one by one until we find a mismatch. In the worst case this takes O(n) time per pair, and there are O(n^2) pairs, so computing all LCE(i,j) values this way costs O(n^3) time, which is not efficient. To improve on this, note that LCE(i,j) = 0 if T[i] and T[j] are different (or if i or j lies beyond n), and LCE(i,j) = 1 + LCE(i+1,j+1) otherwise. Using this recurrence, we can fill the DP table in O(n^2) time and get the answer.
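The recurrence can be turned into a short quadratic-time sketch. The version below is one possible arrangement, not the only one: it fills the table from the bottom row upward and keeps only two rows at a time, and the name `longest_repeat_dp` is hypothetical.

```python
def longest_repeat_dp(text: str) -> str:
    """Longest substring occurring at least twice, via the LCE recurrence in O(n^2) time."""
    n = len(text)
    prev = [0] * (n + 2)  # row i+1 of the table: prev[j] = LCE(i+1, j), 0-based
    best_len, best_start = 0, 0
    for i in range(n - 1, -1, -1):
        cur = [0] * (n + 2)
        for j in range(n - 1, i, -1):  # only j > i, so the two suffixes differ
            if text[i] == text[j]:
                cur[j] = 1 + prev[j + 1]  # LCE(i, j) = 1 + LCE(i+1, j+1)
                if cur[j] > best_len:
                    best_len, best_start = cur[j], i
        prev = cur
    return text[best_start:best_start + best_len]


print(longest_repeat_dp("MISSISSIPPI"))  # ISSI
```

Note that the two occurrences are allowed to overlap, as they do for "ISSI" here, because the only requirement is that i and j differ.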
Our goal is to SOLVE THIS IN LINEAR TIME and here is the idea.
Take all suffixes and sort them in lexicographic (i.e., alphabetical) order. Let us define an array, called the Suffix Array SA[1,n], which records the sorted order of the suffixes. Specifically, SA[k] = x means that T[x,n] is the k-th smallest suffix in lexicographic order.
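To make the definition concrete, here is a deliberately naive sketch that builds the suffix array by sorting the suffixes directly. It is far from linear time (each comparison can cost O(n)) and is only meant to show what SA contains; the name `suffix_array_naive` is an assumption, and positions are reported 1-based to match SA[1,n].

```python
def suffix_array_naive(text: str) -> list[int]:
    """Suffix array with 1-based starting positions, built by plain sorting."""
    n = len(text)
    # Sort the starting positions by the suffix that begins at each one.
    return sorted(range(1, n + 1), key=lambda x: text[x - 1:])


print(suffix_array_naive("MISSISSIPPI"))
# [11, 8, 5, 2, 1, 10, 9, 7, 4, 6, 3]
```

For example, SA[1] = 11 because the suffix "I" starting at position 11 is the smallest, and SA[5] = 1 because the whole string is the fifth smallest suffix.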
Now, some pair of suffixes sharing the longest common prefix will be consecutive in the sorted order. (Any suffix that sorts between two suffixes with a common prefix must itself begin with that prefix, so adjacent suffixes share at least as much.) This means we can simply compute the LCE of only those pairs of suffixes that appear consecutively and report the maximum value as the answer. Specifically, compute LCE(SA[i], SA[i+1]) for i = 1,2,3,...,n-1 and report the maximum as the output. In the literature, we define LCP[i] = LCE(SA[i], SA[i+1]), and the array LCP[1,n-1] is called the Longest Common Prefix (LCP) array. So our answer is simply the largest element in the LCP array.
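The idea of comparing only neighbouring suffixes can be sketched as follows. This version still sorts naively and computes each LCP value by direct character matching, so it is not linear time; it only illustrates that the maximum over adjacent pairs gives the answer. The function name is hypothetical and indices are 0-based, as is usual in Python.

```python
def longest_repeat_adjacent(text: str) -> str:
    """Longest repeated substring from LCE values of adjacent sorted suffixes."""
    n = len(text)
    sa = sorted(range(n), key=lambda x: text[x:])  # naive 0-based suffix array
    best_len, best_start = 0, 0
    for k in range(n - 1):
        a, b = sa[k], sa[k + 1]  # two suffixes adjacent in sorted order
        h = 0
        while a + h < n and b + h < n and text[a + h] == text[b + h]:
            h += 1  # h ends up as LCP[k] = LCE(sa[k], sa[k+1])
        if h > best_len:
            best_len, best_start = h, a
    return text[best_start:best_start + best_len]


print(longest_repeat_adjacent("MISSISSIPPI"))  # ISSI
```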
The question is: can we construct the suffix array and the LCP array quickly? The answer is YES, and here are the steps.
We can construct the suffix array in linear time (there exist several algorithms achieving this; perhaps the most elegant one is the Difference-Cover-3, or DC3, algorithm).
Once we have the suffix array constructed, we can run another linear-time algorithm, known as Kasai's algorithm, to get the LCP array.
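The two steps above can be sketched end to end. Important caveat: the suffix array construction below is NOT DC3. It uses prefix doubling, a simpler technique that runs in O(n log^2 n) time with a comparison sort, as a stand-in so the example stays short. The LCP step is a sketch of Kasai's algorithm and runs in O(n) time given the suffix array. All function names are assumptions, and indices are 0-based.

```python
def suffix_array_doubling(text: str) -> list[int]:
    """0-based suffix array by prefix doubling (a stand-in for DC3, not linear time)."""
    n = len(text)
    sa = list(range(n))
    rank = [ord(c) for c in text]  # rank by first character
    k = 1
    while n > 1:
        def key(i: int) -> tuple[int, int]:
            # Rank of the first k characters, then of the next k (-1 past the end).
            return (rank[i], rank[i + k] if i + k < n else -1)

        sa.sort(key=key)
        new_rank = [0] * n
        for idx in range(1, n):
            p, c = sa[idx - 1], sa[idx]
            new_rank[c] = new_rank[p] + (key(p) != key(c))
        rank = new_rank
        if rank[sa[-1]] == n - 1:  # all ranks distinct: fully sorted
            break
        k *= 2
    return sa


def kasai_lcp(text: str, sa: list[int]) -> list[int]:
    """lcp[k] = LCE(sa[k], sa[k+1]) for k = 0..n-2, computed in O(n) time."""
    n = len(text)
    rank = [0] * n
    for k, start in enumerate(sa):
        rank[start] = k  # inverse of the suffix array
    lcp = [0] * max(n - 1, 0)
    h = 0
    for i in range(n):  # visit suffixes in text order, not sorted order
        if rank[i] == n - 1:  # last in sorted order: no successor to compare with
            h = 0
            continue
        j = sa[rank[i] + 1]  # suffix that follows suffix i in sorted order
        while i + h < n and j + h < n and text[i + h] == text[j + h]:
            h += 1
        lcp[rank[i]] = h
        if h > 0:
            h -= 1  # suffix i+1 shares at least h-1 characters with its successor
    return lcp


def longest_repeated_substring(text: str) -> str:
    sa = suffix_array_doubling(text)
    lcp = kasai_lcp(text, sa)
    if not lcp:
        return ""
    k = max(range(len(lcp)), key=lcp.__getitem__)  # position of the largest LCP value
    return text[sa[k]:sa[k] + lcp[k]]


print(longest_repeated_substring("MISSISSIPPI"))  # ISSI
```

The key observation behind Kasai's algorithm is visible in the `h -= 1` step: the match length drops by at most one when moving from suffix i to suffix i+1, so the total work of the inner `while` loop over the whole run is O(n). Replacing `suffix_array_doubling` with a genuine linear-time construction such as DC3 would make the whole pipeline linear.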
Your task is to make a presentation detailing both algorithms described above and showing how they can be used to solve the longest repeating substring problem in linear time.
