Question
Problem Definition: You are given a string T[1, n] of n characters from a constant-sized alphabet. We want to find the longest substring of T appearing at least twice. Note that this is similar to the "Longest Common Substring" problem.
This problem can be easily solved in quadratic time using dynamic programming as follows: Let T[i, n] denote the suffix of T starting at location i. Define the function called Longest Common Extension (LCE), where LCE(i, j) is the length of the longest common prefix of T[i, n] and T[j, n]. For example, if T[1, 11] = MISSISSIPPI, then T[3, 11] = SSISSIPPI and T[6, 11] = SSIPPI, and LCE(3, 6) = 3 (the longest common prefix here is "SSI").
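The LCE definition can be checked directly by matching characters. The following is a minimal sketch, not part of the original problem statement: the function name `lce` is an assumption, and it takes the 1-based positions used in the text and converts them to Python's 0-based indexing.

```python
def lce(t: str, i: int, j: int) -> int:
    """Length of the longest common prefix of T[i, n] and T[j, n].

    i and j are 1-based positions, as in the problem statement.
    """
    n = len(t)
    a, b = i - 1, j - 1  # convert to 0-based indices
    k = 0
    # Extend the match one character at a time until a mismatch
    # or until one of the suffixes runs out.
    while a + k < n and b + k < n and t[a + k] == t[b + k]:
        k += 1
    return k


print(lce("MISSISSIPPI", 3, 6))  # 3, the common prefix is "SSI"
```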
The answer to our problem is the maximum among all LCE(i, j) (note that i and j are different). The next question is how to compute the LCE(i, j) values. For a fixed (i, j), we can compute LCE(i, j) in O(1 + LCE(i, j)) time: simply match the characters of T[i, n] and T[j, n] one by one until we find a mismatch. But this takes O(n) time in the worst case, so computing all O(n^2) values of LCE(i, j) this way costs O(n^3) time, which is not efficient. To improve the time, note that LCE(i, j) = 0 if T[i] and T[j] are different, and LCE(i, j) = 1 + LCE(i+1, j+1) otherwise (with LCE taken to be 0 once either index exceeds n). Using this recurrence, we can fill the DP table in O(n^2) time and get the answer.
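The recurrence can be turned into a table-filling routine along the following lines. This is a sketch under stated assumptions: the name `longest_repeat_dp` is hypothetical, indices are 0-based, and the full table is kept for clarity, so memory use is quadratic as well.

```python
def longest_repeat_dp(t: str) -> str:
    """Longest substring of t that occurs at least twice, via the LCE recurrence."""
    n = len(t)
    # table[i][j] = LCE of the suffixes starting at 0-based positions i and j.
    # The extra row and column of zeros stand for the empty suffix.
    table = [[0] * (n + 1) for _ in range(n + 1)]
    best_len, best_start = 0, 0
    # Fill from the end of the string so table[i + 1][j + 1] is ready when needed.
    for i in range(n - 1, -1, -1):
        for j in range(n - 1, i, -1):  # only j > i, so i and j always differ
            if t[i] == t[j]:
                table[i][j] = 1 + table[i + 1][j + 1]
                if table[i][j] > best_len:
                    best_len, best_start = table[i][j], i
    return t[best_start:best_start + best_len]


print(longest_repeat_dp("MISSISSIPPI"))  # ISSI
```

Note that the two occurrences are allowed to overlap, as "ISSI" does at positions 2 and 5 of MISSISSIPPI.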
Our goal is to SOLVE THIS IN LINEAR TIME and here is the idea.
Take all suffixes and sort them in lexicographic order (i.e., alphabetically). Let's define an array, called the Suffix Array SA[1, n], which denotes the sorted order of suffixes. Specifically, SA[k] = x means T[x, n] is the k-th smallest suffix in lexicographic order.
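To make the definition concrete, the suffix array can be produced by sorting the suffixes directly. This sketch is only an illustration of the definition, not a linear-time construction: comparison sorting of suffixes can take O(n^2 log n) time in the worst case. The name `suffix_array_naive` is an assumption; the returned values are 1-based starting positions to match the text, stored in an ordinary 0-based Python list.

```python
from typing import List


def suffix_array_naive(t: str) -> List[int]:
    """Return sa where sa[k - 1] = x means T[x, n] is the k-th smallest suffix."""
    n = len(t)
    # Sort the 1-based starting positions by the suffix that begins there.
    return sorted(range(1, n + 1), key=lambda x: t[x - 1:])


print(suffix_array_naive("MISSISSIPPI"))
# [11, 8, 5, 2, 1, 10, 9, 7, 4, 6, 3]
```

For example, the first entry 11 says that the smallest suffix is T[11, 11] = "I", and the last entry 3 says that the largest is T[3, 11] = "SSISSIPPI".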
Now, the two suffixes that share the longest common prefix will be consecutive in the sorted array. This means we can simply compute the LCE of only those pairs of suffixes that appear consecutively and report the maximum value as the answer. Specifically, compute LCE(SA[i], SA[i+1]) for i = 1, ..., n-1 and report the maximum as the output. In the literature, we define LCP[i] = LCE(SA[i], SA[i+1]), and the array LCP[1, n-1] is called the Longest Common Prefix (LCP) array, so our answer is simply the largest element in the LCP array.
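The adjacency idea can be sketched as follows. The reason it works is that for any two suffixes, their LCE equals the minimum LCE over the consecutive pairs lying between them in sorted order, so no non-adjacent pair can beat the best adjacent pair. The code below is an illustration only: the name `longest_repeat_adjacent` is hypothetical, indices are 0-based, and both the sorting and the character-by-character matching are naive, so it does not run in linear time.

```python
def longest_repeat_adjacent(t: str) -> str:
    """Longest repeated substring, comparing only suffixes adjacent in sorted order."""
    n = len(t)
    # 0-based suffix array built by direct sorting (naive stand-in).
    sa = sorted(range(n), key=lambda x: t[x:])
    best_len, best_start = 0, 0
    for a, b in zip(sa, sa[1:]):  # consecutive pairs in sorted order
        k = 0
        while a + k < n and b + k < n and t[a + k] == t[b + k]:
            k += 1
        if k > best_len:
            best_len, best_start = k, a
    return t[best_start:best_start + best_len]


print(longest_repeat_adjacent("MISSISSIPPI"))  # ISSI
```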
The question is: can we construct the suffix array and the LCP array quickly? The answer is YES, and here are the steps.
We can construct the suffix array in linear time (there exist several algorithms achieving this, and perhaps the most elegant one is called the Difference Cover, or DC3, algorithm).
Once we have the suffix array constructed, we can run another linear-time algorithm, known as Kasai's algorithm, to get the LCP array.
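A possible sketch of Kasai's algorithm is given below. It processes suffixes in text order and relies on one observation: if the suffix at position i shares h characters with its successor in sorted order, then the suffix at position i + 1 shares at least h - 1 characters with its own successor, so the match length never has to restart from zero and the total work is O(n). The names `kasai_lcp` and `longest_repeat` are assumptions, indices are 0-based, and the suffix array is built by naive sorting as a stand-in for a linear-time construction, so only the LCP step here is linear.

```python
from typing import List


def kasai_lcp(t: str, sa: List[int]) -> List[int]:
    """lcp[k] = LCE of the suffixes at sa[k] and sa[k + 1], for k = 0..n-2."""
    n = len(t)
    rank = [0] * n  # rank[x] = position of suffix x in the sorted order
    for k, start in enumerate(sa):
        rank[start] = k
    lcp = [0] * max(n - 1, 0)
    h = 0  # number of characters already known to match
    for i in range(n):
        if rank[i] == n - 1:  # the largest suffix has no successor
            h = 0
            continue
        j = sa[rank[i] + 1]  # successor of suffix i in sorted order
        while i + h < n and j + h < n and t[i + h] == t[j + h]:
            h += 1
        lcp[rank[i]] = h
        if h > 0:
            h -= 1  # suffix i + 1 keeps at least h - 1 matching characters
    return lcp


def longest_repeat(t: str) -> str:
    n = len(t)
    sa = sorted(range(n), key=lambda x: t[x:])  # naive stand-in, not linear time
    lcp = kasai_lcp(t, sa)
    if not lcp:
        return ""
    k = max(range(len(lcp)), key=lcp.__getitem__)  # index of the largest LCP value
    return t[sa[k]:sa[k] + lcp[k]]


print(longest_repeat("MISSISSIPPI"))  # ISSI
```

Replacing the sorting line with a linear-time suffix array construction would make the whole pipeline linear, since finding the maximum of the LCP array is a single pass.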
Your task is to make a presentation detailing both of the algorithms described above and showing how they can be used to solve the longest repeating substring problem in linear time.