Question:

A very effective pattern-matching algorithm, developed by Rabin and Karp [54], relies on the use of hashing to produce an algorithm with very good expected performance. Recall that the brute-force algorithm compares the pattern to each possible placement in the text, spending O(m) time, in the worst case, for each such comparison. The premise of the Rabin-Karp algorithm is to compute a hash function, h(·), on the length-m pattern, and then to compute the hash function on all length-m substrings of the text. The pattern P can occur at substring T[j..j+m-1] only if h(P) equals h(T[j..j+m-1]). If the hash values are equal, the authenticity of the match at that location must then be verified with the brute-force approach, since there is a possibility that there was a coincidental collision of hash values for distinct strings. But with a good hash function, there will be very few such false matches.
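The filter-then-verify idea can be sketched in Java as follows. This is only an illustration under assumptions: the class name `HashFilterSketch`, the fixed base `A`, and the modulus `P` are hypothetical choices rather than anything prescribed by the exercise, and the hash is recomputed from scratch for every window, so this version still spends O(m) time per placement.

```java
final class HashFilterSketch {
    // Hypothetical parameters: a fixed base and a prime modulus, chosen for illustration.
    private static final long A = 256;
    private static final long P = 1_000_000_007L;

    /** Polynomial hash of s[start .. start+m-1], recomputed from scratch in O(m) time. */
    static long hash(String s, int start, int m) {
        long h = 0;
        for (int i = 0; i < m; i++) {
            h = (h * A + s.charAt(start + i)) % P;
        }
        return h;
    }

    /** Returns the lowest index j such that text[j..j+m-1] equals pattern, or -1 if none. */
    static int find(String text, String pattern) {
        int n = text.length(), m = pattern.length();
        if (m > n) return -1;
        long target = hash(pattern, 0, m);
        for (int j = 0; j <= n - m; j++) {
            // Equal hashes only suggest a match; regionMatches verifies it
            // character by character to rule out a coincidental collision.
            if (hash(text, j, m) == target && text.regionMatches(j, pattern, 0, m)) {
                return j;
            }
        }
        return -1;
    }
}
```

For example, `HashFilterSketch.find("abracadabra", "cad")` returns 4. The verification step is what keeps the result correct even when two distinct strings share a hash value.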

The next challenge, however, is that computing a good hash function on a length-m substring would presumably require O(m) time. If we did this for each of O(n) possible locations, the algorithm would be no better than the brute-force approach. The trick is to rely on the use of a polynomial hash code, as originally introduced in Section 10.2.1, such as

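$$h(x_0, x_1, \ldots, x_{m-1}) = \left(x_0 a^{m-1} + x_1 a^{m-2} + \cdots + x_{m-2} a + x_{m-1}\right) \bmod p$$

for a substring (x_0, x_1, ..., x_{m-1}), randomly chosen a, and large prime p. We can compute the hash value of each successive substring of the text in O(1) time each, by using the following formula

$$h(x_{k+1}, \ldots, x_{k+m}) = \left(a \cdot h(x_k, \ldots, x_{k+m-1}) - x_k a^m + x_{k+m}\right) \bmod p$$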
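One possible way to turn the O(1) update into code is sketched below in Java. The update multiplies the previous window's hash by a, subtracts the outgoing character's contribution (that character times a^m), and adds the incoming character, all modulo p. The class name `RabinKarpSketch`, the modulus value, and the way the random base is drawn are assumptions made for illustration; `Math.floorMod` is used so that the subtraction cannot leave a negative hash.

```java
import java.util.Random;

final class RabinKarpSketch {
    // Assumed large prime modulus; kept below 2^31 so products fit in a long.
    private static final long P = 1_000_000_007L;

    /** Convenience overload that draws the base from a fresh random source. */
    static int find(String text, String pattern) {
        return find(text, pattern, new Random());
    }

    /** Returns the lowest index of pattern in text, or -1 if it does not occur. */
    static int find(String text, String pattern, Random rng) {
        int n = text.length(), m = pattern.length();
        if (m > n) return -1;
        long a = 2 + rng.nextInt((int) (P - 3)); // random base in [2, P-2]

        long aPowM = 1;      // a^m mod P, needed to remove the outgoing character
        long hp = 0, ht = 0; // hash of the pattern and of the current text window
        for (int i = 0; i < m; i++) {
            aPowM = aPowM * a % P;
            hp = (hp * a + pattern.charAt(i)) % P;
            ht = (ht * a + text.charAt(i)) % P;
        }

        for (int j = 0; ; j++) {
            // Verify on a hash match, since distinct strings may collide.
            if (hp == ht && text.regionMatches(j, pattern, 0, m)) return j;
            if (j + m >= n) return -1; // no further window to examine
            // Roll the window: drop text[j], append text[j+m], in O(1) time.
            ht = Math.floorMod(a * ht - text.charAt(j) * aPowM + text.charAt(j + m), P);
        }
    }
}
```

Passing a seeded `Random` makes runs repeatable, which is convenient when measuring performance. Because every hash match is verified, the returned index does not depend on which base happens to be drawn.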

Implement the Rabin-Karp algorithm and evaluate its efficiency.
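One way to approach the evaluation, offered here only as a sketch, is to count character comparisons instead of measuring wall-clock time, since counts are repeatable. The class `MatchCostSketch` and both method names are hypothetical. The Rabin-Karp counter takes the base `a` and modulus `p` as parameters, assumes 1 < a < p < 2^31, and counts only the comparisons made while verifying hash matches; the hashing itself adds O(n + m) arithmetic operations that are not included in the count.

```java
final class MatchCostSketch {
    /** Counts character comparisons made by brute force over every placement. */
    static long bruteForceComparisons(String text, String pattern) {
        int n = text.length(), m = pattern.length();
        long count = 0;
        for (int j = 0; j + m <= n; j++) {
            for (int k = 0; k < m; k++) {
                count++;
                if (text.charAt(j + k) != pattern.charAt(k)) break;
            }
        }
        return count;
    }

    /**
     * Counts character comparisons made while verifying hash matches during a
     * Rabin-Karp scan over every placement. Hash arithmetic is not counted.
     * Assumes 1 < a < p < 2^31 so that intermediate products fit in a long.
     */
    static long rabinKarpComparisons(String text, String pattern, long a, long p) {
        int n = text.length(), m = pattern.length();
        if (m > n) return 0;
        long aPowM = 1, hp = 0, ht = 0;
        for (int i = 0; i < m; i++) {
            aPowM = aPowM * a % p;
            hp = (hp * a + pattern.charAt(i)) % p;
            ht = (ht * a + text.charAt(i)) % p;
        }
        long count = 0;
        for (int j = 0; j + m <= n; j++) {
            if (hp == ht) { // verify only when the hashes agree
                for (int k = 0; k < m; k++) {
                    count++;
                    if (text.charAt(j + k) != pattern.charAt(k)) break;
                }
            }
            if (j + m < n) { // roll the hash forward to the next window
                ht = Math.floorMod(a * ht - text.charAt(j) * aPowM + text.charAt(j + m), p);
            }
        }
        return count;
    }
}
```

As an illustration of a bad case for brute force, take a text of 1000 copies of 'a' and a pattern of 49 copies of 'a' followed by 'b'. Brute force makes 951 × 50 = 47,550 comparisons, whereas with a = 31 and p = 1,000,000,007 the hash of the pattern differs from the hash of every window, so no verification is triggered at all.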

Related Book:

Data Structures and Algorithms in Java

ISBN: 978-1118771334

6th edition

Authors: Michael T. Goodrich, Roberto Tamassia, Michael H. Goldwasser
