Question:
A very effective pattern-matching algorithm, developed by Rabin and Karp [54], relies on the use of hashing to produce an algorithm with very good expected performance. Recall that the brute-force algorithm compares the pattern to each possible placement in the text, spending O(m) time, in the worst case, for each such comparison. The premise of the Rabin-Karp algorithm is to compute a hash function, h(·), on the length-m pattern, and then to compute the hash function on all length-m substrings of the text. The pattern P occurs at substring T[j..j+m-1] only if h(P) equals h(T[j..j+m-1]). If the hash values are equal, the authenticity of the match at that location must then be verified with the brute-force approach, since there is a possibility that there was a coincidental collision of hash values for distinct strings. But with a good hash function, there will be very few such false matches.
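For reference, the brute-force baseline recalled above might look like the following Java sketch. The class and method names here are illustrative assumptions, not taken from the question.

```java
/** Brute-force baseline: tries every placement of the pattern in the text. */
final class BruteForceMatch {

    /** Returns the lowest index at which pattern occurs in text, or -1 if none. */
    static int findBrute(char[] text, char[] pattern) {
        int n = text.length;
        int m = pattern.length;
        for (int i = 0; i <= n - m; i++) {          // each candidate placement
            int k = 0;
            while (k < m && text[i + k] == pattern[k]) {
                k++;                                // up to m comparisons per placement
            }
            if (k == m) {
                return i;                           // all m characters matched
            }
        }
        return -1;                                  // no placement matched
    }
}
```

Each of the O(n) placements may cost up to m character comparisons, which is the O(nm) worst case that Rabin-Karp tries to avoid.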
The next challenge, however, is that computing a good hash function on a length-m substring would presumably require O(m) time. If we did this for each of O(n) possible locations, the algorithm would be no better than the brute-force approach. The trick is to rely on the use of a polynomial hash code, as originally introduced in Section 10.2.1, such as
$$h(x_0, x_1, \ldots, x_{m-1}) = \left(x_0 a^{m-1} + x_1 a^{m-2} + \cdots + x_{m-2} a + x_{m-1}\right) \bmod p$$
for a substring $(x_0, x_1, \ldots, x_{m-1})$, randomly chosen a, and large prime p. We can compute the hash value of each successive substring of the text in O(1) time each, by using the following formula
$$h(T[j+1..j+m]) = \left(a \cdot h(T[j..j+m-1]) - T[j]\, a^{m} + T[j+m]\right) \bmod p.$$
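The constant-time update can be sketched in Java as follows, assuming the polynomial hash (x_0 a^(m-1) + x_1 a^(m-2) + ... + x_(m-1)) mod p. The class name, the method names, and the particular prime 1,000,000,007 are assumptions made for illustration; the question does not fix any of them.

```java
/** Sketch of a polynomial hash with an O(1) sliding-window update. */
final class RollingHashDemo {
    static final long P = 1_000_000_007L;   // assumed "large prime" modulus

    /** Hash of s[start..start+m-1], computed from scratch by Horner's rule in O(m). */
    static long hash(char[] s, int start, int m, long a) {
        long h = 0;
        for (int i = 0; i < m; i++) {
            h = (h * a + s[start + i]) % P;
        }
        return h;
    }

    /** Returns a^m mod P, precomputed once so that each update is O(1). */
    static long power(long a, int m) {
        long r = 1;
        for (int i = 0; i < m; i++) {
            r = (r * a) % P;
        }
        return r;
    }

    /**
     * Slides the window one position to the right in O(1):
     * multiply by a, remove the outgoing character's a^m term, add the incoming one.
     */
    static long roll(long h, char outgoing, char incoming, long a, long aToM) {
        long next = (h * a - (outgoing * aToM) % P + incoming) % P;
        return next < 0 ? next + P : next;  // Java's % can yield a negative remainder
    }
}
```

With a and the running hash both kept below P (about 10^9), every intermediate product fits in a Java `long`, which is why this sketch does not need `BigInteger`.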
Implement the Rabin-Karp algorithm and evaluate its efficiency.
Step by Step Answer:
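One possible implementation is sketched below in Java. It is a minimal sketch under stated assumptions rather than the textbook's own solution: the class name `RabinKarp`, the helper `matchesAt`, the modulus 1,000,000,007, and the use of `ThreadLocalRandom` to pick the base a are all choices made here for illustration.

```java
import java.util.concurrent.ThreadLocalRandom;

/** Sketch of Rabin-Karp matching with a rolling polynomial hash. */
final class RabinKarp {
    private static final long P = 1_000_000_007L;   // assumed large prime modulus

    /** Returns the lowest index at which pattern occurs in text, or -1 if none. */
    static int find(char[] text, char[] pattern) {
        int n = text.length;
        int m = pattern.length;
        if (m == 0) return 0;
        if (m > n) return -1;

        // Randomly chosen base a in [2, P); the range is an assumption of this sketch.
        long a = ThreadLocalRandom.current().nextLong(2, P);

        long aToM = 1;      // a^m mod P, needed to remove the outgoing character
        long hp = 0;        // hash of the pattern
        long ht = 0;        // hash of the current text window
        for (int i = 0; i < m; i++) {
            aToM = (aToM * a) % P;
            hp = (hp * a + pattern[i]) % P;
            ht = (ht * a + text[i]) % P;
        }

        for (int j = 0; ; j++) {
            // Equal hashes only suggest a match; verify character by character.
            if (hp == ht && matchesAt(text, pattern, j)) return j;
            if (j + m >= n) return -1;               // no further window exists
            // O(1) update from window j to window j + 1.
            ht = (ht * a - (text[j] * aToM) % P + text[j + m]) % P;
            if (ht < 0) ht += P;                     // keep the remainder non-negative
        }
    }

    /** Brute-force verification of a candidate placement, O(m). */
    private static boolean matchesAt(char[] text, char[] pattern, int j) {
        for (int k = 0; k < pattern.length; k++) {
            if (text[j + k] != pattern[k]) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        char[] text = "abacaabaccabacabaabb".toCharArray();
        char[] pattern = "abacab".toCharArray();
        System.out.println(find(text, pattern));     // prints 10
    }
}
```

On efficiency: computing the two initial hashes takes O(m), each of the at most n - m window updates takes O(1), and each verification takes O(m). If false matches are rare, the expected running time is therefore O(n + m). In the worst case, where hash values collide at many placements, the verifications dominate and the running time degrades to O(nm), the same as brute force.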
Data Structures and Algorithms in Java
ISBN: 978-1118771334
6th edition
Authors: Michael T. Goodrich, Roberto Tamassia, Michael H. Goldwasser