String matching based on repetition factors
Let yidenote the concatenation of string y with itself i times. For example, (ab)3= ababab. We say that a string x ∈Σ* has repetition factor r if x = yr for some string y ∈Σ* and some r > 0. Let p(x) denote the largest r such that x has repetition factor r.
a. Give an efficient algorithm that takes as input a pattern P [1..m]and computes the value ρ(Pi) for i = 1,2,...,m. What is the running time of your algorithm?
b. For any pattern P [1..m], let (P*) be defined as max1≤i≤m ρ(pi). Prove that if the pattern P is chosen randomly from the set of all binary strings of length m, then the expected value of ρ(P) is O(1).
c. Argue that the following string-matching algorithm correctly finds all occurrences of pattern P in a text T[1..n] in time O(ρ*(P) n + m):
REPETITION-MATCHER(P, T )
1 m = P. length
2 n = T. length
3 k = 1 + ρ*(P)
4 q = 0
5 s = 0
6 while s ≤ n - m
7 if T [s + q + 1] = = P [q + 1]
8 q = q + 1
9 if q = = m
10 print "Pattern occurs with shift" s
11 if q = = m or T [s + q + 1] ≠ P[q + 1]
12 s = s C max(1, [q/k])
13 q = 0
This algorithm is due to Galil and Seiferas. By extending these ideas greatly, they obtained a linear-time string-matching algorithm that uses only O(1) storage beyond what is required for P and T.