Previous Section
 < Day Day Up > 
Next Section


Notation and terminology

We shall let Σ* (read "sigma-star") denote the set of all finite-length strings formed using characters from the alphabet Σ. In this chapter, we consider only finite-length strings. The zero-length empty string, denoted ε, also belongs to Σ*. The length of a string x is denoted |x|. The concatenation of two strings x and y, denoted xy, has length |x| + |y| and consists of the characters from x followed by the characters from y.

We say that a string w is a prefix of a string x, denoted w x, if x = wy for some string y Σ*. Note that if w x, then |w| |x|. Similarly, we say that a string w is a suffix of a string x, denoted w x, if x = yw for some y Σ*. It follows from w x that |w| |x|. The empty string ε is both a suffix and a prefix of every string. For example, we have ab abcca and cca abcca. It is useful to note that for any strings x and y and any character a, we have x y if and only if xa ya. Also note that and are transitive relations. The following lemma will be useful later.

Lemma 32.1: (Overlapping-suffix lemma)
Start example

Suppose that x, y, and z are strings such that x z and y z. If |x| |y|, then x y. If |x| |y|, then y x. If |x| = |y|, then x = y.

Proof See Figure 32.3 for a graphical proof.

Click To expand
Figure 32.3: A graphical proof of Lemma 32.1. We suppose that x z and y z. The three parts of the figure illustrate the three cases of the lemma. Vertical lines connect matching regions (shown shaded) of the strings. (a) If |x| |y|, then x y. (b) If |x| |y|, then y x. (c) If |x| = |y|, then x = y.

End example

For brevity of notation, we shall denote the k-character prefix P[1 k] of the pattern P[1 m] by Pk. Thus, P0 = ε and Pm = P = P[1 m]. Similarly, we denote the k-character prefix of the text T as Tk. Using this notation, we can state the string-matching problem as that of finding all shifts s in the range 0 s n-m such that P Ts+m.

In our pseudocode, we allow two equal-length strings to be compared for equality as a primitive operation. If the strings are compared from left to right and the comparison stops when a mismatch is discovered, we assume that the time taken by such a test is a linear function of the number of matching characters discovered. To be precise, the test "x = y" is assumed to take time Θ(t + 1), where t is the length of the longest string z such that z x and z y. (We write Θ(t + 1) rather than Θ(t) to handle the case in which t = 0; the first characters compared do not match, but it takes a positive amount of time to perform this comparison.)



Previous Section
 < Day Day Up > 
Next Section