chickadee » srfi-13 » kmp-step

kmp-step pat rv c i c= p-startprocedure

This function encapsulates the work performed by one step of the KMP string search; it can be used to scan strings, input ports, or other on-line character sources for fixed strings.

PAT is the non-empty string specifying the text for which we are searching. RV is the Knuth-Morris-Pratt restart vector for the pattern, as constructed by make-kmp-restart-vector. The pattern begins at PAT[P-START], and is (string-length RV) characters long. C= is the character-equality function used to construct the restart vector, typically char=? or char-ci=?.

Suppose the pattern is N characters in length: PAT[P-START, P-START + N). We have already matched I characters: PAT[P-START, P-START + I). (P-START is typically zero.) C is the next character in the input stream. kmp-step returns the new I value -- that is, how much of the pattern we have matched, including character C. When I reaches N, the entire pattern has been matched.

Thus a typical search loop looks like this:

(let lp ((i 0))
  (or (= i n)                           ; Win -- #t
      (and (not (end-of-stream))        ; Lose -- #f
           (lp (kmp-step pat rv (get-next-character) i char=? 0)))))

Example:

;; Read chars from IPORT until we find string PAT or hit EOF.
(define (port-skip pat iport)
  (let* ((rv (make-kmp-restart-vector pat))
         (patlen (string-length pat)))
    (let lp ((i 0) (nchars 0))
      (if (= i patlen) nchars                    ; Win -- nchars skipped
          (let ((c (read-char iport)))
            (if (eof-object? c) c                ; Fail -- EOF
                (lp (kmp-step pat rv c i char=? 0) ; Continue
                    (+ nchars 1))))))))

This procedure could be defined as follows:

(define (kmp-step pat rv c i c= p-start)
  (let lp ((i i))
    (if (c= c (string-ref pat (+ i p-start)))     ; Match =>
        (+ i 1)                                   ;   Done.
        (let ((i (vector-ref rv i)))              ; Back up in PAT.
          (if (= i -1) 0                          ; Can't back up more.
              (lp i)))))))                        ; Keep going.

Rationale: this procedure takes no optional arguments because it is intended as an inner-loop primitive and we do not want any run-time penalty for optional-argument parsing and defaulting, nor do we wish barriers to procedure integration/inlining.