chickadee » srfi-13 » string-kmp-partial-search

Applies kmp-step across S; optional S-START/S-END bounds parameters restrict search to a substring of S. The pattern is (vector-length RV) characters long; optional P-START index indicates non-zero start of pattern in PAT.

Suppose PLEN = (vector-length RV) is the length of the pattern. I is an integer index into the pattern (that is, 0 <= I < PLEN) indicating how much of the pattern has already been matched. (This means the pattern must be non-empty -- PLEN > 0.)

  • On success, returns -J, where J is the index in S bounding the end of the pattern -- e.g., a value that could be used as the END parameter in a call to substring/shared.
  • On continue, returns the current search state I' (an index into RV) when the search reached the end of the string. This is a non-negative integer.

Hence:

  • A negative return value indicates success, and says where in the string the match occurred.
  • A non-negative return value provides the I to use for continued search in a following string.

This utility is designed to allow searching for occurrences of a fixed string that might extend across multiple buffers of text. This is why, for example, we do not provide the index of the start of the match on success -- it may have occurred in a previous buffer.

To search a character sequence that arrives in "chunks," write a loop of this form:

(let lp ((i 0))
  (and (not (end-of-data?))             ; Lose -- return #f.
       (let* ((buf (get-next-chunk))    ; Get or fill up the buffer.
              (i (string-kmp-partial-search pat rv buf i)))
         (if (< i 0) (- i)              ; Win -- return end index.
             (lp i)))))                 ; Keep looking.

Modulo start/end optional-argument parsing, this procedure could be defined as follows:

(define (string-kmp-partial-search pat rv s i c= p-start s-start s-end)
  (let ((patlen (vector-length rv)))
    (let lp ((si s-start)       ; An index into S.
             (vi i))            ; An index into RV.
      (cond ((= vi patlen) (- si))      ; Win.
            ((= si end) vi)             ; Ran off the end.
            (else (lp (+ si 1)          ; Match s[si] & loop.
                      (kmp-step pat rv (string-ref s si)
                                vi c= p-start)))))))