
[procedure] (make-irregex-chunker <get-next> <get-string> [<get-start> <get-end> <get-substring> <get-subchunk>])


(<get-next> chunk) => returns the next chunk, or #f if there are no more chunks

(<get-string> chunk) => a string source for the chunk

(<get-start> chunk) => the start index of the result of <get-string> (defaults to always 0)

(<get-end> chunk) => the end (exclusive) of the string (defaults to string-length of the source string)

(<get-substring> cnk1 i cnk2 j) => the substring that starts at index i in chunk cnk1 and ends (exclusive) at index j in chunk cnk2

(<get-subchunk> cnk1 i cnk2 j) => as above but returns a new chunked data type instead of a string (optional)

There are two important constraints on the <get-next> procedure. It must return an eq?-identical object when called multiple times on the same chunk, and it must not return a chunk with an empty string (that is, one whose start equals its end). The second constraint exists for performance reasons: the work of filtering out empty chunks is pushed to the chunker, because for many chunk types empty strings cannot occur and the check would be wasted. Note that the initial chunk passed to the matching procedures is allowed to be empty.

<get-substring> is optional and exists as a possible performance improvement; if it is omitted, a default implementation is used. <get-subchunk> is also optional, but without it you may not use irregex-match-subchunk described above.
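As an illustration of the accessors and constraints described here, the following is a minimal sketch of a chunker for a simple "rope" type. It assumes CHICKEN 5, where the library is imported as (chicken irregex); on CHICKEN 4 the import would be (use irregex) instead. The rope representation (a list of (string start end) entries, with each chunk being a tail of that list) and the helper names rope, chunk-next, chunk-string, chunk-start, chunk-end, chunk-substring and rope-search-substring are hypothetical and chosen only for this example. It also uses irregex-search/chunked, one of the chunked matching procedures.

```scheme
(import (chicken irregex))

;; Hypothetical rope type: a non-empty list of (string start end) entries.
;; A chunk is a tail of that list, so repeated calls that return the same
;; tail always return an eq?-identical object.
(define (rope . strings)
  (map (lambda (s) (list s 0 (string-length s))) strings))

(define (chunk-string c) (caar c))
(define (chunk-start c) (cadar c))
(define (chunk-end c) (caddar c))

;; <get-next>: return the next non-empty chunk, or #f at the end.
;; Empty entries are skipped here because the chunker must never
;; return an empty chunk.
(define (chunk-next c)
  (let loop ((rest (cdr c)))
    (cond ((null? rest) #f)
          ((= (chunk-start rest) (chunk-end rest)) (loop (cdr rest)))
          (else rest))))

;; <get-substring>: collect the text from index i in chunk c1
;; up to (but not including) index j in chunk c2.
(define (chunk-substring c1 i c2 j)
  (if (eq? c1 c2)
      (substring (chunk-string c1) i j)
      (let loop ((c (cdr c1))
                 (acc (list (substring (chunk-string c1) i (chunk-end c1)))))
        (if (eq? c c2)
            (apply string-append
                   (reverse (cons (substring (chunk-string c) (chunk-start c) j)
                                  acc)))
            (loop (cdr c)
                  (cons (substring (chunk-string c) (chunk-start c) (chunk-end c))
                        acc))))))

(define rope-chunker
  (make-irregex-chunker chunk-next
                        chunk-string
                        chunk-start
                        chunk-end
                        chunk-substring))

;; Search a rope and return the matched text, or #f if there is no match.
(define (rope-search-substring pattern r)
  (let ((m (irregex-search/chunked pattern rope-chunker r)))
    (and m (irregex-match-substring m))))

;; Usage: the pattern may match text that spans several chunks.
(define example-rope (rope "hello wo" "rrr" "ld!"))
(define example-match (rope-search-substring "wor+ld" example-rope))
```

Because every chunk here is a tail of one list, cdr-based traversal satisfies the eq? requirement without extra bookkeeping. A chunk type that builds wrapper objects on demand would need to cache them to meet the same requirement.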

You can then match chunks of these types with the following procedures: