Strse (rhymes with terse) is a string DSL for Scheme.

(strse "this is freaking awesome"
       "is" "at"
       "at" "was so" 2
       'word string-upcase 3
       "frea" "ve"
       "king" "ry"
       (=> adjective "very") (conc adjective ", " adjective)
       'word "nice" -1)

⇒ "that was SO very, very nice"

The first argument is the source string, followed by any number of alternating search patterns and replacement expressions. A replacement expression can optionally be followed by an integer or boolean for extra magic.

Search patterns

Strse is nothing but a thin, glory-hogging, unnecessary veneer on top of Alex Shinn's wonderful irregex, so search patterns can be both Unix style regexes and sexp style SREs.

(strse "banana" "na$" "lity")

⇒ "banality"

(strse "banana" 'eol "rama")

⇒ "bananarama"

Replacement expressions

A replacement expression is a single expression (but one that has access to all of Scheme, including begin, let, etc.).

Replacement expressions have access to two anaphoric vars. You can get the whole input string (for the current step) with the name it, and you can get the matches and submatches by giving numeric arguments to m. So (m 0) is the whole match, and (m 1) the first submatch etc.
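Here is a small sketch of both anaphoric vars in action. The import line assumes CHICKEN 5 with the strse egg installed, and iso->dmy is just a made-up helper name for illustration.

```scheme
;; Assumption: CHICKEN 5, strse egg installed; conc lives in (chicken string).
(import strse (chicken string))

;; Hypothetical helper: (m 1), (m 2) and (m 3) are the parenthesized
;; submatches, counted left to right.
(define (iso->dmy str)
  (strse str
         "([0-9]+)-([0-9]+)-([0-9]+)"
         (conc (m 3) "/" (m 2) "/" (m 1))))

(iso->dmy "due 2024-01-15")
;; ⇒ "due 15/01/2024"

;; `it` is the whole input string for the current step, here "abc".
(strse "abc" "b" (number->string (string-length it)))
;; ⇒ "a3c"
```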

If a replacement expression evaluates to a string, that becomes the replacement text for the match.

If it evaluates to a procedure, it is applied to the matched substring (as a whole, not considering submatches). If that outputs a string, that becomes the replacement text for the match.
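A quick sketch of the procedure case, going by the description above. The imports assume CHICKEN 5, with string-upcase coming from srfi-13.

```scheme
;; Assumption: CHICKEN 5, strse egg installed; string-upcase is from srfi-13.
(import strse srfi-13)

;; A named procedure is applied to each whole match.
(strse "hello world" "world" string-upcase)
;; ⇒ "hello WORLD"

;; Any expression that evaluates to a procedure works, such as a lambda.
(strse "hello world" 'word (lambda (s) (string-append "<" s ">")))
;; ⇒ "<hello> <world>"
```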

If you just want to execute side-effects on a match without changing the string, wrap them in a then special form.

(strse "hippopotamus"
       "elephant" (then (print "I saw an elephant!"))
       "hippo" (then (print "I saw a hippo!"))
       "tiger" (then (print "I saw a cat!")))

I saw a hippo!
⇒ "hippopotamus"

Another example:

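;; push! comes from the miscmacros egg.
(define (acc)
  (let ((things '()))
    (lambda thing
      (if (null? thing)
          things                         ; no argument: return what we've got
          (push! (car thing) things))))) ; one argument: stash it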

(define (extract str)
  (define digs (acc))
  (define words (acc))
  (strse str
     (= 3 num) (then (digs (string->number (m 0))))
     (+ alpha) (then (words (m 0))))
  (list (digs) (words)))

(extract "it will get 234 and 123 and 747 but not 1983 or 42 but then again 420")

⇒ ((420 198 747 123 234) ("again" "then" "but" "or" "not" "but" "and" "and" "get" "will" "it"))

Extra magic, part one!

If you supply a literal SRE, all named submatches are bound to their names!

(strse "oh my word" (: word " " (=> second word)) second)

⇒ "my word"

(strse "oh my word" (: word " " (=> second word))
       (conc second second second))

⇒ "mymymy word"

What the heck is a "literal SRE"? Normally, irregex SREs are quoted symbols or lists.

(strse "all vampires are named" 'word "dracula")

⇒ "dracula dracula dracula dracula"

But if strse sees a pair that does not start with quote or quasiquote, it'll get access to the named submatches in there (and then add the quote for you). Atoms are not messed with, so you can reuse previously bound regexes.
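To illustrate the "atoms are not messed with" part, here is a sketch where the pattern is a plain variable. The name vowel is made up for the example, and the import assumes CHICKEN 5 with the strse egg installed.

```scheme
;; Assumption: CHICKEN 5, strse egg installed.
(import strse)

;; A pre-baked SRE bound to an ordinary (hypothetical) variable.
(define vowel '(or "a" "e" "i" "o" "u"))

;; vowel is an atom, so strse evaluates it like any other variable.
(strse "banana" vowel "o")
;; ⇒ "bonono"
```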

Isn't this pretty awful? Strse hogs all non-atomic expressions so you can't easily evaluate to regexes (although the `, trick is a workaround). And its clever name-binding trick only works with literal SREs, so you can't combine it with quasiquoting and pre-baked regexes.

Silver lining: this can be different for each pattern pair in your strse call.
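The quasiquote-unquote workaround could look something like this sketch. The digits helper is hypothetical, and the import assumes CHICKEN 5 with the strse egg installed. Since the pattern is a pair that starts with quasiquote, strse should leave it alone and let Scheme evaluate it.

```scheme
;; Assumption: CHICKEN 5, strse egg installed.
(import strse)

;; Hypothetical helper that builds an SRE at run time.
(define (digits n) `(= ,n num))

;; `,(digits 3) is just (digits 3) wearing a quasiquote, which keeps strse
;; from treating the call as a literal SRE.
(strse "call 555 now" `,(digits 3) "XXX")
;; ⇒ "call XXX now"
```

The price, as noted, is that named submatches are not bound for patterns written this way.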

Even more magic

Each pair might optionally be followed by a boolean #t or #f or a number.

If you don't supply one, you get your garden-variety replace-all in a single pass.

A #t means keep running the same replacement recursively. This can hang unless your search eventually terminates, but it can be really handy as long as you are careful.

(strse "aaaaaaaah!" "aa" "a")

⇒ "aaaah!"

(strse "aaaaaaaah!" "aa" "a" #t)

⇒ "ah!"

An #f means nothing special if there is a match, but if there isn't, stop strse and return #f without evaluating any further.

(strse "parrot"
       "a" (begin (print "Found a") "i") #f
       "e" (begin (print "Found e") "i") #f
       "o" (begin (print "Found o") "i") #f)

Found a
⇒ #f

A zero means to replace the entire string, not just the matched part, if there is a match.

(strse "chirp chirp birds"
       "chir" "shee"
       "sheep" "The sentence got woolly" 0)

⇒ "The sentence got woolly"

A positive number means to just replace one match even if there are more. It's one-indexed so 1 is the first match. Negative numbers are the same thing except counting from the right, so -1 is the last match.
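A sketch of the numeric variants side by side, going by the description above. The import assumes CHICKEN 5 with the strse egg installed.

```scheme
;; Assumption: CHICKEN 5, strse egg installed.
(import strse)

(strse "banana" "a" "o" 1)   ; first match only
;; ⇒ "bonana"

(strse "banana" "a" "o" 2)   ; second match only
;; ⇒ "banona"

(strse "banana" "a" "o" -1)  ; last match only
;; ⇒ "banano"
```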


(strse? str reg)

Just returns #t if reg is in str and #f otherwise.

(strse? reg)

Returns a predicate that takes a str argument and checks if reg is in it.
In other words, it's curried on its second argument, kind of a backwards currying but often convenient.
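A sketch of both forms. The imports assume CHICKEN 5 with the strse egg installed, and filter comes from srfi-1.

```scheme
;; Assumption: CHICKEN 5, strse egg installed; filter is from srfi-1.
(import strse srfi-1)

;; Two-argument form: a direct test.
(strse? "banana" "nan")
;; ⇒ #t

(strse? "banana" "x")
;; ⇒ #f

;; One-argument form: a predicate you can hand to filter and friends.
(filter (strse? "an") '("banana" "cherry" "mango"))
;; ⇒ ("banana" "mango")
```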

For a repo,

git clone