Prcc (Parser/Regex Combinator library for Chicken Scheme)

Introduction

Prcc is a PEG-like parser combinator library inspired by the Ruby gem rsec.

Each combinator is a procedure that accepts an opaque "context" object and returns an object representing its match, or #f if it does not match.

Combinators

(char CHAR) procedure

Generate a parser that matches the character CHAR and returns it as a string.

(<c> CHAR) procedure

Alias of char.

(seq PARSER ...) procedure

Sequence parser: each subparser must match, and their results are returned in a list.
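
As a minimal sketch of how seq collects its results (assuming the CHICKEN 4 style `(use prcc)` import shown in the Example section of this page; the name xy is only illustrative):

```scheme
(use prcc)

;; xy is a hypothetical name: match #\x followed by #\y.
(define xy (seq (char #\x) (char #\y)))

;; Each subparser contributes one element to the list, in order.
(parse-string "xy" xy)
;; => ("x" "y")
```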

(<and> PARSER ...) procedure

Alias of seq.

(sel PARSER ...) procedure

Branch parser with ordered choice: the alternatives are tried in order, and the result of the first parser that matches is returned.
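
The following sketch illustrates why the order of alternatives matters. It assumes the `(use prcc)` import from the Example section and relies on parse-string not requiring the whole input to be consumed, as the pred example on this page shows. The names short-first and long-first are hypothetical.

```scheme
(use prcc)

;; Both alternatives can match the start of "ab"; the one listed first wins.
(define short-first (sel (char #\a) (seq (char #\a) (char #\b))))
(define long-first  (sel (seq (char #\a) (char #\b)) (char #\a)))

(parse-string "ab" short-first)
;; => "a"
(parse-string "ab" long-first)
;; => ("a" "b")
```

Placing the longer alternative first is the usual way to keep a shorter prefix from shadowing it.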

(<or> PARSER ...) procedure

Alias of sel.

(one? PARSER) procedure

Match PARSER zero or one time. Returns the empty string if PARSER doesn't match.
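
A small sketch of an optional element inside a sequence, assuming the `(use prcc)` import from the Example section; signed-seven is a hypothetical name chosen for illustration:

```scheme
(use prcc)

;; An optional minus sign followed by the digit 7.
(define signed-seven (seq (one? (char #\-)) (char #\7)))

(parse-string "-7" signed-seven)
;; => ("-" "7")

;; When the sign is absent, one? contributes the empty string.
(parse-string "7" signed-seven)
;; => ("" "7")
```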

(<?> PARSER) procedure

Alias of one?.

(rep PARSER) procedure

Match PARSER zero or more times. Returns a list with one PARSER result per match.

Example:

(parse-string "aabba" (rep (sel (char #\a) (char #\b))))
=> ("a" "a" "b" "b" "a")

(<*> PARSER) procedure

Alias of rep.

(rep+ PARSER) procedure

Match PARSER one or more times.

(<+> PARSER) procedure

Alias of rep+.

(pred PARSER0 PARSER1) procedure

Lookahead predicate: PARSER0 matches only if PARSER1 also matches immediately after it. Only the result of PARSER0 is returned.

Example:

(parse-string "a" (pred (char #\a) (eof)))
=> "a"
;; If we had used (seq), we would get '("a" "")

;; This also allows us to ensure this is the entire string:
(parse-string "ab" (pred (char #\a) (eof)))
=> #f

;; Without the lookahead, it will simply consume as much as possible:
(parse-string "ab" (char #\a))
=> "a"

(<&> PARSER0 PARSER1) procedure

Alias of pred.

(pred! PARSER0 PARSER1) procedure

Negative lookahead: PARSER0 matches only if PARSER1 does not match immediately after it.

(<&!> PARSER0 PARSER1) procedure

Alias of pred!.

(eof) procedure

Generate a parser that matches the end of the input.

(act PARSER [SUCC-PROC] [FAIL-PROC]) procedure

Act on the result of the parser, whether it succeeds (SUCC-PROC) or fails (FAIL-PROC).

This allows you to add semantic actions to the parser.

Note: Be sure not to return #f in SUCC-PROC, because that will be filtered out.

Example:

(define a-or-b (sel (char #\a) (char #\b)))
(parse-string "aabba" (rep (act a-or-b (lambda (x) (if (string=? "a" x) 'yes 'no)))))
=> (yes yes no no yes)

(<@> PARSER [SUCC-PROC] [FAIL-PROC]) procedure

Alias of act.

(neg PARSER) procedure

Treat a failure of PARSER as success.

(<^> PARSER) procedure

Alias of neg.

(regexp-parser STRING [CHUNK-SIZE]) procedure

Generate a parser that matches the regular expression STRING.

(<r> STRING [CHUNK-SIZE]) procedure

Alias of regexp-parser.

(lazy PARSER) syntax

Defer the binding of parser. This is useful for mutually recursive parsers, as PARSER can be defined after the use of the lazy parser.

Example:

;; Without "lazy" around bar, this would give an error that
;; bar is not yet defined.
(define foo (sel (char #\x) (lazy bar)))
(define bar (char #\y))
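
The same mechanism can be sketched for a parser that refers to itself. This is an illustration under assumptions rather than an example from the library: it assumes the `(use prcc)` import from the Example section and that lazy works for self-reference just as it does for a parser defined later. The name nested is hypothetical.

```scheme
(use prcc)

;; An "x" wrapped in any number of balanced parentheses.
;; The inner reference is wrapped in lazy because nested is not yet
;; bound while its own definition is being evaluated.
(define nested
  (sel (seq (char #\() (lazy nested) (char #\)))
       (char #\x)))

;; Parse one level of nesting.
(define result (parse-string "(x)" nested))
```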

(cached PARSER) procedure

Cache the parser's result (packrat parsing).

Helpers

(str STRING) procedure

Generate a parser that matches the literal STRING.

(<s> STRING) procedure

Alias of str.

(one-of STRING) procedure

Match any one of the characters in STRING.

(join+ PARSER0 PARSER1) procedure

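Repeat PARSER0 one or more times, separated by PARSER1. The separator results are part of the output, which is why the example below uses even to keep only the PARSER0 results.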

Example:

;; Parse an array of "a" or "b" identifiers:
;; This can be done more elegantly with rep+_
(define ident (sel (char #\a) (char #\b)))

(parse-string
   "[a,b,b,a]"
   (even (ind (seq (char #\[) (join+ ident (char #\,)) (char #\])) 1)))

=> ("a" "b" "b" "a")

(join+_ PARSER0 PARSER1 [skip: PARSER2]) procedure

Repeat PARSER0, separated by PARSER1, while skipping PARSER2. By default, PARSER2 is the whitespace parser (<s*>).

(ind SEQ-PARSER INDEX) procedure

Return the element of the SEQ-PARSER output at position INDEX, counting from zero.

Example:

(parse-string "xy" (ind (seq (char #\x) (char #\y)) 1))
=> "y"

(<#> SEQ-PARSER INDEX) procedure

Alias of ind.

(<w>) procedure

A word letter (any uppercase or lowercase letter, digit or underscore, i.e. the same as (<r> "\\w")).

(<w*>) procedure

Zero or more word letters.

(<w+>) procedure

One or more word letters.

(<space>) procedure

One whitespace character (space, tab or newline).

(<s*>) procedure

Zero or more whitespace characters.

(<s+>) procedure

One or more whitespace characters.

(rep_ PARSER0 [skip: PARSER1]) procedure

Repeat PARSER0 zero or more times, skipping PARSER1. By default, PARSER1 is the whitespace parser (<s*>).

(<*_> PARSER0 [skip: PARSER1]) procedure

Alias of rep_.

(rep+_ PARSER0 [skip: PARSER1]) procedure

Repeat PARSER0 one or more times, skipping PARSER1. By default, PARSER1 is the whitespace parser (<s*>).

Example:

;; Parse an array of "a" or "b" identifiers:
(define ident (sel (char #\a) (char #\b)))

(parse-string
   "[a,b,b,a]"
   (ind (seq (char #\[) (rep+_ ident skip: (char #\,)) (char #\])) 1))
=> ("a" "b" "b" "a")

(<+_> PARSER0 [skip: PARSER1]) procedure

Alias of rep+_.

(seq_ PARSER ... [skip: PARSER1]) procedure

Sequence parser that skips PARSER1. By default, PARSER1 is the whitespace parser (<s*>).

(and_ PARSER ... [skip: PARSER1]) procedure

Alias of seq_.

(even SEQ-PARSER) procedure

Generate a parser which returns the elements at even-numbered positions of sequence parser output, collected in a list.

Note: This starts counting at zero!

Example:

(parse-string "abcde" (even (seq (char #\a) (char #\b) (char #\c) (char #\d) (char #\e))))
=> ("a" "c" "e")

(odd SEQ-PARSER) procedure

Generate a parser which returns the elements at odd-numbered positions of sequence parser output, collected in a list.

Note: This starts counting at zero!

Example:

(parse-string "abcde" (odd (seq (char #\a) (char #\b) (char #\c) (char #\d) (char #\e))))
=> ("b" "d")

(parse-file FILENAME PARSER [CACHE]) procedure

Parse a file with PARSER. By default, no cache (CACHE=#f).

(parse-string STRING PARSER [CACHE]) procedure

Parse a string with PARSER. By default, no cache (CACHE=#f).

(parse-port PORT PARSER [CACHE]) syntax

Parse from PORT with PARSER. By default, no cache (CACHE=#f).

Example

(use prcc)

(define parser
  (<and>
    (<@> (<s> "hello")
      (lambda (o) "hello "))
    (<s> "world")
    (eof)))

(display (parse-string "helloworld" parser))
(newline)

More information

PEG wiki page

Packrat Parsing and Parsing Expression Grammars

Author

Wei Hu

License

 Copyright (C) 2012, Wei Hu
 All rights reserved.
 
 Redistribution and use in source and binary forms, with or without
 modification, are permitted provided that the following conditions are met:
 
 Redistributions of source code must retain the above copyright notice, this
 list of conditions and the following disclaimer.
 Redistributions in binary form must reproduce the above copyright notice,
 this list of conditions and the following disclaimer in the documentation
 and/or other materials provided with the distribution.
 Neither the name of the author nor the names of its contributors may be
 used to endorse or promote products derived from this software without
 specific prior written permission.
 
 THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
 AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
 IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
 ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDERS OR CONTRIBUTORS BE
 LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
 CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
 SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
 INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
 CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
 ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
 POSSIBILITY OF SUCH DAMAGE.

Version History

0.1
initial release
