
Outdated egg!

This is an egg for CHICKEN 4, the unsupported old release. You're almost certainly looking for the CHICKEN 5 version of this egg, if it exists.

If it does not exist, there may be equivalent functionality provided by another egg; have a look at the egg index. Otherwise, please consider porting this egg to the current version of CHICKEN.

Prcc (Parser/Regex Combinator library for CHICKEN Scheme)

Introduction

Prcc is a PEG-like parser combinator library inspired by the Ruby gem rsec.

Each combinator is a procedure that accepts an opaque "context" object and returns an object representing its match, or #f if it does not match.

Combinators

[procedure] char CHAR

Generate a parser that reads the character CHAR and returns it as a string.

[procedure] <c> CHAR

Alias of char.

[procedure] seq PARSER ...

Sequence parser: each subparser must match, and their results are returned in a list.

[procedure] <and> PARSER ...

Alias of seq.

[procedure] sel PARSER ...

Ordered choice parser: the subparsers are tried in order, and the result of the first one that matches is returned.

[procedure] <or> PARSER ...

Alias of sel.

[procedure] one? PARSER

Match PARSER zero or one time. Returns the empty string if PARSER doesn't match.

[procedure] <?> PARSER

Alias of one?.
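
A minimal sketch of one? inside a sequence, assuming the egg is installed; opt-a-then-b is a name made up for this illustration. Going by the descriptions above, the first element of the result should be "a" for the first input and the empty string for the second.

```scheme
(use prcc)

;; Hypothetical parser: an optional "a" followed by a mandatory "b".
(define opt-a-then-b
  (seq (one? (char #\a)) (char #\b)))

(parse-string "ab" opt-a-then-b) ; the optional "a" is present
(parse-string "b" opt-a-then-b)  ; the optional "a" is absent
```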

[procedure] rep PARSER

Repeat PARSER zero or more times. Returns a list of PARSER results, with as many items as matches that were found.

Example:

(parse-string "aabba" (rep (sel (char #\a) (char #\b))))
=> ("a" "a" "b" "b" "a")

[procedure] <*> PARSER

Alias of rep.

[procedure] rep+ PARSER

Repeat PARSER one or more times.

[procedure] <+> PARSER

Alias of rep+.
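
To contrast rep+ with rep, the sketch below applies one parser to input that starts with a match and to input that does not. The name one-or-more-a is invented here. The second call is expected to fail, since rep+ requires at least one match, whereas rep would accept zero matches.

```scheme
(use prcc)

;; Hypothetical parser: one or more "a" characters.
(define one-or-more-a (rep+ (char #\a)))

(parse-string "aaa" one-or-more-a) ; three consecutive matches
(parse-string "bbb" one-or-more-a) ; no "a" at the start of the input
```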

[procedure] pred PARSER0 PARSER1

Positive lookahead: match PARSER0 only if PARSER1 matches immediately after it. Only the result of PARSER0 is returned.

Example:

(parse-string "a" (pred (char #\a) (eof)))
=> "a"
;; If we had used (seq), we would get '("a" "")

;; This also allows us to ensure this is the entire string:
(parse-string "ab" (pred (char #\a) (eof)))
=> #f

;; Without the lookahead, it will simply consume as much as possible:
(parse-string "ab" (char #\a))
=> "a"

[procedure] <&> PARSER0 PARSER1

Alias of pred.

[procedure] pred! PARSER0 PARSER1

Negative lookahead: match PARSER0 only if PARSER1 does not match immediately after it.

[procedure] <&!> PARSER0 PARSER1

Alias of pred!.
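
The pred example can be mirrored for pred!. This is only a sketch: it assumes that pred! returns the result of PARSER0 in the same way pred does, and a-not-before-b is a made-up name. Under that assumption the first call should succeed and the second should fail.

```scheme
(use prcc)

;; Hypothetical parser: an "a" that is NOT followed by a "b".
(define a-not-before-b
  (pred! (char #\a) (char #\b)))

(parse-string "ac" a-not-before-b) ; "a" is followed by "c"
(parse-string "ab" a-not-before-b) ; "a" is followed by "b"
```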

[procedure] eof

Generate a parser that matches the end of the input.

[procedure] (act PARSER [SUCC-PROC] [FAIL-PROC])

Act on the result of PARSER, whether it succeeds or fails.

This allows you to add semantic actions to the parser.

Note: Be sure not to return #f from SUCC-PROC, because that will be filtered out.

Example:

(define a-or-b (sel (char #\a) (char #\b)))
(parse-string "aabba" (rep (act a-or-b (lambda (x) (if (string=? "a" x) 'yes 'no)))))
=> (yes yes no no yes)

[procedure] (<@> PARSER [SUCC-PROC] [FAIL-PROC])

Alias of act.

[procedure] neg PARSER

Invert PARSER: treat a failure of PARSER as a match.

[procedure] <^> PARSER

Alias of neg.

[procedure] regexp-parser STRING #!optional CHUNK-SIZE

Generate a parser that matches the regular expression given in STRING.

[procedure] <r> STRING #!optional CHUNK-SIZE

Alias of regexp-parser.

[syntax] (lazy PARSER)

Defer the binding of parser. This is useful for mutually recursive parsers, as PARSER can be defined after the use of the lazy parser.

Example:

;; Without "lazy" around bar, this would give an error that
;; bar is not yet defined.
(define foo (sel (char #\x) (lazy bar)))
(define bar (char #\y))
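
lazy also lets a parser refer to itself. The following sketch, with the made-up name nested, is meant to accept an "x" wrapped in any number of balanced parentheses. It assumes that lazy delays the lookup of nested until parse time, as described above.

```scheme
(use prcc)

;; Hypothetical recursive grammar: nested <- "(" nested ")" / "x"
(define nested
  (sel (seq (char #\() (lazy nested) (char #\)))
       (char #\x)))

(parse-string "((x))" nested)
```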

[procedure] cached PARSER

Cache the results of PARSER (packrat parsing).

Helpers

[procedure] str STRING

Generate a parser that matches the literal string STRING.

[procedure] <s> STRING

Alias of str.

[procedure] one-of STRING

Generate a parser that matches any one of the characters in STRING.

[procedure] join+ PARSER0 PARSER1

Repeat PARSER0 one or more times, with the repetitions separated by PARSER1.

Example:

;; Parse an array of "a" or "b" identifiers:
;; This can be done more elegantly with rep+_
(define ident (sel (char #\a) (char #\b)))

(parse-string
   "[a,b,b,a]"
   (even (ind (seq (char #\[) (join+ ident (char #\,)) (char #\])) 1)))

=> ("a" "b" "b" "a")

[procedure] (join+_ PARSER0 PARSER1 [skip: PARSER2])

Like join+, but input matched by PARSER2 is skipped. By default, PARSER2 is the whitespace parser (<s*>).

[procedure] ind SEQ-PARSER INDEX

Return the element of the SEQ-PARSER output at position INDEX, counting from zero.

Example:

(parse-string "xy" (ind (seq (char #\x) (char #\y)) 1))
=> "y"

[procedure] <#> SEQ-PARSER INDEX

Alias of ind.

[procedure] <w>

A word letter (any uppercase or lowercase letter, digit or underscore, i.e. the same as (<r> "\\w")).

[procedure] <w*>

Zero or more word letters.

[procedure] <w+>

One or more word letters.

[procedure] <space>

One whitespace character (space, tab or newline).

[procedure] <s*>

Zero or more whitespace characters.

[procedure] <s+>

One or more whitespace characters.

[procedure] (rep_ PARSER0 [skip: PARSER1])

Repeat PARSER0 zero or more times, skipping input matched by PARSER1. By default, PARSER1 is the whitespace parser (<s*>).

[procedure] (<*_> PARSER0 [skip: PARSER1])

Alias of rep_.

[procedure] (rep+_ PARSER0 [skip: PARSER1])

Repeat PARSER0 one or more times, skipping input matched by PARSER1. By default, PARSER1 is the whitespace parser (<s*>).

Example:

;; Parse an array of "a" or "b" identifiers:
(define ident (sel (char #\a) (char #\b)))

(parse-string
   "[a,b,b,a]"
   (ind (seq (char #\[) (rep+_ ident skip: (char #\,)) (char #\])) 1))
=> ("a" "b" "b" "a")

[procedure] (<+_> PARSER0 [skip: PARSER1])

Alias of rep+_.

[procedure] (seq_ PARSER ... [skip: PARSER1])

Sequence parser that skips input matched by PARSER1. By default, PARSER1 is the whitespace parser (<s*>).

[procedure] (and_ PARSER ... [skip: PARSER1])

Alias of seq_.
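
A small sketch of seq_ with its default skip parser, assuming the helpers behave as described in this section and that <w+> is called without arguments. The name assignment is hypothetical. The whitespace around "=" should be skipped rather than cause a failure.

```scheme
(use prcc)

;; Hypothetical parser for inputs such as "x = 1".
(define assignment
  (seq_ (<w+>) (char #\=) (<w+>)))

(parse-string "x = 1" assignment)
```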

[procedure] even SEQ-PARSER

Generate a parser which returns the elements at even-numbered positions of sequence parser output, collected in a list.

Note: This starts counting at zero!

Example:

(parse-string "abcde" (even (seq (char #\a) (char #\b) (char #\c) (char #\d) (char #\e))))
=> ("a" "c" "e")

[procedure] odd SEQ-PARSER

Generate a parser which returns the elements at odd-numbered positions of sequence parser output, collected in a list.

Note: This starts counting at zero!

Example:

(parse-string "abcde" (odd (seq (char #\a) (char #\b) (char #\c) (char #\d) (char #\e))))
=> ("b" "d")

[procedure] parse-file FILENAME PARSER #!optional CACHE

Parse a file with PARSER. By default, no cache (CACHE=#f).

[procedure] parse-string STRING PARSER #!optional CACHE

Parse a string with PARSER. By default, no cache (CACHE=#f).

[syntax] (parse-port PORT PARSER [CACHE])

Parse from PORT with PARSER. By default, no cache (CACHE=#f).
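
parse-port can be tried without a file by reading from a string port. This sketch uses open-input-string (SRFI 6, available in CHICKEN); the names xy and port are invented for the illustration. The result is expected to match what parse-string gives for the same input and parser.

```scheme
(use prcc)

;; Hypothetical parser: "x" followed by "y".
(define xy (seq (char #\x) (char #\y)))

;; A string port stands in for a file or socket port.
(define port (open-input-string "xy"))

(parse-port port xy)
```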

Example

(use prcc)

(define parser
  (<and>
    (<@> (<s> "hello")
      (lambda (o) "hello "))
    (<s> "world")
    (eof)))

(display (parse-string "helloworld" parser))
(newline)

More information

PEG wiki page

Packrat Parsing and Parsing Expression Grammars

Author

Wei Hu

License

 Copyright (C) 2012, Wei Hu
 All rights reserved.
 
 Redistribution and use in source and binary forms, with or without
 modification, are permitted provided that the following conditions are met:
 
 Redistributions of source code must retain the above copyright notice, this
 list of conditions and the following disclaimer.
 Redistributions in binary form must reproduce the above copyright notice,
 this list of conditions and the following disclaimer in the documentation
 and/or other materials provided with the distribution.
 Neither the name of the author nor the names of its contributors may be
 used to endorse or promote products derived from this software without
 specific prior written permission.
 
 THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
 AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
 IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
 ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDERS OR CONTRIBUTORS BE
 LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
 CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
 SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
 INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
 CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
 ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
 POSSIBILITY OF SUCH DAMAGE.

Version History

0.1
initial release
