Outdated egg!
This is an egg for CHICKEN 4, the unsupported old release. You're almost certainly looking for the CHICKEN 5 version of this egg, if it exists.
If it does not exist, there may be equivalent functionality provided by another egg; have a look at the egg index. Otherwise, please consider porting this egg to the current version of CHICKEN.
abnf
Description
abnf is a collection of combinators that help construct parsers for Augmented Backus-Naur Form (ABNF) grammars (RFC 4234).
Library Procedures
The combinator procedures in this library are based on the interface provided by the lexgen library.
<CoreABNF> typeclass
The procedures of this library are provided as fields of the <CoreABNF> typeclass. Please see the typeclass library for information on type classes.
The <CoreABNF> class is intended to provide abstraction over different kinds of input sequences, such as character lists, strings, and streams. The following example illustrates the creation of an instance of <CoreABNF> specialized for character lists. This code is also provided as the abnf-charlist egg, which is fully compatible with abnf prior to version 3.0.
(require-extension typeclass input-classes abnf)

(define char-list-<Input>
  (make-<Input> null? car cdr))

(define char-list-<Token>
  (Input->Token char-list-<Input>))

(define char-list-<CharLex>
  (Token->CharLex char-list-<Token>))

(define char-list-<CoreABNF>
  (CharLex->CoreABNF char-list-<CharLex>))

(import-instance (<CoreABNF> char-list-<CoreABNF>))
Terminal values and core rules
The following procedures are provided as fields in the <CoreABNF> typeclass:
- char CHAR [procedure]
Procedure char builds a pattern matcher function that matches a single character.
- lit STRING [procedure]
lit matches a literal string (case-insensitive).
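As a minimal sketch of how these two constructors might be used, the following assumes the character-list instance described above, imported with a char-list/ prefix in the same way as the date and time example further down this page. The rule names plus and http-scheme are hypothetical and exist only for illustration.

```scheme
(require-extension typeclass input-classes abnf)

(define char-list-<Input>    (make-<Input> null? car cdr))
(define char-list-<Token>    (Input->Token char-list-<Input>))
(define char-list-<CharLex>  (Token->CharLex char-list-<Token>))
(define char-list-<CoreABNF> (CharLex->CoreABNF char-list-<CharLex>))

(import-instance (<Token>    char-list-<Token>    char-list/)
                 (<CharLex>  char-list-<CharLex>  char-list/)
                 (<CoreABNF> char-list-<CoreABNF> char-list/))

;; Hypothetical rule: a single plus sign
(define plus (char-list/char #\+))

;; Hypothetical rule: the literal "http". lit is documented as
;; case-insensitive, so "HTTP" is expected to match as well.
(define http-scheme (char-list/lit "http"))

;; Error continuation, copied from the date and time example
(define (err s) (print "lexical error on stream: " s) `(error))

(require-extension lexgen)

;; Print whatever lex returns for each rule
(print (lex plus err "+"))
(print (lex http-scheme err "HTTP"))
```

Both char and lit return a matcher rather than matching anything themselves, which is why their results can be bound to names and combined later.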
The following primitive parsers match the rules described in RFC 4234, Section 6.1.
- alpha STREAM-LIST [procedure]
Matches any character of the alphabet.
- binary STREAM-LIST [procedure]
Matches [0..1].
- decimal STREAM-LIST [procedure]
Matches [0..9].
- hexadecimal STREAM-LIST [procedure]
Matches [0..9] and [A..F,a..f].
- ascii-char STREAM-LIST [procedure]
Matches any 7-bit US-ASCII character except for NUL (ASCII value 0).
- cr STREAM-LIST [procedure]
Matches the carriage return character.
- lf STREAM-LIST [procedure]
Matches the line feed character.
- crlf STREAM-LIST [procedure]
Matches the Internet newline.
- ctl STREAM-LIST [procedure]
Matches any US-ASCII control character. That is, any character with a decimal value in the range of [0..31,127].
- dquote STREAM-LIST [procedure]
Matches the double quote character.
- htab STREAM-LIST [procedure]
Matches the tab character.
- lwsp STREAM-LIST [procedure]
Matches linear white-space. That is, any number of consecutive wsp, optionally followed by a crlf and (at least) one more wsp.
- sp STREAM-LIST [procedure]
Matches the space character.
- vspace STREAM-LIST [procedure]
Matches any printable ASCII character. That is, any character in the decimal range of [33..126].
- wsp STREAM-LIST [procedure]
Matches space or tab.
- quoted-pair STREAM-LIST [procedure]
Matches a quoted pair. Any characters (excluding CR and LF) may be quoted.
- quoted-string STREAM-LIST [procedure]
Matches a quoted string. The backslash and double quote characters must be escaped inside a quoted string; CR and LF are not allowed at all.
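The core rules above are ready-made matchers, so they can be passed directly to the operators of this library. The sketch below assumes the char-list/ prefixed instance used in the date and time example on this page; the rule names identifier and hex-byte are hypothetical.

```scheme
(require-extension typeclass input-classes abnf)

(define char-list-<Input>    (make-<Input> null? car cdr))
(define char-list-<Token>    (Input->Token char-list-<Input>))
(define char-list-<CharLex>  (Token->CharLex char-list-<Token>))
(define char-list-<CoreABNF> (CharLex->CoreABNF char-list-<CharLex>))

(import-instance (<Token>    char-list-<Token>    char-list/)
                 (<CharLex>  char-list-<CharLex>  char-list/)
                 (<CoreABNF> char-list-<CoreABNF> char-list/))

;; Hypothetical rule, in ABNF notation: ALPHA *(ALPHA / DIGIT)
(define identifier
  (concatenation
   char-list/alpha
   (repetition (alternatives char-list/alpha char-list/decimal))))

;; Hypothetical rule, in ABNF notation: 2HEXDIG
(define hex-byte
  (repetition-n 2 char-list/hexadecimal))

;; Error continuation, copied from the date and time example
(define (err s) (print "lexical error on stream: " s) `(error))

(require-extension lexgen)

;; Print whatever lex returns for each rule
(print (lex identifier err "x25"))
(print (lex hex-byte err "fF"))
```

Note that the core rules are used as values (char-list/alpha), while char and lit have to be called with an argument to obtain a matcher.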
The following additional procedures are provided for convenience:
- set CHAR-SET [procedure]
Matches any character from an SRFI-14 character set.
- set-from-string STRING [procedure]
Matches any character from a set defined as a string.
Operators
- concatenation MATCHER-LIST [procedure]
concatenation matches an ordered list of rules. (RFC 4234, Section 3.1)
- alternatives MATCHER-LIST [procedure]
alternatives matches any one of the given list of rules. (RFC 4234, Section 3.2)
- range C1 C2 [procedure]
range matches a range of characters. (RFC 4234, Section 3.4)
- variable-repetition MIN MAX MATCHER [procedure]
variable-repetition matches between MIN and MAX consecutive elements that match the given rule. (RFC 4234, Section 3.6)
- repetition MATCHER [procedure]
repetition matches zero or more consecutive elements that match the given rule.
- repetition1 MATCHER [procedure]
repetition1 matches one or more consecutive elements that match the given rule.
- repetition-n N MATCHER [procedure]
repetition-n matches exactly N consecutive occurrences of the given rule. (RFC 4234, Section 3.7)
- optional-sequence MATCHER [procedure]
optional-sequence matches the given optional rule. (RFC 4234, Section 3.8)
- pass [procedure]
This matcher returns without consuming any input.
- bind F P [procedure]
Given a rule P and function F, returns a matcher that first applies P to the input stream, then applies F to the returned list of consumed tokens, and returns the result and the remainder of the input stream.
Note: this combinator will signal failure if the input stream is empty.
- drop-consumed P [procedure]
Given a rule P, returns a matcher that always returns an empty list of consumed tokens when P succeeds.
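To illustrate how several of these operators might combine, here is a small sketch of a version-number rule, in ABNF notation 1*3DIGIT "." 1*3DIGIT [ "." 1*3DIGIT ]. It assumes the char-list/ prefixed instance from the date and time example on this page, and the rule names number, dot and version are hypothetical.

```scheme
(require-extension typeclass input-classes abnf)

(define char-list-<Input>    (make-<Input> null? car cdr))
(define char-list-<Token>    (Input->Token char-list-<Input>))
(define char-list-<CharLex>  (Token->CharLex char-list-<Token>))
(define char-list-<CoreABNF> (CharLex->CoreABNF char-list-<CharLex>))

(import-instance (<Token>    char-list-<Token>    char-list/)
                 (<CharLex>  char-list-<CharLex>  char-list/)
                 (<CoreABNF> char-list-<CoreABNF> char-list/))

;; Between one and three decimal digits
(define number
  (variable-repetition 1 3 char-list/decimal))

;; A separator dot that is matched but not kept among the consumed tokens
(define dot
  (drop-consumed (char-list/char #\.)))

;; major "." minor, optionally followed by "." patch
(define version
  (concatenation
   number dot number
   (optional-sequence (concatenation dot number))))

;; Error continuation, copied from the date and time example
(define (err s) (print "lexical error on stream: " s) `(error))

(require-extension lexgen)

;; Print whatever lex returns for a two-part and a three-part version
(print (lex version err "1.22"))
(print (lex version err "1.22.333"))
```

Wrapping the separator in drop-consumed is a design choice: the dot still has to be present in the input, but only the digits remain in the list of consumed tokens.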
Abbreviated syntax
abnf supports the following abbreviations for commonly used combinators:
- :: = concatenation
- :? = optional-sequence
- :! = drop-consumed
- :s = lit
- :c = char
- :* = repetition
- :+ = repetition1
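The abbreviations can shorten rules that nest many combinators. The sketch below writes a hypothetical key "=" value rule with ::, :!, :* and :+ only. It assumes that these abbreviations are bound at module level in the same way as the long operator names, and that the instance is imported with the char-list/ prefix as in the date and time example; char is therefore called by its prefixed name rather than through :c.

```scheme
(require-extension typeclass input-classes abnf)

(define char-list-<Input>    (make-<Input> null? car cdr))
(define char-list-<Token>    (Input->Token char-list-<Input>))
(define char-list-<CharLex>  (Token->CharLex char-list-<Token>))
(define char-list-<CoreABNF> (CharLex->CoreABNF char-list-<CharLex>))

(import-instance (<Token>    char-list-<Token>    char-list/)
                 (<CharLex>  char-list-<CharLex>  char-list/)
                 (<CoreABNF> char-list-<CoreABNF> char-list/))

;; Optional white-space that is matched but not kept
;; (assumption: :! and :* are module-level aliases of
;; drop-consumed and repetition)
(define skip-wsp
  (:! (:* char-list/wsp)))

;; Hypothetical rule, in ABNF notation: 1*ALPHA *WSP "=" *WSP 1*DIGIT
(define assignment
  (:: (:+ char-list/alpha)
      skip-wsp
      (:! (char-list/char #\=))
      skip-wsp
      (:+ char-list/decimal)))

;; Error continuation, copied from the date and time example
(define (err s) (print "lexical error on stream: " s) `(error))

(require-extension lexgen)

;; Print whatever lex returns
(print (lex assignment err "width = 80"))
```

The same rule could be written with concatenation, drop-consumed, repetition and repetition1; the abbreviated form only changes how it reads.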
Examples
The following parser libraries have been implemented with abnf, in order of complexity:
Parsing date and time
(require-extension typeclass input-classes abnf)

(define char-list-<Input>
  (make-<Input> null? car cdr))

(define char-list-<Token>
  (Input->Token char-list-<Input>))

(define char-list-<CharLex>
  (Token->CharLex char-list-<Token>))

(define char-list-<CoreABNF>
  (CharLex->CoreABNF char-list-<CharLex>))

(import-instance (<Token> char-list-<Token> char-list/)
                 (<CharLex> char-list-<CharLex> char-list/)
                 (<CoreABNF> char-list-<CoreABNF> char-list/))

;; Folding white-space: optional white-space and a line break,
;; followed by at least one white-space character
(define fws
  (concatenation
   (optional-sequence
    (concatenation
     (repetition char-list/wsp)
     (drop-consumed
      (alternatives char-list/crlf char-list/lf char-list/cr))))
   (repetition1 char-list/wsp)))

;; Wrap rule p in optional folding white-space that is not kept
(define (between-fws p)
  (concatenation
   (drop-consumed (optional-sequence fws)) p
   (drop-consumed (optional-sequence fws))))

;; Date and Time Specification from RFC 5322 (Internet Message Format)

;; The following abnf parser combinators parse a date and time
;; specification of the form
;;
;;   Thu, 19 Dec 2002 20:35:46 +0200
;;
;; where the weekday specification is optional.

;; Match the abbreviated weekday names
(define day-name
  (alternatives
   (char-list/lit "Mon")
   (char-list/lit "Tue")
   (char-list/lit "Wed")
   (char-list/lit "Thu")
   (char-list/lit "Fri")
   (char-list/lit "Sat")
   (char-list/lit "Sun")))

;; Match a day-name, optionally wrapped in folding whitespace
(define day-of-week (between-fws day-name))

;; Match a four digit decimal number
(define year (between-fws (repetition-n 4 char-list/decimal)))

;; Match the abbreviated month names
(define month-name
  (alternatives
   (char-list/lit "Jan")
   (char-list/lit "Feb")
   (char-list/lit "Mar")
   (char-list/lit "Apr")
   (char-list/lit "May")
   (char-list/lit "Jun")
   (char-list/lit "Jul")
   (char-list/lit "Aug")
   (char-list/lit "Sep")
   (char-list/lit "Oct")
   (char-list/lit "Nov")
   (char-list/lit "Dec")))

;; Match a month-name, optionally wrapped in folding whitespace
(define month (between-fws month-name))

;; Match a one or two digit number
(define day
  (concatenation
   (drop-consumed (optional-sequence fws))
   (alternatives
    (variable-repetition 1 2 char-list/decimal)
    (drop-consumed fws))))

;; Match a date of the form dd Mon yyyy (e.g. 19 Dec 2002)
(define date (concatenation day month year))

;; Match a two-digit number
(define hour (repetition-n 2 char-list/decimal))
(define minute (repetition-n 2 char-list/decimal))
(define isecond (repetition-n 2 char-list/decimal))

;; Match a time-of-day specification of hh:mm or hh:mm:ss.
(define time-of-day
  (concatenation
   hour (drop-consumed (char-list/char #\:))
   minute
   (optional-sequence
    (concatenation (drop-consumed (char-list/char #\:))
                   isecond))))

;; Match a timezone specification of the form
;; +hhmm or -hhmm
(define zone
  (concatenation
   (drop-consumed fws)
   (alternatives (char-list/char #\-) (char-list/char #\+))
   hour minute))

;; Match a time-of-day specification followed by a zone.
(define itime (concatenation time-of-day zone))

;; Match a complete date and time, with an optional leading weekday
(define date-time
  (concatenation
   (optional-sequence
    (concatenation
     day-of-week
     (drop-consumed (char-list/char #\,))))
   date
   itime
   (drop-consumed (optional-sequence fws))))

(define (err s)
  (print "lexical error on stream: " s)
  `(error))

(require-extension lexgen)

(print (lex date-time err "Thu, 19 Dec 2002 20:35:46 +0200"))
Requires
Version History
- 7.0 Added bind* variant of bind [thanks to Peter Bex]
- 6.0 Using utf8 for char operations
- 5.1 Improvements to the CharLex->CoreABNF constructor
- 5.0 Synchronized with lexgen 5
- 3.2 Removed invalid identifier :|
- 3.0 Implemented typeclass interface
- 2.9 Bug fix in consumed-objects (reported by Peter Bex)
- 2.7 Added abbreviated syntax (suggested by Moritz Heidkamp)
- 2.6 Bug fixes in consumer procedures
- 2.5 Removed procedure memo
- 2.4 Moved the definition of bind and drop to lexgen
- 2.2 Added pass combinator
- 2.1 Added procedure variable-repetition
- 2.0 Updated to match the interface of lexgen 2.0
- 1.3 Fix in drop
- 1.2 Added procedures bind drop consume collect
- 1.1 Added procedures set and set-from-string
- 1.0 Initial release
License
Copyright 2009-2015 Ivan Raikov
This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.
This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
A full copy of the GPL license can be found at <http://www.gnu.org/licenses/>.