abnf
Description
abnf is a collection of combinators that help construct parsers for Augmented Backus-Naur Form (ABNF) grammars (RFC 4234).
Library Procedures
The combinator procedures in this library are based on the interface provided by the lexgen library.
Terminal values and core rules
- char CHAR [procedure]
Procedure char builds a pattern matcher function that matches a single character.
- lit STRING [procedure]
lit matches a literal string (case-insensitive).
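As a rough illustration of these two constructors, the sketch below runs each matcher with `lex` from the lexgen library, following the calling convention of the date-and-time example on this page. The `err` helper, the matcher names, and the sample inputs are invented for illustration. The exact shape of the value returned by `lex` depends on the installed lexgen version, so the results are only printed.

```scheme
(import abnf lexgen)

;; Hypothetical error handler: lex calls it when the pattern does not match.
(define (err s)
  (print "lexical error on stream: " s)
  '(error))

;; char builds a matcher for exactly one character.
(define colon (char #\:))

;; lit builds a matcher for a literal string, compared case-insensitively,
;; so the lowercase input below is expected to match as well.
(define get-keyword (lit "GET"))

(define colon-result (lex colon err ":rest"))
(define keyword-result (lex get-keyword err "get /index.html"))

(print colon-result)
(print keyword-result)
```

Both matchers consume only a prefix of the input; the unmatched remainder is left in the stream.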
The following primitive parsers match the core rules described in RFC 4234, Appendix B.1.
- alpha STREAM-LIST [procedure]
Matches any character of the alphabet.
- binary STREAM-LIST [procedure]
Matches [0..1].
- decimal STREAM-LIST [procedure]
Matches [0..9].
- hexadecimal STREAM-LIST [procedure]
Matches [0..9] and [A..F,a..f].
- ascii-char STREAM-LIST [procedure]
Matches any 7-bit US-ASCII character except for NUL (ASCII value 0).
- cr STREAM-LIST [procedure]
Matches the carriage return character.
- lf STREAM-LIST [procedure]
Matches the line feed character.
- crlf STREAM-LIST [procedure]
Matches the Internet newline.
- ctl STREAM-LIST [procedure]
Matches any US-ASCII control character. That is, any character with a decimal value in the range of [0..31,127].
- dquote STREAM-LIST [procedure]
Matches the double quote character.
- htab STREAM-LIST [procedure]
Matches the tab character.
- lwsp STREAM-LIST [procedure]
Matches linear white-space. That is, any number of consecutive wsp, optionally followed by a crlf and (at least) one more wsp.
- sp STREAM-LIST [procedure]
Matches the space character.
- vspace STREAM-LIST [procedure]
Matches any printable ASCII character. That is, any character in the decimal range of [33..126].
- wsp STREAM-LIST [procedure]
Matches space or tab.
- quoted-pair STREAM-LIST [procedure]
Matches a quoted pair. Any characters (excluding CR and LF) may be quoted.
- quoted-string STREAM-LIST [procedure]
Matches a quoted string. The backslash and double quote characters must be escaped inside a quoted string; CR and LF are not allowed at all.
The following additional procedures are provided for convenience:
- set CHAR-SET [procedure]
Matches any character from an SRFI-14 character set.
- set-from-string STRING [procedure]
Matches any character from a set defined as a string.
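The terminal matchers above can be tried out directly with `lex` from lexgen. The sketch below is only an illustration: `err`, `vowel`, and the inputs are made-up names and data, and it assumes that the core rules (`decimal`, `wsp`, and so on) are passed around as ready-made matcher values, as the date-and-time example on this page does.

```scheme
(import abnf lexgen)

;; Hypothetical error handler: lex calls it when the pattern does not match.
(define (err s)
  (print "lexical error on stream: " s)
  '(error))

;; set-from-string builds a matcher for any one character of the string.
(define vowel (set-from-string "aeiou"))

;; Core rules are used as values; they are not called with arguments here.
(define vowel-result (lex vowel err "echo"))
(define digit-result (lex decimal err "42"))
(define space-result (lex wsp err " indented"))

(print vowel-result)
(print digit-result)
(print space-result)
```

Each of these matchers consumes a single character, so `decimal` applied to "42" is expected to take only the first digit.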
Operators
- concatenation MATCHER-LIST [procedure]
concatenation matches an ordered list of rules. (RFC 4234, Section 3.1)
- alternatives MATCHER-LIST [procedure]
alternatives matches any one of the given list of rules. (RFC 4234, Section 3.2)
- range C1 C2 [procedure]
range matches a range of characters. (RFC 4234, Section 3.4)
- variable-repetition MIN MAX MATCHER [procedure]
variable-repetition matches between MIN and MAX consecutive elements that match the given rule. (RFC 4234, Section 3.6)
- repetition MATCHER [procedure]
repetition matches zero or more consecutive elements that match the given rule.
- repetition1 MATCHER [procedure]
repetition1 matches one or more consecutive elements that match the given rule.
- repetition-n N MATCHER [procedure]
repetition-n matches exactly N consecutive occurrences of the given rule. (RFC 4234, Section 3.7)
- optional-sequence MATCHER [procedure]
optional-sequence matches the given optional rule. (RFC 4234, Section 3.8)
- pass [procedure]
This matcher returns without consuming any input.
- bind F P [procedure]
Given a rule P and function F, returns a matcher that first applies P to the input stream, then applies F to the returned list of consumed tokens, and returns the result and the remainder of the input stream.
Note: this combinator will signal failure if the input stream is empty.
- drop-consumed P [procedure]
Given a rule P, returns a matcher that always returns an empty list of consumed tokens when P succeeds.
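The operators compose in the usual combinator style. The following sketch builds a dotted-quad matcher from `variable-repetition`, `concatenation`, and `drop-consumed`, and uses `bind` to post-process consumed tokens. The names `octet`, `dot`, `dotted-quad`, `digit-count`, and `err` are hypothetical. It also assumes that the function given to `bind` should return a list of tokens; that detail is not stated above and may differ between lexgen versions.

```scheme
(import abnf lexgen)

;; Hypothetical error handler: lex calls it when the pattern does not match.
(define (err s)
  (print "lexical error on stream: " s)
  '(error))

;; One to three decimal digits (1*3DIGIT in ABNF notation).
(define octet (variable-repetition 1 3 decimal))

;; A dot that must be present but is dropped from the consumed tokens.
(define dot (drop-consumed (char #\.)))

;; Four octets separated by dots.
(define dotted-quad
  (concatenation octet dot octet dot octet dot octet))

;; bind hands the tokens consumed by the rule to a function.
;; Assumption: the function returns a list that replaces those tokens.
;; Counting the tokens avoids depending on the order they arrive in.
(define digit-count
  (bind (lambda (tokens) (list (length tokens)))
        (repetition1 decimal)))

(define quad-result (lex dotted-quad err "192.168.0.1"))
(define count-result (lex digit-count err "2002 rest"))

(print quad-result)
(print count-result)
```

Because the dots are wrapped in `drop-consumed`, only the digits should remain among the consumed tokens of `dotted-quad`.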
Abbreviated syntax
abnf supports the following abbreviations for commonly used combinators:
- :: abbreviates concatenation
- :? abbreviates optional-sequence
- :! abbreviates drop-consumed
- :s abbreviates lit
- :c abbreviates char
- :* abbreviates repetition
- :+ abbreviates repetition1
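To show how the abbreviations shorten a grammar, the sketch below writes the same hh:mm or hh:mm:ss rule twice, once with the long names and once with the short ones. It assumes that each abbreviation accepts the same arguments as the procedure it stands for; `two-digits`, `time-long`, `time-short`, and `err` are names made up for this illustration.

```scheme
(import abnf lexgen)

;; Hypothetical error handler: lex calls it when the pattern does not match.
(define (err s)
  (print "lexical error on stream: " s)
  '(error))

(define two-digits (repetition-n 2 decimal))

;; Long form, in the style of the date-and-time example.
(define time-long
  (concatenation
   two-digits (drop-consumed (char #\:)) two-digits
   (optional-sequence
    (concatenation (drop-consumed (char #\:)) two-digits))))

;; The same rule written with the abbreviated syntax.
(define time-short
  (:: two-digits (:! (:c #\:)) two-digits
      (:? (:: (:! (:c #\:)) two-digits))))

(define long-result (lex time-long err "20:35:46"))
(define short-result (lex time-short err "20:35:46"))

(print long-result)
(print short-result)
```

If the abbreviations are plain aliases, both matchers should produce the same result for the same input.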
Examples
The following parser libraries have been implemented with abnf, in order of complexity:
Parsing date and time
(import abnf lexgen)

;; Folding whitespace: optional whitespace and a line break,
;; followed by at least one whitespace character
(define fws
  (concatenation
   (optional-sequence
    (concatenation
     (repetition wsp)
     (drop-consumed (alternatives crlf lf cr))))
   (repetition1 wsp)))

;; Wrap a rule in optional folding whitespace on both sides
(define (between-fws p)
  (concatenation
   (drop-consumed (optional-sequence fws))
   p
   (drop-consumed (optional-sequence fws))))

;; Date and Time Specification from RFC 5322 (Internet Message Format)

;; The following abnf parser combinators parse a date and time
;; specification of the form
;;
;;   Thu, 19 Dec 2002 20:35:46 +0200
;;
;; where the weekday specification is optional.

;; Match the abbreviated weekday names
(define day-name
  (alternatives
   (lit "Mon") (lit "Tue") (lit "Wed") (lit "Thu")
   (lit "Fri") (lit "Sat") (lit "Sun")))

;; Match a day-name, optionally wrapped in folding whitespace
(define day-of-week (between-fws day-name))

;; Match a four digit decimal number
(define year (between-fws (repetition-n 4 decimal)))

;; Match the abbreviated month names
(define month-name
  (alternatives
   (lit "Jan") (lit "Feb") (lit "Mar") (lit "Apr")
   (lit "May") (lit "Jun") (lit "Jul") (lit "Aug")
   (lit "Sep") (lit "Oct") (lit "Nov") (lit "Dec")))

;; Match a month-name, optionally wrapped in folding whitespace
(define month (between-fws month-name))

;; Match a one or two digit number
(define day
  (concatenation
   (drop-consumed (optional-sequence fws))
   (alternatives
    (variable-repetition 1 2 decimal)
    (drop-consumed fws))))

;; Match a date of the form day month year, e.g. 19 Dec 2002
(define date (concatenation day month year))

;; Match a two-digit number
(define hour (repetition-n 2 decimal))
(define minute (repetition-n 2 decimal))
(define isecond (repetition-n 2 decimal))

;; Match a time-of-day specification of hh:mm or hh:mm:ss.
(define time-of-day
  (concatenation
   hour (drop-consumed (char #\:)) minute
   (optional-sequence
    (concatenation (drop-consumed (char #\:)) isecond))))

;; Match a timezone specification of the form
;; +hhmm or -hhmm
(define zone
  (concatenation
   (drop-consumed fws)
   (alternatives (char #\-) (char #\+))
   hour minute))

;; Match a time-of-day specification followed by a zone.
(define itime (concatenation time-of-day zone))

;; Match a complete date and time; the weekday part is optional
(define date-time
  (concatenation
   (optional-sequence
    (concatenation day-of-week (drop-consumed (char #\,))))
   date
   itime
   (drop-consumed (optional-sequence fws))))

;; Called by lex when the input does not match
(define (err s)
  (print "lexical error on stream: " s)
  `(error))

(print (lex date-time err "Thu, 19 Dec 2002 20:35:46 +0200"))
Repository
https://github.com/iraikov/chicken-abnf
Version History
- 8.0 Ported to CHICKEN 5 and yasos collections interface
- 7.0 Added bind* variant of bind [thanks to Peter Bex]
- 6.0 Using utf8 for char operations
- 5.1 Improvements to the CharLex->CoreABNF constructor
- 5.0 Synchronized with lexgen 5
- 3.2 Removed invalid identifier :|
- 3.0 Implemented typeclass interface
- 2.9 Bug fix in consumed-objects (reported by Peter Bex)
- 2.7 Added abbreviated syntax (suggested by Moritz Heidkamp)
- 2.6 Bug fixes in consumer procedures
- 2.5 Removed procedure memo
- 2.4 Moved the definition of bind and drop to lexgen
- 2.2 Added pass combinator
- 2.1 Added procedure variable-repetition
- 2.0 Updated to match the interface of lexgen 2.0
- 1.3 Fix in drop
- 1.2 Added procedures bind drop consume collect
- 1.1 Added procedures set and set-from-string
- 1.0 Initial release
License
Copyright 2009-2018 Ivan Raikov
This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.
This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
A full copy of the GPL license can be found at <http://www.gnu.org/licenses/>.