abnf

Description

abnf is a collection of combinators that help construct parsers for Augmented Backus-Naur Form (ABNF) grammars (RFC 4234).

Library Procedures

The combinator procedures in this library are based on the interface provided by the lexgen library.

<CoreABNF> typeclass

The procedures of this library are provided as fields of the <CoreABNF> typeclass. Please see the typeclass library for information on type classes.

The <CoreABNF> class is intended to provide abstraction over different kinds of input sequences, e.g. character lists, strings, streams, etc. The following example illustrates the creation of an instance of <CoreABNF> specialized for character lists. This code is also provided as the abnf-charlist egg, which is fully compatible with abnf prior to version 3.0.

(require-extension typeclass input-classes abnf)

(define char-list-<Input>
  (make-<Input> null? car cdr))

(define char-list-<Token>
  (Input->Token char-list-<Input>))

(define char-list-<CharLex>
  (Token->CharLex char-list-<Token>))

(define char-list-<CoreABNF>
  (CharLex->CoreABNF char-list-<CharLex>))

(import-instance (<CoreABNF> char-list-<CoreABNF>))

Terminal values and core rules

The following procedures are provided as fields in the <CoreABNF> typeclass:

[procedure] char CHAR

Procedure char builds a pattern matcher function that matches a single character.

[procedure] lit STRING

lit matches a literal string (case-insensitive).
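As a minimal sketch of how these two terminals might be used, the following builds one matcher with char and one with lit, using the char-list instance and the char-list/ prefix from the date and time example in this document. The names colon and from-keyword are made up for illustration, and the lex call simply mirrors the invocation used in that example. The printed result depends on lexgen's stream representation, so it is not shown.

```scheme
(require-extension typeclass input-classes abnf)

;; Instance of <CoreABNF> specialized for character lists
(define char-list-<Input>
  (make-<Input> null? car cdr))
(define char-list-<Token>
  (Input->Token char-list-<Input>))
(define char-list-<CharLex>
  (Token->CharLex char-list-<Token>))
(define char-list-<CoreABNF>
  (CharLex->CoreABNF char-list-<CharLex>))

(import-instance (<Token> char-list-<Token> char-list/)
                 (<CharLex> char-list-<CharLex> char-list/)
                 (<CoreABNF> char-list-<CoreABNF> char-list/))

;; Hypothetical names: a matcher for the single character #\:
;; and a matcher for the literal string "From".
(define colon (char-list/char #\:))
(define from-keyword (char-list/lit "From"))

(define (err s)
  (print "lexical error on stream: " s)
  `(error))

(require-extension lexgen)

;; lit is documented as case-insensitive, so a lower-case
;; input is expected to be accepted as well.
(print (lex from-keyword err "from"))
(print (lex colon err ":"))
```

Both colon and from-keyword are ordinary values, so they can be passed directly to the operators described later in this document.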

The following primitive parsers match the core rules described in RFC 4234, Appendix B.1.

[procedure] alpha STREAM-LIST

Matches any character of the alphabet.

[procedure] binary STREAM-LIST

Matches [0..1].

[procedure] decimal STREAM-LIST

Matches [0..9].

[procedure] hexadecimal STREAM-LIST

Matches [0..9] and [A..F,a..f].

[procedure] ascii-char STREAM-LIST

Matches any 7-bit US-ASCII character except for NUL (ASCII value 0).

[procedure] cr STREAM-LIST

Matches the carriage return character.

[procedure] lf STREAM-LIST

Matches the line feed character.

[procedure] crlf STREAM-LIST

Matches the Internet newline.

[procedure] ctl STREAM-LIST

Matches any US-ASCII control character. That is, any character with a decimal value in the range of [0..31,127].

[procedure] dquote STREAM-LIST

Matches the double quote character.

[procedure] htab STREAM-LIST

Matches the tab character.

[procedure] lwsp STREAM-LIST

Matches linear white-space. That is, any number of consecutive wsp, optionally followed by a crlf and (at least) one more wsp.

[procedure] sp STREAM-LIST

Matches the space character.

[procedure] vspace STREAM-LIST

Matches any visible (printing) ASCII character, corresponding to the VCHAR core rule. That is, any character in the decimal range of [33..126].

[procedure] wsp STREAM-LIST

Matches space or tab.

[procedure] quoted-pair STREAM-LIST

Matches a quoted pair. Any characters (excluding CR and LF) may be quoted.

[procedure] quoted-string STREAM-LIST

Matches a quoted string. The backslash and double quote characters must be escaped inside a quoted string; CR and LF are not allowed at all.

The following additional procedures are provided for convenience:

[procedure] set CHAR-SET

Matches any character from an SRFI-14 character set.

[procedure] set-from-string STRING

Matches any character from a set defined as a string.

Operators

[procedure] concatenation MATCHER-LIST

concatenation matches an ordered list of rules. (RFC 4234, Section 3.1)

[procedure] alternatives MATCHER-LIST

alternatives matches any one of the given list of rules. (RFC 4234, Section 3.2)

[procedure] range C1 C2

range matches a range of characters. (RFC 4234, Section 3.4)

[procedure] variable-repetition MIN MAX MATCHER

variable-repetition matches between MIN and MAX consecutive elements that match the given rule. (RFC 4234, Section 3.6)

[procedure] repetition MATCHER

repetition matches zero or more consecutive elements that match the given rule.

[procedure] repetition1 MATCHER

repetition1 matches one or more consecutive elements that match the given rule.

[procedure] repetition-n N MATCHER

repetition-n matches exactly N consecutive occurrences of the given rule. (RFC 4234, Section 3.7)

[procedure] optional-sequence MATCHER

optional-sequence matches the given optional rule. (RFC 4234, Section 3.8)
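The operators described so far can be combined into larger rules. The sketch below is an illustration only: the names digits, release-tag and version-number are hypothetical, and the grammar (an optional "v", a number, one or two further dot-separated numbers, and an optional "-alpha" or "-beta" suffix) is invented for this example. Operators are passed their rules as separate arguments and terminals carry the char-list/ prefix, as in the date and time example in this document.

```scheme
(require-extension typeclass input-classes abnf)

;; Instance of <CoreABNF> specialized for character lists
(define char-list-<Input>
  (make-<Input> null? car cdr))
(define char-list-<Token>
  (Input->Token char-list-<Input>))
(define char-list-<CharLex>
  (Token->CharLex char-list-<Token>))
(define char-list-<CoreABNF>
  (CharLex->CoreABNF char-list-<CharLex>))

(import-instance (<Token> char-list-<Token> char-list/)
                 (<CharLex> char-list-<CharLex> char-list/)
                 (<CoreABNF> char-list-<CoreABNF> char-list/))

;; One or more decimal digits
(define digits (repetition1 char-list/decimal))

;; Either of two literal tags
(define release-tag
  (alternatives (char-list/lit "alpha")
                (char-list/lit "beta")))

;; [ "v" ] digits 1*2( "." digits ) [ "-" release-tag ]
(define version-number
  (concatenation
   (optional-sequence (char-list/char #\v))
   digits
   (variable-repetition 1 2
    (concatenation (char-list/char #\.) digits))
   (optional-sequence
    (concatenation (char-list/char #\-) release-tag))))

(define (err s)
  (print "lexical error on stream: " s)
  `(error))

(require-extension lexgen)
(print (lex version-number err "v1.2.3-beta"))
```

Because every operator returns a matcher, intermediate rules such as digits can be named once and reused, which keeps the Scheme definitions close to the shape of the ABNF grammar they implement.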

[procedure] pass

This matcher returns without consuming any input.

[procedure] bind F P

Given a rule P and function F, returns a matcher that first applies P to the input stream, then applies F to the returned list of consumed tokens, and returns the result and the remainder of the input stream.

Note: this combinator will signal failure if the input stream is empty.

[procedure] bind* F P

The same as bind, but will signal success if the input stream is empty.

[procedure] drop-consumed P

Given a rule P, returns a matcher that always returns an empty list of consumed tokens when P succeeds.
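A common use of drop-consumed is to recognize separators without keeping them in the result. The following sketch assumes a made-up grammar for a comma-separated list of numbers; the names separator, number and number-list are hypothetical. The commas and the whitespace around them must be present in the input, but since they are wrapped in drop-consumed they should not appear among the consumed tokens.

```scheme
(require-extension typeclass input-classes abnf)

;; Instance of <CoreABNF> specialized for character lists
(define char-list-<Input>
  (make-<Input> null? car cdr))
(define char-list-<Token>
  (Input->Token char-list-<Input>))
(define char-list-<CharLex>
  (Token->CharLex char-list-<Token>))
(define char-list-<CoreABNF>
  (CharLex->CoreABNF char-list-<CharLex>))

(import-instance (<Token> char-list-<Token> char-list/)
                 (<CharLex> char-list-<CharLex> char-list/)
                 (<CoreABNF> char-list-<CoreABNF> char-list/))

;; A comma with optional surrounding whitespace; matched but dropped
(define separator
  (drop-consumed
   (concatenation
    (repetition char-list/wsp)
    (char-list/char #\,)
    (repetition char-list/wsp))))

;; One or more decimal digits
(define number (repetition1 char-list/decimal))

;; number *( separator number )
(define number-list
  (concatenation
   number
   (repetition (concatenation separator number))))

(define (err s)
  (print "lexical error on stream: " s)
  `(error))

(require-extension lexgen)
(print (lex number-list err "1, 22 ,333"))
```

The date and time example later in this document uses the same technique to discard folding whitespace and the ":" and "," delimiters.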

Abbreviated syntax

abnf supports the following abbreviations for commonly used combinators:

::    concatenation
:?    optional-sequence
:!    drop-consumed
:s    lit
:c    char
:*    repetition
:+    repetition1

Examples

The following parser libraries have been implemented with abnf, in order of complexity:

Parsing date and time

(require-extension typeclass input-classes abnf)

(define char-list-<Input>
  (make-<Input> null? car cdr))

(define char-list-<Token>
  (Input->Token char-list-<Input>))

(define char-list-<CharLex>
  (Token->CharLex char-list-<Token>))

(define char-list-<CoreABNF>
  (CharLex->CoreABNF char-list-<CharLex>))

(import-instance (<Token> char-list-<Token> char-list/)
                 (<CharLex> char-list-<CharLex> char-list/)
                 (<CoreABNF> char-list-<CoreABNF> char-list/))

(define fws
  (concatenation
   (optional-sequence 
    (concatenation
     (repetition char-list/wsp)
     (drop-consumed 
      (alternatives char-list/crlf char-list/lf char-list/cr))))
   (repetition1 char-list/wsp)))


(define (between-fws p)
  (concatenation
   (drop-consumed (optional-sequence fws)) p 
   (drop-consumed (optional-sequence fws))))

;; Date and Time Specification from RFC 5322 (Internet Message Format)

;; The following abnf parser combinators parse a date and time
;; specification of the form
;;
;;   Thu, 19 Dec 2002 20:35:46 +0200
;;
;; where the weekday specification is optional.
			     
;; Match the abbreviated weekday names

(define day-name 
  (alternatives
   (char-list/lit "Mon")
   (char-list/lit "Tue")
   (char-list/lit "Wed")
   (char-list/lit "Thu")
   (char-list/lit "Fri")
   (char-list/lit "Sat")
   (char-list/lit "Sun")))

;; Match a day-name, optionally wrapped in folding whitespace

(define day-of-week (between-fws day-name))


;; Match a four digit decimal number

(define year (between-fws (repetition-n 4 char-list/decimal)))

;; Match the abbreviated month names

(define month-name (alternatives
		    (char-list/lit "Jan")
		    (char-list/lit "Feb")
		    (char-list/lit "Mar")
		    (char-list/lit "Apr")
		    (char-list/lit "May")
		    (char-list/lit "Jun")
		    (char-list/lit "Jul")
		    (char-list/lit "Aug")
		    (char-list/lit "Sep")
		    (char-list/lit "Oct")
		    (char-list/lit "Nov")
		    (char-list/lit "Dec")))

;; Match a month-name, optionally wrapped in folding whitespace

(define month (between-fws month-name))


;; Match a one or two digit number

(define day (concatenation
	     (drop-consumed (optional-sequence fws))
	     (alternatives 
	      (variable-repetition 1 2 char-list/decimal)
	      (drop-consumed fws))))

;; Match a date of the form "dd Mon yyyy", e.g. 19 Dec 2002
(define date (concatenation day month year))

;; Match a two-digit number 

(define hour      (repetition-n 2 char-list/decimal))
(define minute    (repetition-n 2 char-list/decimal))
(define isecond   (repetition-n 2 char-list/decimal))

;; Match a time-of-day specification of hh:mm or hh:mm:ss.

(define time-of-day
  (concatenation
   hour (drop-consumed (char-list/char #\:))
   minute
   (optional-sequence
    (concatenation (drop-consumed (char-list/char #\:))
                   isecond))))

;; Match a timezone specification of the form
;; +hhmm or -hhmm 

(define zone (concatenation 
	      (drop-consumed fws)
	      (alternatives (char-list/char #\-) (char-list/char #\+))
	      hour minute))

;; Match a time-of-day specification followed by a zone.

(define itime (concatenation time-of-day zone))

(define date-time (concatenation
		   (optional-sequence
		    (concatenation
		     day-of-week
		     (drop-consumed (char-list/char #\,))))
		   date
		   itime
		   (drop-consumed (optional-sequence fws))))

(define (err s)
  (print "lexical error on stream: " s)
  `(error))

(require-extension lexgen)
(print (lex date-time err "Thu, 19 Dec 2002 20:35:46 +0200"))

Requires

Version History

License

 Copyright 2009-2015 Ivan Raikov
 This program is free software: you can redistribute it and/or
 modify it under the terms of the GNU General Public License as
 published by the Free Software Foundation, either version 3 of the
 License, or (at your option) any later version.
 This program is distributed in the hope that it will be useful, but
 WITHOUT ANY WARRANTY; without even the implied warranty of
 MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
 General Public License for more details.
 A full copy of the GPL license can be found at
 <http://www.gnu.org/licenses/>.
