
expat

Description

An interface to James Clark's Expat XML parser.

Author

felix winkelmann; ported to Chicken 4 by Shawn Rutledge

Requirements

Documentation

Expat is a stream-oriented parser. You register callback (or handler) functions with the parser and then start feeding it the document. As the parser recognizes parts of the document, it calls the appropriate handler for that part (if you've registered one). The document is fed to the parser in pieces, so you can start parsing before you have the whole document. This also allows you to parse huge documents that won't fit into memory.
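As a minimal sketch of feeding a document in pieces, the following splits a small document into fixed-size chunks and passes them to expat:parse one at a time, marking only the last chunk as final. The helper string-chunks and the chunk size of 8 are assumptions made for illustration and are not part of the egg; in practice the pieces would typically come from a port or a socket.

```scheme
(use expat)

;; Hypothetical helper (not part of the egg):
;; split STR into pieces of at most N characters.
(define (string-chunks str n)
  (let loop ((start 0) (acc '()))
    (if (>= start (string-length str))
        (reverse acc)
        (let ((end (min (string-length str) (+ start n))))
          (loop end (cons (substring str start end) acc))))))

(define p (expat:make-parser))
(expat:set-start-handler! p (lambda (tag attrs) (print "Start: " tag)))
(expat:set-end-handler! p (lambda (tag) (print "End: " tag)))

;; Feed the document piece by piece; only the last piece is marked final.
(let loop ((chunks (string-chunks "<doc><item/><item/></doc>" 8)))
  (unless (null? chunks)
    (expat:parse p (car chunks) final: (null? (cdr chunks)))
    (loop (cdr chunks))))

(expat:destroy-parser p)
```

Handlers fire as soon as the parser has seen enough input to recognize a construct, so a tag that straddles two chunks is reported once its closing bracket arrives.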

If you want to parse an entire document into memory or if you need more bells and whistles, you should take a look at Oleg Kiselyov's SSAX parser.

expat:make-parser

[procedure] expat:make-parser #!key (encoding #f) (namespaces #f) (namespace-separator :)

Creates a parser object with the specified attributes. encoding should be a string designating the encoding of the document and should be one of the following:

  • UTF-8
  • UTF-16
  • ISO-8859-1
  • US-ASCII

If no encoding or #f is given, then the encoding specified in the document is used. Note that the strings passed to the handlers are always UTF-8 encoded.

If namespaces is true, then namespace declarations are properly recognized and tags belonging to a namespace will be prefixed with the namespace string and the character given in namespace-separator.

expat:make-external-entity-parser

[procedure] expat:make-external-entity-parser PARSER CONTEXT #!key (encoding #f)

Creates a parser to recursively process external entities.

expat:destroy-parser

[procedure] expat:destroy-parser PARSER

Releases the memory resources associated with PARSER.

expat:parse

[procedure] expat:parse PARSER STRING #!key length (final #t) (external-entities #f)

Parses a piece of an XML document given in STRING. If length is given, it specifies the number of bytes to parse; it defaults to (string-length STRING). If final is true, then the string is the last piece of the document.

Returns #t on success, or signals an exception of the kinds (exn expat). external-entities controls whether parsing of external entities is enabled and can be any of the symbols never, always or unless-standalone. #f and #t are synonyms for never and always, respectively.
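A minimal sketch of catching a parse error with condition-case, matching on the exn and expat kinds. The wrapper try-parse is a hypothetical name introduced here, and the presence of a message property on the condition is an assumption, so the accessor is given a default value.

```scheme
(use expat)

;; Hypothetical wrapper (not part of the egg):
;; returns #t if STR parses, #f if an (exn expat) condition is signalled.
(define (try-parse str)
  (let* ((p (expat:make-parser))
         (ok (condition-case (expat:parse p str)
               (ex (exn expat)
                 (print "parse failed: "
                        ;; Assumption: the condition may carry a 'message
                        ;; property; the default covers its absence.
                        ((condition-property-accessor 'exn 'message "(no message)") ex))
                 #f))))
    (expat:destroy-parser p)
    ok))

(try-parse "<a><b/></a>")   ; well-formed
(try-parse "<a><b></a>")    ; mismatched end tag
```

Destroying the parser after the condition-case form means it is released whether or not parsing succeeded.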

expat:set-start-handler!

[procedure] expat:set-start-handler! PARSER PROCEDURE

Sets the handler to process start (and empty) tags. PROCEDURE will be called with two arguments: the tag (a string) and a list of pairs, where each pair is of the form (ATTRIBUTENAME . ATTRIBUTEVALUE) (both strings).
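Because attribute names arrive as strings, the attribute list has to be searched with a procedure that compares with equal?, such as assoc (assq would not match). A small sketch, in which the element name a, the attribute href and the variable links are made up for illustration:

```scheme
(use expat)

(define links '())   ; collected href values, most recent first

(define p (expat:make-parser))
(expat:set-start-handler! p
  (lambda (tag attrs)
    ;; attrs is a list of (NAME . VALUE) pairs of strings
    (let ((href (assoc "href" attrs)))
      (when href
        (set! links (cons (cdr href) links))
        (print tag " links to " (cdr href))))))

(expat:parse p "<doc><a href='http://example.org/'>x</a><a>y</a></doc>")
(expat:destroy-parser p)
```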

expat:set-end-handler!

[procedure] expat:set-end-handler! PARSER PROCEDURE

Sets the handler to process end (and empty) tags. PROCEDURE will be called with one argument: the tag (a string).

expat:set-character-data-handler!

[procedure] expat:set-character-data-handler! PARSER PROCEDURE

Sets the handler to process text. PROCEDURE will be called with one argument: a string containing a piece of text. Note that a single block of contiguous text free of markup may still result in a sequence of calls to this handler.
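Since one run of text can arrive in several calls, a handler that needs the complete text of an element has to collect the pieces itself. A minimal sketch, assuming a flat document without nested elements; the variables pieces and last-text are names chosen for this example:

```scheme
(use expat)

(define pieces '())     ; fragments of the current element, most recent first
(define last-text "")   ; complete text of the most recently closed element

(define p (expat:make-parser))

;; Start collecting afresh at each start tag.
(expat:set-start-handler! p (lambda (tag attrs) (set! pieces '())))

;; Each call may deliver only part of the text, so just remember it.
(expat:set-character-data-handler! p
  (lambda (text) (set! pieces (cons text pieces))))

;; At the end tag, join the fragments in their original order.
(expat:set-end-handler! p
  (lambda (tag)
    (set! last-text (apply string-append (reverse pieces)))
    (set! pieces '())
    (print tag ": " last-text)))

(expat:parse p "<msg>a &amp; b</msg>")
(expat:destroy-parser p)
```

Entity references such as &amp; are a typical place where the parser splits text into separate calls, which is why the fragments are joined only at the end tag.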

expat:set-processing-instruction-handler!

[procedure] expat:set-processing-instruction-handler! PARSER PROCEDURE

Sets the handler for processing instructions. PROCEDURE will be called with two arguments: target and data (both strings). The target is the first word in the processing instruction. The data is the rest of the characters in it after skipping all whitespace after the initial word.

expat:set-comment-handler!

[procedure] expat:set-comment-handler! PARSER PROCEDURE

Sets the handler to process comments. PROCEDURE will be called with one argument: all the text inside the comment delimiters.

expat:set-external-entity-ref-handler!

[procedure] expat:set-external-entity-ref-handler! PARSER PROCEDURE

Sets the handler for references to external entities. PROCEDURE will be called with five arguments: parser, context, URI base, system ID and public ID. The first argument is an expat:parser record, and the rest are strings. To parse the external entity, create a parser with expat:make-external-entity-parser.

Examples

A silly example:

(use expat)

(define text #<<EOF
<?xml version='1.0'?>
<!-- a comment -->
<?pi1 yepyepyep?>
<yo:this yo='abc' xmlns:yo="http://www.yo.com">
&gt;&lt;&#x100;
<yo:test>yes, no, &#33;<is/><a/>
</yo:test>some more text
</yo:this>
EOF
)

(define p (expat:make-parser namespaces: #t))
(expat:set-start-handler! p (lambda (tag attrs) (print "Start: " tag " - " attrs)))
(expat:set-end-handler! p (lambda (tag) (print "End: " tag)))
(expat:set-character-data-handler! p (lambda (text) (pp (string->list text))))
(expat:set-processing-instruction-handler! p (lambda (target text) (print "PI: " target " - " text)))
(expat:set-comment-handler! p (lambda (text) (print "Comment: " text)))
(expat:parse p text)
(expat:destroy-parser p)

This will output:

 Comment:  a comment 
 PI: pi1 - yepyepyep
 Start: http://www.yo.com:this - ((yo . abc))
 (#\newline)
 (#\>)
 (#\<)
 (#\Ä #\)
 (#\newline)
 (#\space)
 Start: http://www.yo.com:test - ()
 (#\y #\e #\s #\, #\space #\n #\o #\, #\space)
 (#\!)
 Start: is - ()
 End: is
 Start: a - ()
 End: a
 (#\newline)
 (#\space)
 End: http://www.yo.com:test
 (#\s #\o #\m #\e #\space #\m #\o #\r #\e #\space #\t #\e #\x #\t)
 (#\newline)
 End: http://www.yo.com:this

Another example that uses DTDs:

Say we have a file foo.xml:

 <?xml version="1.0"?>
 <!DOCTYPE foo SYSTEM "foo.dtd">
 <foo>
 &abcdef;
 </foo>

and another one called foo.dtd:

 <!ENTITY abcdef "this is a test">

The following program parses foo.xml and resolves the external entity from foo.dtd:

(use utils expat)

(define p (expat:make-parser))
(expat:set-start-handler! p (lambda (tag attrs) (print "Start: " tag " - " attrs)))
(expat:set-end-handler! p (lambda (tag) (print "End: " tag)))
(expat:set-character-data-handler! p (lambda (text) (pp (string->list text))))

(expat:set-external-entity-ref-handler!
 p
 ;; Five arguments, following the handler description above: the parser comes first.
 (lambda (parser context base sys pub)
   (print "external: " sys)
   (let* ([p2 (expat:make-external-entity-parser parser context)]
          [s (expat:parse p2 (read-all "foo.dtd"))] )
     (expat:destroy-parser p2)
     s) ) )

(expat:parse p (read-all "foo.xml") external-entities: #t)
(expat:destroy-parser p)

Changelog

License

 Copyright (c) 2005, Felix L. Winkelmann
 All rights reserved.
 
 Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following
 conditions are met:
 
   • Redistributions of source code must retain the above copyright notice, this list of conditions and the following
     disclaimer.
   • Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following
     disclaimer in the documentation and/or other materials provided with the distribution.
   • Neither the name of the author nor the names of its contributors may be used to endorse or promote
     products derived from this software without specific prior written permission.
 
 THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS
 OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY
 AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDERS OR
 CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
 CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
 SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
 THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR
 OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
 POSSIBILITY OF SUCH DAMAGE.
