libxml2

Libxml2 is an XML C parser and toolkit with DOM, SAX and text-reader APIs.

LibXML2

Libxml2 is the XML C parser and toolkit developed for the Gnome project (but usable outside of the Gnome platform). It is free software available under the MIT License. XML itself is a metalanguage for designing markup languages, i.e. text languages where semantics and structure are added to the content using extra 'markup' information enclosed between angle brackets. HTML is the most well-known markup language. Though the library is written in C, a variety of language bindings make it available in other environments.

Author

David Ireland (djireland79 at gmail dot com)

Upstream

http://xmlsoft.org/

Egg Source Code

https://gitlab.com/maxwell79/chicken-libxml2

libxml

[module] libxml

Miscellaneous

attributes->string

[procedure] attributes->string attributes

Converts an association list of attributes to a string

attributes
List of attributes
Examples

Example:

(attributes->string `(("id1" . "value1") ("id2" . "value2")))
 => " id2=\"value2\" id1=\"value1\""

DOM Parser

DOM stands for the Document Object Model; this is an API for accessing XML or HTML structured documents.

Example

(define (dom-demo)
  (define (print-element-names node)
    (let loop ((n node))
      (when n
            (when (dom:is-element-node? n)
                  (print "element <" (dom:node-name n) ">")
                  (print "@ => " (dom:attributes n)))
            (when (dom:is-text-node? n)
                  (print "content => " (dom:node-content n)))
            (print-element-names (dom:node-children n))
            (loop (dom:next-node n)))))
  (define ctx (dom:make-parser-context))
  (define doc (dom:read-file-with-context ctx "foo.xml" #f 0))
  (define root (dom:root-element doc))
  (define valid? (dom:is-valid? ctx))
  (print "XML is valid?: " valid?)
  (print "root: " root)
  (print-element-names root)
  (dom:free-doc doc)
  (dom:cleanup-parser))

Node Types

dom:element-node
[constant] dom:element-node

DOM element node

dom:attribute-node
[constant] dom:attribute-node

DOM attribute node

dom:text-node
[constant] dom:text-node

DOM text node

dom:cdata_section_node
[constant] dom:cdata_section_node

DOM CData node

dom:entity-ref-node
[constant] dom:entity-ref-node

DOM Entity reference node

dom:entity-node
[constant] dom:entity-node

DOM entity node

dom:pi-node
[constant] dom:pi-node

DOM processing instruction node

dom:comment-node
[constant] dom:comment-node

DOM comment node

dom:document-node
[constant] dom:document-node

DOM document node

dom:document-type-node
[constant] dom:document-type-node

DOM document type node

dom:document-frag-node
[constant] dom:document-frag-node

DOM document fragment node

dom:notation-node
[constant] dom:notation-node

DOM notation node

dom:html-document-node
[constant] dom:html-document-node

DOM HTML document node

dom:dtd-node
[constant] dom:dtd-node

DOM DTD node

dom:element-decl
[constant] dom:element-decl

DOM element declaration

dom:attribute-decl
[constant] dom:attribute-decl

DOM attribute declaration

dom:entity-decl
[constant] dom:entity-decl

DOM entity declaration

dom:namespace-decl
[constant] dom:namespace-decl

DOM namespace declaration

dom:xinclude-start
[constant] dom:xinclude-start

DOM XInclude start node

dom:xinclude-end
[constant] dom:xinclude-end

DOM XInclude end node

API

dom:is-element-node?
[procedure] dom:is-element-node? node

Checks if the specified dom:node is an element node

node
A dom:xml-node
dom:is-text-node?
[procedure] dom:is-text-node? node

Checks if the specified dom:node is a text node

node
A dom:xml-node
dom:is-attribute-node?
[procedure] dom:is-attribute-node? node

Checks if the specified dom:node is an attribute node

node
A dom:xml-node
dom:parse-string
[procedure] dom:parse-string xml-string xml-size URL encoding options

Parse a string using the DOM parser API

xml-string
XML string
xml-size
Size of the XML string
URL
XML URL
encoding
Encoding
options
Options
dom:parse-string-default
[procedure] dom:parse-string-default str

Parse a string using the DOM parser API with the default options and encoding

str
XML string
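
A minimal sketch of parsing an in-memory string and reading the name of its root element. It assumes the module is imported as `(import libxml)`, that dom:parse-string-default returns a dom:doc, and that dom:root-element (used in the demo above but not listed in this API section) accepts that doc. The helper name root-name-of is hypothetical.

```scheme
(import libxml) ; assumption: module name as given in the [module] line

;; Hypothetical helper: parse `xml` and return the name of its root element.
(define (root-name-of xml)
  (let* ((doc  (dom:parse-string-default xml))
         (root (dom:root-element doc))
         ;; Read the name before the document is freed.
         (name (dom:node-name root)))
    (dom:free-doc doc)
    name))

(print (root-name-of "<greeting lang=\"en\">hello</greeting>"))
```

The name is read before dom:free-doc is called, since nodes belong to the document and should not be touched once it has been freed.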
dom:cleanup-parser
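[procedure] dom:cleanup-parser

Cleans up the parser's global state (cf. libxml2's xmlCleanupParser). Call it after the last dom:doc has been freed, as in the example above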

dom:parse-file
[procedure] dom:parse-file filename

Parse a file using the DOM parser API

filename
XML file
dom:free-doc
[procedure] dom:free-doc doc

Free the dom:doc

dom:make-parser-context
[procedure] dom:make-parser-context

Create a DOM parser context

dom:read-file-with-context
[procedure] dom:read-file-with-context context filename encoding options

Parse an XML file using the given DOM parser context

context
DOM parser context
filename
encoding
options
dom:is-valid?
[procedure] dom:is-valid? context

Checks if the parser context is valid after parsing a file

context
DOM parser context
dom:free-parser-context
[procedure] dom:free-parser-context

Free the dom:parser-context

dom:to-string
[procedure] dom:to-string

Converts a dom:node, including its child nodes, to a string

dom:next-node
[procedure] dom:next-node node

Returns the next sibling of the given dom:node

dom:node-content
[procedure] dom:node-content node

Returns the contents (text) of the dom:node

dom:node-children
[procedure] dom:node-children node

Returns the first child node

dom:node-name
[procedure] dom:node-name node

Returns the name of the dom:node

dom:is-element-name?
[procedure] dom:is-element-name? name dom:node

Checks if the name of the dom:node matches the specified string

name
Name (string) to match
dom:node
dom:get-attribute
[procedure] dom:get-attribute key dom:node

Returns the value of the attribute with the specified key

key
string
dom:node
dom:attributes
[procedure] dom:attributes n

Returns the complete set of XML attributes for the given node

dom:node
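
The traversal procedures above can be combined into a small query. The sketch below gathers the value of one attribute from every element with a given name. It assumes, as the demo at the top of this section does, that a missing child or sibling is returned as #f. The helper name collect-attribute and the sample document are hypothetical, and what dom:get-attribute returns for a missing attribute is not specified here.

```scheme
(import libxml) ; assumption: module name as given in the [module] line

;; Hypothetical helper: walk `node`, its following siblings and all of
;; their descendants, returning the `key` attribute of every element
;; called `name`, in document order.
(define (collect-attribute node name key)
  (if (not node)
      '()
      (append
        (if (and (dom:is-element-node? node)
                 (dom:is-element-name? name node))
            (list (dom:get-attribute key node))
            '())
        (collect-attribute (dom:node-children node) name key)
        (collect-attribute (dom:next-node node) name key))))

(define doc
  (dom:parse-string-default "<r><i id=\"1\"/><i id=\"2\"/></r>"))
(define ids (collect-attribute (dom:root-element doc) "i" "id"))
(dom:free-doc doc)
(print ids)
```

The recursion on siblings is not a tail call, so for very wide documents an explicit loop over dom:next-node would be the safer choice.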

SAX Parser

Sometimes the DOM tree output is just too large to fit reasonably into memory. In that case (and if you don't expect to save back the XML document loaded using libxml), it's better to use the SAX interface of libxml. SAX is a callback-based interface to the parser. Before parsing, the application layer registers a customized set of callbacks which are called by the library as it progresses through the XML input.

Example

(define (sax-demo)
  (define sax
    (sax:make-handler
      (lambda (localname attribute-list)
        (print "<" localname ">")
        (print "@ => " attribute-list))
      (lambda (localname) (print "<" localname "/>"))
      (lambda (characters) (print "[on-chars]: characters: " characters))))
  (sax:parse-file sax #f "foo.xml")
  (sax:free-handler sax))
sax:parse-file
[procedure] sax:parse-file handler user-data filename

Parse an XML file using the SAX handler

handler
SAX handler
user-data
SAX parser context
filename
XML file
sax:parse-string
[procedure] sax:parse-string sax-handler user-data xml-string size

Parse an XML string using the SAX handler

sax-handler
user-data
SAX parser context
xml-string
size
The size of the XML string
sax:make-handler
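[procedure] sax:make-handler on-start on-end on-characters

Makes a SAX handler

on-start
λ called at the start of each element with the element's local name and its attribute list
on-end
λ called at the end of each element with the element's local name
on-characters
λ called with the character data read between tags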
sax:free-handler
[procedure] sax:free-handler sax-handler

Frees the SAX handler

sax-handler
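
Because the callbacks are ordinary closures, they can accumulate state while the parser runs. The following sketch counts start tags in an in-memory string. It assumes the callback argument lists shown in the demo above, passes #f as user-data as that demo does, and uses string-length for the size argument, which is only safe for ASCII input. The helper name count-elements is hypothetical.

```scheme
(import libxml) ; assumption: module name as given in the [module] line

;; Hypothetical helper: count the start tags seen while parsing `xml`.
(define (count-elements xml)
  (let* ((count 0)
         (handler
           (sax:make-handler
             ;; on-start: one call per opening tag
             (lambda (localname attribute-list)
               (set! count (+ count 1)))
             ;; on-end and on-characters are not needed for counting
             (lambda (localname) #f)
             (lambda (characters) #f))))
    (sax:parse-string handler #f xml (string-length xml))
    (sax:free-handler handler)
    count))

(print (count-elements "<a><b/><b/></a>"))
```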

Text Reader Parser

Libxml2's main API is tree-based: parsing loads the whole document into memory and exposes it as a tree of nodes that are all available at the same time. This is very simple and quite powerful, but the size of the document that can be handled is limited by the amount of memory available. Libxml2 also provides a SAX-based API, but that version was modelled on one of the early expat versions of SAX, and SAX is not formally defined for C. SAX basically works by registering callbacks which are called directly by the parser as it progresses through the document stream. The problem is that this programming model is relatively complex and not well standardized, cannot provide validation directly, and makes entity, namespace and base processing relatively hard.

The text-reader API provides a far simpler programming model. The API acts as a cursor going forward on the document stream and stopping at each node on the way. The user's code keeps control of the progress and simply calls a read-next procedure repeatedly to move to each node in document order. There is direct support for namespaces, xml:base and entity handling, and adding DTD validation on top of it was relatively simple. This API is really close to the DOM Core specification. It provides a far more standard, easy to use and powerful API than the existing SAX. Moreover, integrating extension features based on the tree seems relatively easy.

In a nutshell the text-reader API provides a simpler, more standard and more extensible interface to handle large documents than the existing SAX version.

Example

(define (text-reader-demo)
  (define tr (text-reader:make "foo.xml"))
  (define (helper tr)
    (when (text-reader:element-node? tr)
          (print "<" (text-reader:name tr) ">")
          (print "@ => " (text-reader:all-attributes tr)))
    (when (text-reader:text-node? tr)
          (print "value =>" (text-reader:value tr)))
    (if (> (text-reader:read-more tr) 0) (helper tr)))
  (helper tr)
  (text-reader:free tr))

Node Types

text-reader:none
[constant] text-reader:none

Text-Reader none

text-reader:element
[constant] text-reader:element

Text-Reader element

text-reader:attribute
[constant] text-reader:attribute

Text-Reader attribute

text-reader:text
[constant] text-reader:text

Text-Reader text

text-reader:cdata
[constant] text-reader:cdata

Text-Reader cdata

text-reader:entity-reference
[constant] text-reader:entity-reference

Text-Reader entity reference

text-reader:entity
[constant] text-reader:entity

Text-Reader entity

text-reader:processing-instruction
[constant] text-reader:processing-instruction

Text-Reader processing instruction

text-reader:comment
[constant] text-reader:comment

Text-Reader comment

text-reader:document
[constant] text-reader:document

Text-Reader document

text-reader:document-type
[constant] text-reader:document-type

Text-Reader document type

text-reader:document-fragmenta
[constant] text-reader:document-fragmenta

Text-Reader document fragments

text-reader:notation
[constant] text-reader:notation

Text-Reader notation

text-reader:whitespace
[constant] text-reader:whitespace

Text-Reader whitespace

text-reader:significant-whitespace
[constant] text-reader:significant-whitespace

Text-Reader significant whitespace

text-reader:end-element
[constant] text-reader:end-element

Text-Reader element end

text-reader:end-entity
[constant] text-reader:end-entity

Text-Reader entity end

text-reader:xml-declaration
[constant] text-reader:xml-declaration

Text-Reader XML declaration

API

text-reader:element-to-string
[procedure] text-reader:element-to-string r

Converts the element at the text-reader's current position, including its child nodes, to a string

text-reader
text-reader:end-element-is?
[procedure] text-reader:end-element-is? name reader

Checks if the current node is an end element with the specified name

name
Element name (string)
text-reader
text-reader:start-element-is?
[procedure] text-reader:start-element-is? name reader

Checks if the current node is a start element with the specified name

name
Element name (string)
text-reader
text-reader:end-element-node?
[procedure] text-reader:end-element-node? reader

Checks if node is an end element

reader
text-reader:text-node?
[procedure] text-reader:text-node? reader

Checks for text node

reader
text-reader:element-node?
[procedure] text-reader:element-node? reader

Checks if node is an element

reader
text-reader:make
[procedure] text-reader:make filename

Makes a new text-reader

filename
text-reader:read-more
[procedure] text-reader:read-more text-reader

Advances the text-reader to the next node. The example above keeps reading while the result is greater than 0

text-reader
text-reader:free
[procedure] text-reader:free text-reader

Free the specified text-reader

text-reader
text-reader:node-type
[procedure] text-reader:node-type text-reader

Returns the node type

text-reader
text-reader:empty-element?
[procedure] text-reader:empty-element? text-reader

Checks if the current element is empty, i.e. written as a self-closing tag

text-reader
text-reader:move-to-attribute
[procedure] text-reader:move-to-attribute text-reader attribute-name

Moves text-reader to the specified attribute

text-reader
attribute-name
(string)
text-reader:all-attributes
[procedure] text-reader:all-attributes r

Extracts all the attributes of the current element and returns them as an association list

text-reader
text-reader:move-to-next-attribute
[procedure] text-reader:move-to-next-attribute text-reader

Moves text-reader to the next attribute

text-reader
text-reader:move-to-first-attribute
[procedure] text-reader:move-to-first-attribute text-reader

Moves text-reader to the first attribute

text-reader
text-reader:move-to-element
[procedure] text-reader:move-to-element text-reader

Moves text-reader from an attribute back to the element that contains it (cf. libxml2's xmlTextReaderMoveToElement)

text-reader
text-reader:next
[procedure] text-reader:next text-reader

Moves text-reader to next node

text-reader
text-reader:next-sibling
[procedure] text-reader:next-sibling text-reader

Moves text-reader to next sibling node

text-reader
text-reader:name
[procedure] text-reader:name text-reader

Returns the name of the node

text-reader
text-reader:value
[procedure] text-reader:value text-reader

Returns the value of the node

text-reader
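
To illustrate the cursor model, here is a sketch that streams through a file and collects the text found inside every element with a given name. It assumes text-reader:read-more returns a value greater than 0 while nodes remain, as the demo above does, and it advances the reader before inspecting the first node. The helper name text-of-elements, the file books.xml and the element name title are hypothetical.

```scheme
(import libxml) ; assumption: module name as given in the [module] line

;; Hypothetical helper: return the text nodes found inside each <name>
;; element of `filename`, in document order.
(define (text-of-elements filename name)
  (let ((tr (text-reader:make filename)))
    (let loop ((inside? #f) (acc '()))
      (if (> (text-reader:read-more tr) 0)
          (cond ((text-reader:start-element-is? name tr) (loop #t acc))
                ((text-reader:end-element-is? name tr) (loop #f acc))
                ((and inside? (text-reader:text-node? tr))
                 (loop inside? (cons (text-reader:value tr) acc)))
                (else (loop inside? acc)))
          (begin
            (text-reader:free tr)
            (reverse acc))))))

;; Write a small sample file so the sketch is self-contained.
(with-output-to-file "books.xml"
  (lambda ()
    (display "<lib><title>A</title><title>B</title></lib>")))

(define titles (text-of-elements "books.xml" "title"))
(print titles)
```

Only the current node is held at any time, so memory use stays roughly constant however large the input file is.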

About this egg

Author

David Ireland

Colophon

Documented by hahn.