This is the sxpath extension library for Chicken Scheme.

Sxpath

The sxpath parts of the sxml-tools from the SSAX project at Sourceforge. This includes the DDO (Distinct Document Order) and context-based versions of sxpath, as well as txpath support.

Documentation

This egg provides the sxpath-related tools from the sxml-tools available in the SSAX/SXML Sourceforge project.

It is split up in several modules:

sxpath, context-sxpath, ddo-sxpath, txpath, xpath-parser, sxpath-lolevel, context-sxpath-lolevel and ddo-sxpath-lolevel.

The lolevel modules expose the full list of accessors, constructors and predicates used to manually traverse an SXML tree. The higher-level sxpath modules only expose a handful of procedures, which comprise the high-level interface you'd normally need to use.

Much documentation is available at Lisovsky's XML page and the SSAX homepage.

The initial documentation on this wiki page came straight from the comments in the extremely well-documented source code. It's recommended you read the code if you want to learn more.

If you're not familiar with regular xpath the sxpath documentation may be a bit confusing. Try this quick tutorial to get up to speed with xpath.

sxpath

This is the preferred interface to use. It allows you to query the SXML document tree using an s-expression based language, in which you can also use arbitrary procedures and even "classic" textual XPath (see below for docs on that).

A complete description of how to use this is outside the scope of this egg's documentation. See the introduction to SXPath for that.

[procedure] (sxpath path #!optional ns-binding)

Returns a procedure that accepts an SXML document tree and an optional association list of variables and returns a nodeset (list of nodes) that match the path expression.

The optional ns-binding argument is an alist of namespace bindings. It is used to map abbreviated namespace prefixes to full URI strings but only for textual XPath strings embedded in the path expression.

The optional association list of variables must include all the variables defined by the sxpath expression.

The path is translated to the full SXPath according to the following rewriting rules:

; (sxpath '()) -> (node-join)
; (sxpath '(path-component ...)) ->
;		(node-join (sxpath1 path-component) (sxpath '(...)))
; (sxpath1 '//) -> (sxml:descendant-or-self sxml:node?)
; (sxpath1 '(equal? x)) -> (select-kids (node-equal? x))
; (sxpath1 '(eq? x))    -> (select-kids (node-eq? x))
; (sxpath1 '(*or* ...))  -> (select-kids (ntype-names??
;                                          (cdr '(*or* ...))))
; (sxpath1 '(*not* ...)) -> (select-kids (sxml:complement 
;                                         (ntype-names??
;                                          (cdr '(*not* ...)))))
; (sxpath1 '(ns-id:* x)) -> (select-kids 
;                                      (ntype-namespace-id?? x))
; (sxpath1 ?symbol)     -> (select-kids (ntype?? ?symbol))
; (sxpath1 ?string)     -> (txpath ?string)
; (sxpath1 procedure)   -> procedure
; (sxpath1 '(?symbol ...)) -> (sxpath1 '((?symbol) ...))
; (sxpath1 '(path reducer ...)) ->
;		(node-reduce (sxpath path) (sxpathr reducer) ...)
; (sxpathr number)      -> (node-pos number)
; (sxpathr path-filter) -> (filter (sxpath path-filter))
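As a rough illustration of the symbol rules above, the path '(olist item) can be expanded by hand into the low-level combinators. This is only a sketch: it assumes node-join, select-kids and ntype?? are imported from sxpath-lolevel (they are documented further down), and the doc tree is made up for the example.

```scheme
(import sxpath sxpath-lolevel)

;; Made-up example tree
(define doc '(doc (olist (item "1")) (item "2")))

;; High-level form
(define via-sxpath ((sxpath '(olist item)) doc))

;; Hand-expanded form, following the rewriting rules for plain symbols
(define via-combinators
  ((node-join (select-kids (ntype?? 'olist))
              (select-kids (ntype?? 'item)))
   doc))

;; Both forms select the same nodes: ((item "1"))
(equal? via-sxpath via-combinators)
```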

It can be useful to compare the following examples to those for txpath.

(import sxpath)

;; selects all the 'item' elements that have an 'olist' parent
;; (which is not root) and that are in the same document as the context node
((sxpath `(// olist item))
 '(doc (olist (item "1")) (item "2") (nested (olist (item "3")))))
 => ((item "1") (item "3"))


(import sxpath-lolevel (chicken base))

;; selects only the nth 'item' element under each 'olist' parent
;; (which is not root) and that is in the same document as the context node
;; The n is parameterized to be the first item
;; The node-pos function comes from sxpath-lolevel and implements txpath position selector [$n]
((sxpath `(// olist ((item ,(lambda (nodeset var-binding) ((node-pos (alist-ref 'n var-binding)) nodeset))))))
 '(doc (olist (item "1") (item "2")) (nested (olist (item "3")))) '((n . 1)))
 => ((item "1") (item "3"))

;; selects the 'chapter' children of the context node that have one or
;; more 'title' children with string-value equal to 'Introduction'
((sxpath '((chapter ((equal? (title "Introduction"))))))
 '(text  (chapter (title "Introduction"))  (chapter "No title for this chapter")  (chapter (title "Conclusion"))))
 => ((chapter (title "Introduction")))

;; (sxpath string-expr) is equivalent to (txpath string-expr)
((sxpath "chapter[title='Introduction']")
 '(text  (chapter (title "Introduction"))  (chapter "No title for this chapter")  (chapter (title "Conclusion"))))
 => ((chapter (title "Introduction")))

[procedure] (if-sxpath path)

Like sxpath, only returns #f instead of the empty list if nothing matches (so it does not always return a nodeset).

[procedure] (car-sxpath path)

Like sxpath, only instead of a nodeset it returns the first node found. If no node is found, it returns the empty list.

[procedure] (if-car-sxpath path)

Like car-sxpath, only returns #f instead of the empty list if nothing matches.
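The difference between the four variants is easiest to see side by side. The following sketch uses a made-up tree and a path ('(missing)) that matches nothing; the names bound with define are purely illustrative.

```scheme
(import sxpath)

;; Made-up example tree
(define doc '(doc (olist (item "1")) (item "2")))

(define all-items   ((sxpath '(item)) doc))            ; => ((item "2"))
(define no-match    ((sxpath '(missing)) doc))         ; => ()
(define if-no-match ((if-sxpath '(missing)) doc))      ; => #f
(define first-item  ((car-sxpath '(item)) doc))        ; => (item "2")
(define car-none    ((car-sxpath '(missing)) doc))     ; => ()
(define if-car-none ((if-car-sxpath '(missing)) doc))  ; => #f
```

The if- variants are convenient in (cond ... => ...) or (and ...) chains, where an empty list would otherwise count as true.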

[procedure] (sxml:id-alist node #!rest lpaths)

Builds an index as a list of (ID_value . element) pairs for given node. lpaths are location paths for attributes of type ID (ie, sxpath expressions that tell it how to find the ID attribute).

Note: location paths must be of the form (expr '@ attrib-name).

See also sxml:lookup below, in sxpath-lolevel, which can use this index.

;; Obtain ID values for a regular XHTML DOM
(sxml:id-alist
 '(div (h1 (@ (id "info"))
           "A story")
       (p (@ (id "story-body"))
	  "Once upon a time")
       (a (@ (id "back") (href "../index.xml"))
	  "click here to go back"))
 '(* @ id))
 => (("info" h1 (@ (id "info")) "A story")
     ("story-body" p (@ (id "story-body")) "Once upon a time")
     ("back" a (@ (id "back") (href "../index.xml")) "click here to go back"))

;; In an alternate reality, where links are uniquely identified
;; by their href, we would use this
(sxml:id-alist
 '(div (h1 (@ (id "info"))
	   "A story")
       (p (@ (id "story-body"))
	  "Once upon a time")
       (a (@ (id "back") (href "../index.xml"))
	  "click here to go back"))
 '(h1 @ foo) '(a @ href))
 => (("../index.xml" . (a (@ (id "back")
                             (href "../index.xml"))
                          "click here to go back")))

txpath

This section documents the txpath interface. This interface is mostly useful for programs that deal exclusively with "legacy" textual XPath queries.

Primary interface

The following procedures are the main interface one would use in practice. There are also more low-level procedures (see next section), which one could use to build txpath extensions.

[procedure] (sxml:xpath string #!rest ns-binding)
[procedure] (txpath string #!rest ns-binding)
[procedure] (sxml:xpath+root string #!rest ns-binding)
[procedure] (sxml:xpath+root+vars string #!rest ns-binding)

Returns a procedure that accepts an SXML document tree and an optional association list of variable bindings and returns a nodeset (list of nodes) that match the XPath expression string.

The optional ns-binding argument is an alist of namespace bindings. It is used to map abbreviated namespace prefixes to full URI strings.

(txpath x) is equivalent to (sxpath x) whenever x is a string. The txpath, sxml:xpath+root and sxml:xpath+root+vars procedures are currently all aliases for sxml:xpath, which exist for backwards compatibility reasons.

It's useful to compare the following examples to the above examples for sxpath.

(import txpath)

;; selects all the 'item' elements that have an 'olist' parent
;; (which is not root) and that are in the same document as the context node
((txpath "//olist/item")
 '(doc (olist (item "1")) (item "2") (nested (olist (item "3")))))
 => ((item "1") (item "3"))

;; Same example as above, but now with a namespace prefix of 'x',
;; which is bound to the namespace "bar" in the ns-binding parameter.
((txpath "//x:olist/item" '((x . "bar")))
 '(doc (bar:olist (item "1")) (item "2") (nested (olist (item "3")))))
 => ((item "1"))


;; selects only the nth 'item' element under each 'olist' parent
;; (which is not root) and that is in the same document as the context node
;; The n is parameterized to be the first item
((txpath "//olist/item[$n]")
 '(doc (olist (item "1") (item "2")) (nested (olist (item "3")))) '((n . 1)))
 => ((item "1") (item "3"))

;; selects the 'chapter' children of the context node that have one or
;; more 'title' children with string-value equal to 'Introduction'
((txpath "chapter[title='Introduction']")
 '(text  (chapter (title "Introduction"))  (chapter "No title for this chapter")  (chapter (title "Conclusion"))))
 => ((chapter (title "Introduction")))

[procedure] (sxml:xpath+index string #!rest ns-binding)

This procedure returns the result of sxml:xpath consed onto #t. If the sxml:xpath would return #f, this returns #f instead.

It is provided solely for backwards compatibility.

[procedure] (sxml:xpointer string #!rest ns-binding)
[procedure] (sxml:xpointer+root+vars string #!rest ns-binding)

Returns a procedure that accepts an SXML document tree and returns a nodeset (list of nodes) that match the XPointer expression string.

The optional ns-binding argument is an alist of namespace bindings. It is used to map abbreviated namespace prefixes to full URI strings.

Currently, only the XPointer xmlns() and xpointer() schemes are implemented; the element() scheme is not.

;; selects all the 'item' elements that have an 'olist' parent
;; (which is not root) and that are in the same document as the context node.
;; Equivalent to (txpath "//olist/item").
((sxml:xpointer "xpointer(//olist/item)")
 '(doc (olist (item "1")) (item "2") (nested (olist (item "3")))))
 => ((item "1") (item "3"))

;; An example with a namespace prefix, now using the XPointer xmlns()
;; function instead of the ns-binding parameter. xmlns() parts always have full
;; namespace names on their right-hand side, never bound shortcuts.
((sxml:xpointer "xmlns(x=bar)xpointer(//x:olist/item)")
 '(doc (bar:olist (item "1")) (item "2") (nested (olist (item "3")))))
 => ((item "1"))

[procedure] (sxml:xpointer+index string #!rest ns-binding)

This procedure returns the result of sxml:xpointer consed onto #t. If the sxml:xpointer would return #f, this returns #f instead.

It is provided solely for backwards compatibility.

[procedure] (sxml:xpath-expr string #!rest ns-binding)

Returns a procedure that accepts an SXML node and returns #t if the node matches the string expression. This is an expression of type Expr, which is whatever you can put in a predicate (between square brackets after a node name).

The optional ns-binding argument is an alist of namespace bindings. It is used to map abbreviated namespace prefixes to full URI strings.

;; Does the node have a class attribute with "content" as value?
((sxml:xpath-expr "@class=\"content\"")
 '(div (@ (class "content")) (p "Lorem ipsum")))
 => #t

;; Does the node have a paragraph with string value of "Lorem ipsum"?
((sxml:xpath-expr "p=\"Lorem ipsum\"")
 '(div (@ (class "content")) (p "Lorem ipsum")))
 => #t

;; Does the node have a "p" child node with string value of "Blah"?
((sxml:xpath-expr "p=\"Blah\"")
 '(div (@ (class "content")) (p "Lorem ipsum")))
 => #f

XPath function library

The procedures documented in this section can be used to implement a custom XPath traverser. Unlike the sxpath low-level procedures, they are not split off into a separate library: they live in the same file as the high-level procedures, so splitting them up would not reduce the library size. When importing the txpath module you can simply leave these procedures out.

These procedures implement the core XPath functions, as described in The XPath specification, section 4.

All of the following procedures return procedures that accept 4 arguments, which together make up (part of) the XPath context:

 (lambda (nodeset root-node context var-binding) ...)

The nodeset argument is the nodeset (a list of nodes) that is currently under consideration. The root-node argument is a nodeset containing only one element: the root node of the document. The context argument is a list of two numbers: the position and size of the context. The var-binding argument is an alist of XPath variable bindings.

The arguments to each of these core procedures, if any, are all procedures of the same type as they return. For example, sxml:core-local-name accepts an optional procedure which accepts a nodeset, a root-node, a context, a var-binding and returns a nodeset. Of this nodeset, the local part of the name of the first node (if any) is returned. The values for each of these arguments are just those passed to sxml:core-local-name.
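The calling convention can be sketched with sxml:core-count. This is a hedged illustration, not a recommended usage pattern: it assumes the sxml:core-* procedures are importable from the txpath module as described above, whole-nodeset is a hypothetical argument procedure written for this example, and the context and var-binding values are placeholders that count() is not expected to look at.

```scheme
(import txpath)

;; Hypothetical argument procedure: it has the same four-argument shape
;; as the core procedures and simply evaluates to the current nodeset.
(define (whole-nodeset nodeset root-node context var-binding)
  nodeset)

;; Made-up nodeset and root node
(define nodes '((item "1") (item "2") (item "3")))
(define root  '((*TOP* (doc (item "1") (item "2") (item "3")))))

;; Build the count() procedure, then apply it to the four context arguments.
(define n ((sxml:core-count whole-nodeset) nodes root '(1 3) '()))
```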

Node set functions

[procedure] (sxml:core-last)
[procedure] (sxml:core-position)
[procedure] (sxml:core-count node-set)
[procedure] (sxml:core-id object)
[procedure] (sxml:core-local-name #!optional node-set)
[procedure] (sxml:core-namespace-uri #!optional node-set)
[procedure] (sxml:core-name #!optional node-set)

String functions

[procedure] (sxml:core-string #!optional object)
[procedure] (sxml:core-concat [string ...])
[procedure] (sxml:core-starts-with string prefix)
[procedure] (sxml:core-contains string substring)
[procedure] (sxml:core-substring-before string separator)
[procedure] (sxml:core-substring-after string separator)
[procedure] (sxml:core-substring string numeric-offset #!optional length)
[procedure] (sxml:core-string-length #!optional string)
[procedure] (sxml:core-normalize-space #!optional string)
[procedure] (sxml:core-translate string from to)

Boolean functions

[procedure] (sxml:core-boolean object)
[procedure] (sxml:core-not boolean)
[procedure] (sxml:core-true)
[procedure] (sxml:core-false)
[procedure] (sxml:core-lang lang-code)

Number functions

[procedure] (sxml:core-number #!optional object)
[procedure] (sxml:core-sum node-set)
[procedure] (sxml:core-floor number)
[procedure] (sxml:core-ceiling number)
[procedure] (sxml:core-round number)

Parameter list

[constant] sxml:classic-params

This is a very long list of parameters containing parser and traversal information for the textual XPath parser engine. It corresponds to the "function library" mentioned in the introduction of the XPath spec. You will have to read the source code for details on exactly how to use it.

sxpath-lolevel

This section documents the low-level sxpath interface. It includes mostly-generic list and SXML operators. This is equivalent to the "low-level sxpath interface" described in the introduction to SXPath.

These utilities are useful when you want to query SXML document trees, but full sxpath would be overkill. Most of these procedures are faster than their sxpath equivalent, because they are very specific. But this also means they are very low-level, so you should use them only if you know what you're doing.

Predicates

[procedure] (sxml:empty-element? obj)

Predicate which returns #t if the given element obj is empty. Empty elements have no nested elements, text nodes, PIs, comments or entities, but may contain attributes or a namespace-id. It is the SXML counterpart of the XML empty-element.

[procedure] (sxml:shallow-normalized? obj)

Returns #t if the given obj is a shallow-normalized SXML element. The element itself has to be normalized but its nested elements are not tested.

[procedure] (sxml:normalized? obj)

Returns #t if the given obj is a normalized SXML element. The element itself and all its nested elements have to be normalized.

[procedure] (sxml:shallow-minimized? obj)

Returns #t if the given obj is a shallow-minimized SXML element. The element itself has to be minimized but its nested elements are not tested.

[procedure] (sxml:minimized? obj)

Returns #t if the given obj is a minimized SXML element. The element itself and all its nested elements have to be minimized.

Accessors

These procedures obtain information about nodes, or their direct children. They don't traverse subtrees.

Normalization-independent accessors

These accessors can be used on arbitrary, non-normalized SXML trees. Because of this, they are generally slower than the normalization-dependent variants listed in the next section.

[procedure] (sxml:name node)

Returns a name of a given SXML node. It is introduced for the sake of encapsulation.

[procedure] (sxml:element-name obj)

A checked version of sxml:name, which returns #f if the given obj is not an SXML element. Otherwise returns its name.

[procedure] (sxml:node-name obj)

Safe version of sxml:name, which returns #f if the given obj is not an SXML node. Otherwise returns its name.

The difference between this and sxml:element-name is that a node can be one of @, @@, *PI*, *COMMENT* or *ENTITY* while an element must be a real element (any symbol not in that set is considered to be an element).
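A small sketch of that difference, using a made-up element and its attribute list:

```scheme
(import sxpath-lolevel)

;; Made-up element and attribute list
(define elem  '(div (@ (class "x")) "hi"))
(define attrs '(@ (class "x")))

(sxml:element-name elem)   ; => div
(sxml:node-name elem)      ; => div
(sxml:element-name attrs)  ; => #f, an attribute list is not an element
(sxml:node-name attrs)     ; => @
```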

[procedure] (sxml:ncname node)

Like sxml:name, except returns only the local part of the name (called an "NCName" in the XML namespaces spec).

The node's name is interpreted as a "Qualified Name": a colon-separated sequence of parts, of which the last is considered to be the local part. If the name contains no colons, the name itself is returned.

Important: Please note that while an SXML name is a symbol, this function returns a string.

[procedure] (sxml:name->ns-id sxml-name)

Given a node name, return the namespace part of the name (called a namespace-id). If the name contains no colons, returns #f. See sxml:ncname for more info.

Important: Please note that while an SXML name is a symbol, this function returns a string.
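For example, splitting a qualified name into its two parts might look like this (the c:part name is made up; note that sxml:ncname takes a node while sxml:name->ns-id takes a bare name):

```scheme
(import sxpath-lolevel)

(sxml:ncname '(c:part "wheel"))  ; => "part"
(sxml:ncname '(part "wheel"))    ; => "part"
(sxml:name->ns-id 'c:part)       ; => "c"
(sxml:name->ns-id 'part)         ; => #f
```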

[procedure] (sxml:content obj)

Retrieve the contents of an SXML element or nodeset. Any non-element nodes (attributes, processing instructions, etc) are discarded, while the elements and text nodes are returned as a list of strings and nested elements in document order. This list is empty if obj is an empty element or empty list.

The inner elements are unmodified so they still contain attributes, but also comments or other non-element nodes.

(sxml:content
  '(div (@ (class "content"))
        (*COMMENT* "main contents start here")
         "The document moved "
	 (a (@ (href "/other.xml")) "here")))
 => ("The document moved " (a (@ (href "/other.xml")) "here"))

[procedure] (sxml:text node)

Returns a string which combines all the character data from text node children of the given SXML element or "" if there are no text node children. Note that it does not include text from descendant nodes, only direct children.

(sxml:text
  '(div (@ (class "content"))
        (*COMMENT* "main contents start here")
         "The document moved "
	 (a (@ (href "/other.xml")) "here")))
 => "The document moved "

Normalization-dependent accessors

"Universal" accessors are less efficient but may be used for non-normalized SXML. These safe accessors are named with the suffix '-u' for "universal".

"Fast" accessors are optimized for normalized SXML data. They are not applicable to arbitrary non-normalized SXML data. Their names have no specific suffixes.

[procedure] (sxml:content-raw obj)

Returns all the content of normalized SXML element except attr-list and aux-list. Thus it includes PI, COMMENT and ENTITY nodes as well as TEXT and ELEMENT nodes returned by sxml:content. Returns a list of nodes in document order or empty list if obj is an empty element or an empty list.

This function is faster than sxml:content.

[procedure] (sxml:attr-list-u obj)

Returns the list of attributes for given element or nodeset. Analog of ((sxpath '(@ *)) obj). Empty list is returned if there is no list of attributes.

[procedure] (sxml:aux-list obj)
[procedure] (sxml:aux-list-u obj)

Returns the list of auxiliary nodes for given element or nodeset. Analog of ((sxpath '(@@ *)) obj). Empty list is returned if a list of auxiliary nodes is absent.

[procedure] (sxml:aux-node obj aux-name)

Returns the first aux-node named aux-name in the SXML element obj, or #f if no such node is present.

NOTE: it returns just the first node found even if multiple nodes are present, so it's mostly intended for nodes with unique names. Use sxml:aux-nodes if you want all of them.

[procedure] (sxml:aux-nodes obj aux-name)

Returns a list of the aux-nodes named aux-name in the SXML element obj, or '() if no such node is present.

[procedure] (sxml:attr obj attr-name)

Returns the value of the attribute with name attr-name in the given SXML element obj, or #f if no such attribute exists.

[procedure] (sxml:attr-from-list attr-list name)

Returns the value of the attribute called name in the given list of attributes attr-list, or #f if no such attribute exists. The list of attributes can be obtained from an element using the sxml:attr-list procedure.

[procedure] (sxml:num-attr obj attr-name)

Returns the value of the numerical attribute with name attr-name in the given SXML element obj, or #f if no such attribute exists. This value is converted from a string to a number.
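A short sketch of the attribute accessors on a normalized element (the img element and its attribute values are made up for the example):

```scheme
(import sxpath-lolevel)

;; Normalized element: the attribute list directly follows the name
(define img '(img (@ (src "cat.png") (width "640"))))

(sxml:attr img 'src)        ; => "cat.png"
(sxml:attr img 'alt)        ; => #f, no such attribute
(sxml:num-attr img 'width)  ; => 640, converted from the string "640"
(sxml:attr-from-list '((src "cat.png") (width "640")) 'width)  ; => "640"
```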

[procedure] (sxml:attr-u obj attr-name)

Accessor for the attribute attr-name of the given SXML element obj, which may also be an attributes-list or a nodeset (usually the content of an SXML element).

[procedure] (sxml:ns-list obj)

Returns the list of namespaces for given element. Analog of ((sxpath '(@@ *NAMESPACES* *)) obj). The empty list is returned if there are no namespaces.

[procedure] (sxml:ns-id->nodes obj namespace-id)

Returns a list of namespace information lists that match the given namespace-id in SXML element obj. Analog of ((sxpath '(@@ *NAMESPACES* namespace-id)) obj). The empty list is returned if there is no namespace with the given namespace-id.

(sxml:ns-id->nodes
  '(c:part (@) (@@ (*NAMESPACES* (c "http://www.cars.com/xml")))) 'c)
 => ((c "http://www.cars.com/xml"))

[procedure] (sxml:ns-id->uri obj namespace-id)

Returns the URI for the (first) namespace matching the given namespace-id, or #f if no namespace matches the given namespace-id.

(sxml:ns-id->uri
  '(c:part (@) (@@ (*NAMESPACES* (c "http://www.cars.com/xml")))) 'c)
 => "http://www.cars.com/xml"

[procedure] (sxml:ns-uri->nodes obj uri)

Returns a list of namespace information lists that match the given uri in SXML element obj.

(sxml:ns-uri->nodes
  '(c:part (@) (@@ (*NAMESPACES* (c "http://www.cars.com/xml")
                                 (d "http://www.cars.com/xml"))))
  "http://www.cars.com/xml")
 => ((c "http://www.cars.com/xml") (d "http://www.cars.com/xml"))

[procedure] (sxml:ns-uri->id obj uri)

Returns the namespace id for the (first) namespace matching the given uri, or #f if no namespace matches the given uri.

(sxml:ns-uri->id
  '(c:part (@) (@@ (*NAMESPACES* (c "http://www.cars.com/xml")
                                 (d "http://www.cars.com/xml"))))
  "http://www.cars.com/xml")
 => c

[procedure] (sxml:ns-id ns-list)

Given a namespace information list ns-list, returns the namespace ID.

[procedure] (sxml:ns-uri ns-list)

Given a namespace information list ns-list, returns the namespace URI.

[procedure] (sxml:ns-prefix ns-list)

Given a namespace information list ns-list, returns the namespace prefix if it is present in the list. If it's not present, returns the namespace ID.
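The three accessors can be tried on a namespace information list like the ones returned by sxml:ns-id->nodes. In this sketch the second list assumes that an original prefix, when recorded, is stored as the third element (as in the SXML specification's namespace association); the cars prefix is made up.

```scheme
(import sxpath-lolevel)

;; Namespace information list without an original prefix
(define ns '(c "http://www.cars.com/xml"))

;; Assumption: an original prefix, if present, is the third element
(define ns+prefix '(c "http://www.cars.com/xml" cars))

(sxml:ns-id ns)             ; => c
(sxml:ns-uri ns)            ; => "http://www.cars.com/xml"
(sxml:ns-prefix ns)         ; => c, falls back to the namespace ID
(sxml:ns-prefix ns+prefix)  ; => cars
```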

Data modification procedures

Constructors and mutators for normalized SXML data

Important: These functions are optimized for normalized SXML data. They are not applicable to arbitrary non-normalized SXML data.

Most of the functions are provided in two variants:

  1. Side-effecting functions intended for linear update of the given elements. Their names end with an exclamation mark.
  2. Pure functions without side effects, which return modified elements.

[procedure] (sxml:change-content! obj new-content)
[procedure] (sxml:change-content obj new-content)

Change the content of given SXML element obj to new-content. If new-content is an empty list then the obj is transformed to an empty element. The resulting SXML element is normalized.

[procedure] (sxml:change-attrlist obj new-attrlist)
[procedure] (sxml:change-attrlist! obj new-attrlist)

Change the attribute list of the given SXML element obj to new-attrlist.

[procedure] (sxml:change-name obj new-name)
[procedure] (sxml:change-name! obj new-name)

Change the name of the given SXML element obj to new-name.

[procedure] (sxml:add-attr obj attr)
[procedure] (sxml:add-attr! obj attr)

Returns the given SXML element obj with the attribute attr added to the attribute list, or #f if the attribute already exists.

[procedure] (sxml:change-attr obj attr)
[procedure] (sxml:change-attr! obj attr)

Returns the SXML element obj with the value of attribute attr changed, or #f if there is no attribute with the given name.

attr is a list like it would occur as a member of an attribute list: (attr-name attr-value).

[procedure] (sxml:set-attr obj attr)
[procedure] (sxml:set-attr! obj attr)

Returns SXML element obj with changed value of attribute attr. If there is no such attribute the new one is added.

attr is a list like it would occur as a member of an attribute list: (attr-name attr-value).
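The following sketch contrasts the pure variants of sxml:set-attr, sxml:add-attr and sxml:change-attr on a made-up normalized element. The results are inspected with sxml:attr rather than printed whole, since the exact shape of the returned element is not spelled out here.

```scheme
(import sxpath-lolevel)

;; Made-up normalized element
(define link '(a (@ (href "old.xml")) "back"))

;; set-attr changes an existing attribute or adds a missing one
(sxml:attr (sxml:set-attr link '(href "new.xml")) 'href)    ; => "new.xml"
(sxml:attr (sxml:set-attr link '(title "Go back")) 'title)  ; => "Go back"

;; add-attr refuses to overwrite; change-attr refuses to add
(sxml:add-attr link '(href "new.xml"))      ; => #f, href already exists
(sxml:change-attr link '(title "Go back"))  ; => #f, no title to change

;; The pure variants leave the original element alone
(sxml:attr link 'href)                      ; => "old.xml"
```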

[procedure] (sxml:add-aux obj aux-node)
[procedure] (sxml:add-aux! obj aux-node)

Returns SXML element obj with an auxiliary node aux-node added.

[procedure] (sxml:squeeze obj)
[procedure] (sxml:squeeze! obj)

Returns a minimized and normalized SXML element obj with empty lists of attributes and aux-lists eliminated, in obj and all its descendants.

[procedure] (sxml:clean obj)

Returns a minimized and normalized SXML element obj with empty lists of attributes and all aux-lists eliminated, in obj and all its descendants.

[procedure] (select-first-kid test-pred?)

Given a node, return the first child that satisfies test-pred?. Given a nodeset, traverse the set until a node is found whose first child matches the predicate. Returns #f if no such child is found.
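For instance, on a made-up div element and using ntype?? (documented below) as the predicate:

```scheme
(import sxpath-lolevel)

;; Made-up element
(define div '(div (h1 "Title") (p "first") (p "second")))

((select-first-kid (ntype?? 'p)) div)      ; => (p "first")
((select-first-kid (ntype?? 'table)) div)  ; => #f
```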

[procedure] (sxml:node-parent rootnode)

Returns a function of one argument - an SXML element - which returns its parent node using *PARENT* pointer in the aux-list. '*TOP-PTR* may be used as a pointer to root node. It returns an empty list when applied to the root node.

[procedure] (sxml:add-parents obj #!optional top-ptr)

Returns the SXML element obj annotated with *PARENT* pointers for obj and all its descendants. If obj is not the root node (a node with a name of *TOP*), you must pass in the parent pointer for obj as top-ptr.

Warning: This procedure mutates its obj argument.

[procedure] (sxml:lookup id index)

Lookup an element using its ID. index should be an alist of (id . element).
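A minimal sketch of building such an index with sxml:id-alist (from the sxpath module) and then querying it; the page tree is made up and mirrors the XHTML example given for sxml:id-alist.

```scheme
(import sxpath sxpath-lolevel)

;; Made-up document fragment
(define page
  '(div (h1 (@ (id "info")) "A story")
        (p (@ (id "story-body")) "Once upon a time")))

;; Index every element that carries an id attribute
(define index (sxml:id-alist page '(* @ id)))

(sxml:lookup "story-body" index)
;; => (p (@ (id "story-body")) "Once upon a time")
```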

Markup generation

XML

[procedure] (sxml:attr->xml attr)

Returns a list containing tokens that when joined together form the attribute's XML output.

Warning: This procedure assumes that the attribute's values have already been escaped (ie, sxml:string->xml has been called on the strings inside it).

(sxml:attr->xml '(href "http://example.com"))
 => (" " "href" "='" "http://example.com" "'")

[procedure] (sxml:string->xml string)

Escape the string so it can be used anywhere in XML output. This converts the <, >, ', " and & characters to their respective entities.

[procedure] (sxml:sxml->xml tree)

Convert the tree of SXML nodes to a nested list of XML fragments. These fragments can be output by flattening the list and concatenating the strings inside it.
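One possible way to do the flattening and concatenating is sketched below. The helper fragments->string is hypothetical (it is not part of this egg) and assumes the fragment tree contains only strings and nested lists.

```scheme
(import sxpath-lolevel)

;; Hypothetical helper, not provided by the egg: walks a nested list of
;; string fragments and concatenates them in order.
(define (fragments->string frags)
  (cond ((string? frags) frags)
        ((pair? frags)
         (string-append (fragments->string (car frags))
                        (fragments->string (cdr frags))))
        (else "")))   ; '() (or anything unexpected) contributes nothing

;; On a hand-written fragment tree:
(fragments->string '("<" "p" (">" ("hi")) "</" "p" ">"))  ; => "<p>hi</p>"

;; On the output of sxml:sxml->xml:
(define xml-text (fragments->string (sxml:sxml->xml '(p "hi"))))
```

For large documents, writing the fragments to a port one by one would avoid building intermediate strings; the recursive string-append here is only meant to show the idea.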

HTML

[procedure] (sxml:attr->html attr)

Returns a list containing tokens that when joined together form the attribute's HTML output. The difference with the XML variant is that this encodes empty attribute values to attributes with no value (think selected in option elements, or checked in checkboxes).

Warning: This procedure assumes that the attribute's values have already been escaped (ie, sxml:string->html has been called on the strings inside it).

[procedure] (sxml:string->html string)

Escape the string so it can be used anywhere in HTML output. This converts the <, >, " and & characters to their respective entities.

[procedure] (sxml:non-terminated-html-tag? tag)

Is the named tag one that is "self-closing" (ie, does not need to be terminated) in HTML 4.0?

[procedure] (sxml:sxml->html tree)

Convert the tree of SXML nodes to a nested list of HTML fragments. These fragments can be output by flattening the list and concatenating the strings inside it.

Procedures from sxpathlib

Basic converters and applicators

A converter is a function

 type Converter = Node|Nodelist -> Nodelist

A converter can also play a role of a predicate: in that case, if a converter, applied to a node or a nodelist, yields a non-empty nodelist, the converter-predicate is deemed satisfied. Throughout this file a nil nodelist is equivalent to #f in denoting a failure.

[procedure] (nodeset? obj)

Returns #t if obj is a nodelist.

[procedure] (as-nodeset obj)

If obj is a nodelist, returns it as is; otherwise wraps it in a list.
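The distinction between a single node and a nodelist can be sketched as follows (the nodes are made up):

```scheme
(import sxpath-lolevel)

(nodeset? '((div "hi") (span "hello")))  ; => #t, a list of nodes
(nodeset? '(div "hi"))                   ; => #f, a single element
(nodeset? '())                           ; => #t, the empty nodelist

(as-nodeset '(div "hi"))                 ; => ((div "hi"))
(as-nodeset '((div "hi") (span "x")))    ; => ((div "hi") (span "x"))
```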

Node test

The following functions implement 'Node test's as defined in Sec. 2.3 of the XPath document. A node test is one of the components of a location step. It is also a converter-predicate in SXPath.

[procedure] (sxml:element? obj)

Predicate which returns #t if obj is an SXML element, otherwise #f.

[procedure] (ntype-names?? crit)

Takes a list of acceptable node names as a criterion and returns a function, which, when applied to a node, will return #t if the node name is present in criterion list and #f otherwise.

  ntype-names?? :: ListOfNames -> Node -> Boolean

[procedure] (ntype?? crit)

Takes a type criterion and returns a function, which, when applied to a node, will tell if the node satisfies the test.

 ntype?? :: Crit -> Node -> Boolean

The criterion crit is one of the following symbols:

@: tests if the Node is an attributes-list
*: tests if the Node is an Element
*text*: tests if the Node is a text node
*data*: tests if the Node is a data node (text, number, boolean, etc., but not pair)
*PI*: tests if the Node is a processing instructions node
*COMMENT*: tests if the Node is a comment node
*ENTITY*: tests if the Node is an entity node
*any*: #t for any type of Node
other symbol: tests if the Node has the right name given by the symbol

((ntype?? 'div) '(div (@ (class "greeting")) "hi"))
 => #t

((ntype?? 'div) '(span (@ (class "greeting")) "hi"))
 => #f

((ntype?? '*) '(span (@ (class "greeting")) "hi"))
 => #t

[procedure] (ntype-namespace-id?? ns-id)

This function takes a namespace-id, and returns a predicate Node -> Boolean, which is #t for nodes with the given namespace id. ns-id is a string. (ntype-namespace-id?? #f) will be #t for nodes with non-qualified names.

[procedure] (sxml:complement pred)

This function takes a predicate and returns it complemented, that is if the given predicate yields #f or '() the complemented one yields the given node and vice versa.
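As a sketch, complementing a node test and feeding it to sxml:filter (documented below) keeps exactly the nodes the original test rejects. The nodelist is made up.

```scheme
(import sxpath-lolevel)

;; Made-up nodelist
(define nodes '((div "hi") (span "hello") (div "still here?")))

;; Keep only the nodes that are NOT div elements
(define non-divs
  ((sxml:filter (sxml:complement (ntype?? 'div))) nodes))
;; non-divs => ((span "hello"))
```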

[procedure] (node-eq? other)

Returns a predicate procedure that, given a node, returns #t if the node is the exact same as other.

[procedure] (node-equal? other)

Returns a predicate procedure that, given a node, returns #t if the node has the same contents as other.

[procedure] (node-pos n)

Returns a procedure that, given a nodelist, returns a new nodelist containing only the nth element, counting from 1. If n is negative, it returns a nodelist with the nth element counting from the right. If no such node exists, returns the empty list. n may not equal zero.

Examples:

((node-pos 1) '((div "hi") (span "hello") (em "really, hi!")))
 => ((div "hi"))

((node-pos 6) '((div "hi") (span "hello") (em "really, hi!")))
 => ()

((node-pos -1) '((div "hi") (span "hello") (em "is this thing on?")))
 => ((em "is this thing on?"))

[procedure] (sxml:filter pred?)

Returns a procedure that accepts a nodelist or a node (which will be converted to a one-element nodelist) and returns only those nodes for which the predicate pred? does not return #f or '().

((sxml:filter (ntype?? 'div)) '((div "hi") (span "hello") (div "still here?")))
 => ((div "hi") (div "still here?"))

[procedure] (take-until pred?)
[procedure] (take-after pred?)

Returns a procedure that accepts a node or a nodelist.

The take-until variant returns everything before the first node for which the predicate pred? returns anything but #f or '(). In other words, it returns the longest prefix for which the predicate returns #f or '().

The take-after variant returns everything after the first node for which the predicate pred? returns anything besides #f or '().

((take-until (ntype?? 'span)) '((div "hi") (span "hello") (span "there") (div "still here?")))
 => ((div "hi"))

((take-after (ntype?? 'span)) '((div "hi") (span "hello") (span "there") (div "still here?")))
 => ((span "there") (div "still here?"))

map-union proc listprocedure

Apply proc to each element of the nodelist list and return the list of results. If proc returns a nodelist, splice it into the result (essentially returning a flattened nodelist).
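To make the splicing behaviour concrete, the sketch below compares map-union with the standard map, using a converter that selects the text children of each element. It assumes the procedures are imported from sxpath-lolevel; text-kids, items, nested and flat are names chosen only for this example.

```scheme
;; Assumption: these procedures are exported by the sxpath-lolevel module.
(import sxpath-lolevel)

;; A converter returning the text children of an element as a nodelist.
(define text-kids (select-kids (ntype?? '*text*)))

(define items '((li "Ardbeg") (li "Glenfarclas" "Springbank") (li)))

;; Plain map keeps one result list per element.
(define nested (map text-kids items))
;; => (("Ardbeg") ("Glenfarclas" "Springbank") ())

;; map-union splices the per-element nodelists into one flat nodelist.
(define flat (map-union text-kids items))
;; => ("Ardbeg" "Glenfarclas" "Springbank")
```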

node-reverse node-or-nodelistprocedure

Accepts a nodelist and reverses the order of the nodes in it. If a single node is passed to this procedure, it returns a nodelist containing just that node; the order of that node's children is not changed.
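The two cases described above can be sketched as follows, assuming node-reverse is imported from sxpath-lolevel (the variable names are illustrative only):

```scheme
;; Assumption: node-reverse is exported by the sxpath-lolevel module.
(import sxpath-lolevel)

;; A nodelist is reversed.
(define reversed (node-reverse '((div "hi") (span "hello"))))
;; => ((span "hello") (div "hi"))

;; A single node is wrapped in a nodelist; its children keep their order.
(define wrapped (node-reverse '(div "hi" (em "there"))))
;; => ((div "hi" (em "there")))
```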

Converter combinators

Combinators are higher-order functions that transmogrify a converter or glue a sequence of converters into a single, non-trivial converter. The goal is to arrive at converters that correspond to XPath location paths.

From a different point of view, a combinator is a fixed, named pattern of applying converters. Given below is a complete set of such patterns that together implement the XPath location path specification. As it turns out, all these combinators can be built from a small number of basic blocks: regular functional composition, map-union and filter applicators, and the nodelist union.

select-kids pred?procedure

Returns a procedure that accepts a node and returns a nodelist of the node's children that satisfy pred? (i.e., pred? returns anything but #f or '()).

node-self pred?procedure

Similar to select-kids but applies to the node itself rather than to its children. The resulting nodelist will contain either one component (the node), or will be empty (if the node failed the predicate).
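The contrast between select-kids and node-self can be sketched on a single element. This example assumes the procedures are imported from sxpath-lolevel; the item node and the result names are invented for illustration.

```scheme
;; Assumption: these procedures are exported by the sxpath-lolevel module.
(import sxpath-lolevel)

(define item '(li (@ (class "malt")) "Ardbeg" (em "peaty")))

;; select-kids looks at the children of the node.
(define kids ((select-kids (ntype?? 'em)) item))
;; => ((em "peaty"))

;; node-self looks at the node itself.
(define self-match ((node-self (ntype?? 'li)) item))
;; => ((li (@ (class "malt")) "Ardbeg" (em "peaty")))

(define self-miss ((node-self (ntype?? 'em)) item))
;; => ()
```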

node-join #!rest selectorsprocedure

Returns a procedure that accepts a nodelist or a node, and returns a nodelist with all the selectors applied to every node in sequence. The selectors must function as converter combinators, i.e., they must accept a node and output a nodelist.

((node-join
  (select-kids (ntype?? 'li))
  sxml:content)
 '((ul (@ (class "whiskies"))
       (li "Ardbeg")
       (li "Glenfarclas")
       (li "Springbank"))))
 => ("Ardbeg" "Glenfarclas" "Springbank")

node-reduce #!rest convertersprocedure

A regular functional composition of converters.

From a different point of view,

 ((apply node-reduce converters) nodelist)

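is equivalent to the following, where fold is SRFI-1's fold and each converter is called on the nodelist accumulated so far

 (fold (lambda (converter nodes) (converter nodes)) nodelist converters)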

i.e., folding, or reducing, a list of converters with the nodelist as a seed.
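As a hedged sketch of this composition, the example below chains select-kids and node-pos so that the second converter receives the whole nodelist produced by the first. It assumes the procedures are imported from sxpath-lolevel; whiskies, second-item and by-hand are names made up for the example.

```scheme
;; Assumption: these procedures are exported by the sxpath-lolevel module.
(import sxpath-lolevel)

(define whiskies
  '(ul (li "Ardbeg") (li "Glenfarclas") (li "Springbank")))

;; First select the li children, then pick the second entry of that nodelist.
(define second-item
  ((node-reduce (select-kids (ntype?? 'li)) (node-pos 2)) whiskies))
;; => ((li "Glenfarclas"))

;; The same thing written as explicit nested application.
(define by-hand
  ((node-pos 2) ((select-kids (ntype?? 'li)) whiskies)))
```

Unlike node-join, which applies each selector to every node of the intermediate nodelist, node-reduce hands the intermediate nodelist to the next converter as a whole, which is what a positional converter such as node-pos needs.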

node-or #!rest convertersprocedure

This combinator applies all converters to a given node and produces the union of their results. This combinator corresponds to a union, "|" operation for XPath location paths.
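A minimal sketch of such a union, assuming the procedures are imported from sxpath-lolevel (the section node and the name headings are invented for the example):

```scheme
;; Assumption: these procedures are exported by the sxpath-lolevel module.
(import sxpath-lolevel)

(define section
  '(div (h1 "Title") (p "Intro") (h2 "Sub") (p "Body")))

;; Roughly the XPath location path "h1 | h2".
(define headings
  ((node-or (select-kids (ntype?? 'h1))
            (select-kids (ntype?? 'h2)))
   section))
;; => ((h1 "Title") (h2 "Sub"))
```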

node-closure test-pred?procedure

Select all descendants of a node that satisfy a converter-predicate. This combinator is similar to select-kids, but it also applies to grandchildren and all deeper descendants.

node-trace titleprocedure

Returns a procedure that accepts a node or a nodelist, which it pretty-prints to the current output port, preceded by title. It returns the node or the nodelist unchanged. This is a useful debugging aid, since it doesn't really do anything besides print its argument and pass it on.

sxml:node? objprocedure

Returns #t if the given obj is an SXML node, #f otherwise. A node is anything except an attribute list or an auxiliary list.

sxml:attr-list nodeprocedure

Returns the list of attributes for a given SXML node. The empty list is returned if the given node is not an element, or if it has no list of attributes.

This differs from sxml:attr-list-u in that this procedure accepts any SXML node while sxml:attr-list-u only accepts nodelists or elements. This means that sxml:attr-list-u will throw an error if you pass it a text node (a string), while sxml:attr-list will not.
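The behaviour of sxml:attr-list on different kinds of nodes can be sketched as follows, assuming it is imported from sxpath-lolevel (the result names are illustrative):

```scheme
;; Assumption: sxml:attr-list is exported by the sxpath-lolevel module.
(import sxpath-lolevel)

;; An element with an attribute list.
(define attrs
  (sxml:attr-list '(div (@ (id "main") (class "wide")) "text")))
;; => ((id "main") (class "wide"))

;; An element without attributes.
(define no-attrs (sxml:attr-list '(div "no attributes")))
;; => ()

;; A text node is accepted as well and simply has no attributes.
(define text-attrs (sxml:attr-list "a text node"))
;; => ()
```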

sxml:attribute test-pred?procedure

Like sxml:filter, but considers the attributes instead of the nodes. Returns a nodelist of attributes that match test-pred?.

((sxml:attribute (ntype?? 'id))
 '((div (@ (id "navigation")) "navigation here")
   (div (@ (class "pullquote")) "random stuff")
   (div (@ (id "main-content")) "lorem ipsum ...")))
 => ((id "navigation") (id "main-content"))

sxml:child test-pred?procedure

This procedure is similar to select-kids, but it returns an empty child-list for PI, Comment and Entity nodes.

sxml:parent test-pred?procedure

Returns a procedure that accepts a root-node, and returns another procedure. This second procedure accepts a nodeset (or a node) and returns the immediate parents of the nodes in the set, but only those parents that match the predicate.

The root-node does not have to be the root node of the whole SXML tree -- it may be a root node of a branch of interest.

This procedure can be used with any SXML node.
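The two-step calling convention can be sketched like this. The example assumes the procedures are imported from sxpath-lolevel, and it assumes that the node is located in the tree by identity, so the node passed in is taken out of the tree itself rather than written as a separate literal. The names doc, first-item, parents-of, found and none are made up for the example.

```scheme
;; Assumption: these procedures are exported by the sxpath-lolevel module.
(import sxpath-lolevel)

(define doc '(ul (li "Ardbeg") (li "Glenfarclas")))

;; Assumption: nodes are matched by identity, so take the node from doc.
(define first-item (cadr doc))   ; the (li "Ardbeg") node inside doc

;; Step 1: fix the predicate and the root node.
(define parents-of ((sxml:parent (ntype?? 'ul)) doc))

;; Step 2: apply the resulting converter to a node.
(define found (parents-of first-item))
;; => ((ul (li "Ardbeg") (li "Glenfarclas")))

;; The parent exists, but it does not match this predicate.
(define none (((sxml:parent (ntype?? 'ol)) doc) first-item))
;; => ()
```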

Useful shortcuts

node-parent nodeprocedure

(node-parent rootnode) yields a converter that returns a parent of a node it is applied to. If applied to a nodelist, it returns the list of parents of nodes in the nodelist.

This is equivalent to ((sxml:parent (ntype?? '*any*)) rootnode).

sxml:child-nodes nodeprocedure

Returns all the child nodes of the given node.

This is equivalent to ((sxml:child sxml:node?) node).

sxml:child-elements nodeprocedure

Returns all the child elements of the given node (i.e., excludes any text nodes).

This is equivalent to ((select-kids sxml:element?) node).
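A short sketch of the difference between the two shortcuts, assuming they are imported from sxpath-lolevel (para and the result names are invented for the example):

```scheme
;; Assumption: these procedures are exported by the sxpath-lolevel module.
(import sxpath-lolevel)

(define para '(p (@ (class "intro")) "Hello, " (em "world") "!"))

;; All child nodes: text nodes and elements, but not the attribute list.
(define nodes (sxml:child-nodes para))
;; => ("Hello, " (em "world") "!")

;; Only the child elements.
(define elements (sxml:child-elements para))
;; => ((em "world"))
```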

Procedures from sxpath-ext

SXML counterparts to W3C XPath Core Functions Library

sxml:string objectprocedure

The counterpart to XPath 'string' function (section 4.2 XPath 1.0 Rec.). Converts a given object to a string.

Notes:

  1. When converting a nodeset, document order is not preserved
  2. number->string returns the result in a form which is slightly different from XPath Rec. specification

sxml:boolean objectprocedure

The counterpart to XPath 'boolean' function (section 4.3 XPath Rec.). Converts its argument to a boolean.

sxml:number objectprocedure

The counterpart to XPath 'number' function (section 4.4 XPath Rec.). Converts its argument to a number.

Notes:

  1. The argument is not optional (yet?)
  2. string->number conversion is not IEEE 754 round-to-nearest
  3. NaN is represented as 0

sxml:string-value nodeprocedure

Returns the string value of a given node in accordance with XPath Rec. 5.1 - 5.7.
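The conversion procedures above can be tried out on a few simple values. This is a sketch under the assumption that they are imported from sxpath-lolevel; the variable names are chosen only for this example.

```scheme
;; Assumption: these procedures are exported by the sxpath-lolevel module.
(import sxpath-lolevel)

(define para '(p (@ (class "intro")) "Hello, " (em "world") "!"))

;; The string value concatenates the text of all descendants.
(define text (sxml:string-value para))         ; => "Hello, world!"

(define from-bool (sxml:string #t))            ; => "true"
(define empty-as-bool (sxml:boolean '()))      ; => #f (empty nodeset)
(define string-as-bool (sxml:boolean "text"))  ; => #t (non-empty string)
(define parsed (sxml:number "42"))             ; => 42
(define not-a-number (sxml:number "whisky"))   ; => 0, as NaN is represented as 0
```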

sxml:id id-indexprocedure

Returns a procedure that accepts a nodeset and returns a nodeset containing the elements in the id-index that match the string-values of each entry of the nodeset. XPath Rec. 4.1

The id-index is an alist with unique IDs as key, and elements as values:

 id-index = ( (id-value . element) (id-value . element) ... )

Comparators for XPath objects

sxml:list-head list nprocedure

Returns the first n members of list. Mostly equivalent to SRFI-1's take procedure, except that it returns the whole list if n is larger than the length of the list, instead of throwing an error.
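Both cases can be sketched as follows, assuming sxml:list-head is imported from sxpath-lolevel:

```scheme
;; Assumption: sxml:list-head is exported by the sxpath-lolevel module.
(import sxpath-lolevel)

(define first-two (sxml:list-head '(a b c) 2))    ; => (a b)

;; n exceeds the length of the list: the whole list comes back, no error.
(define too-many (sxml:list-head '(a b c) 10))    ; => (a b c)
```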

sxml:merge-sort less-than? listprocedure

Returns the sorted list, the smallest member first.

 less-than? ::= (lambda (obj1 obj2) ...)

less-than? returns #t if obj1 < obj2 with respect to the given ordering.
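For instance, with the standard numeric and string orderings (a sketch assuming sxml:merge-sort is imported from sxpath-lolevel; the result names are illustrative):

```scheme
;; Assumption: sxml:merge-sort is exported by the sxpath-lolevel module.
(import sxpath-lolevel)

(define sorted-numbers (sxml:merge-sort < '(3 1 2)))
;; => (1 2 3)

(define sorted-names
  (sxml:merge-sort string<? '("Springbank" "Ardbeg" "Glenfarclas")))
;; => ("Ardbeg" "Glenfarclas" "Springbank")
```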

sxml:equality-cmp bool=? number=? string=?procedure

A helper for XPath equality operations: = , !=. The bool=?, number=? and string=? arguments are comparison operations for booleans, numbers and strings respectively.

Returns a procedure that accepts two objects, looks at the first object's type and applies the correct comparison predicate to it. Type coercion takes place depending on the rules described in the XPath 1.0 spec, section 3.4 ("Booleans").

sxml:equal? obj1 obj2procedure
sxml:not-equal? obj1 obj2procedure

Equality procedures with the default comparison operators eq?, = and string=?, or their inverse, respectively.

sxml:relational-cmp opprocedure

A helper for XPath relational operations: <, >, <=, >= for two XPath objects. op is one of these operators.

Returns a procedure that accepts two objects and returns the value of the procedure applied to these objects, converted according to the coercion rules described in the XPath 1.0 spec, section 3.4 ("Booleans").
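The type coercion performed by these comparators can be sketched with a few mixed-type comparisons. The example assumes the procedures are imported from sxpath-lolevel and only uses non-nodeset arguments; the result names are made up.

```scheme
;; Assumption: these procedures are exported by the sxpath-lolevel module.
(import sxpath-lolevel)

;; Number versus string: the string is converted to a number first.
(define num-eq (sxml:equal? 1 "1"))             ; => #t

;; Boolean versus string: a non-empty string converts to true.
(define bool-eq (sxml:equal? #t "text"))        ; => #t

;; Two strings are compared as strings.
(define str-neq (sxml:not-equal? "a" "b"))      ; => #t

;; Relational operators compare numerically, so "2" < 10 holds.
(define less ((sxml:relational-cmp <) "2" 10))  ; => #t
```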

XPath axes

sxml:ancestor test-pred?procedure

Like sxml:parent, except it returns all the ancestors that match test-pred?, not just the immediate parent.

sxml:ancestor-or-self test-pred?procedure

Like sxml:ancestor, except also allows the node itself to match the predicate.
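The following sketch shows both ancestor axes on a small tree. It assumes the procedures are imported from sxpath-lolevel and, as with sxml:parent, that the node is found in the tree by identity, so it is extracted from the document rather than written as a separate literal. The names doc, body, para, any?, ancestors and with-self are invented for the example.

```scheme
;; Assumption: these procedures are exported by the sxpath-lolevel module.
(import sxpath-lolevel)

(define doc '(html (body (p "hello"))))
(define body (cadr doc))    ; the body element inside doc
(define para (cadr body))   ; the p element inside body

(define any? (ntype?? '*any*))

;; All ancestors of para, looked up relative to the root node doc.
(define ancestors (((sxml:ancestor any?) doc) para))
;; => ((body (p "hello")) (html (body (p "hello"))))

;; The same, but the node itself may match as well.
(define with-self (((sxml:ancestor-or-self any?) doc) para))
;; => ((p "hello") (body (p "hello")) (html (body (p "hello"))))
```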

sxml:descendant test-pred?procedure

Like node-closure, except the resulting nodeset is in depth-first order instead of breadth-first.

sxml:descendant-or-self test-pred?procedure

Like sxml:descendant, except also allows the node itself to match the predicate.
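The difference in ordering between sxml:descendant and node-closure shows up as soon as a matching node is nested inside another one. This is a sketch assuming the procedures are imported from sxpath-lolevel; doc and the result names are made up for the example.

```scheme
;; Assumption: these procedures are exported by the sxpath-lolevel module.
(import sxpath-lolevel)

(define doc '(ul (li "a" (ul (li "b"))) (li "c")))

;; Depth-first: the nested (li "b") comes before its parent's sibling.
(define depth-first ((sxml:descendant (ntype?? 'li)) doc))
;; => ((li "a" (ul (li "b"))) (li "b") (li "c"))

;; Breadth-first: both top-level items come before the nested one.
(define breadth-first ((node-closure (ntype?? 'li)) doc))
;; => ((li "a" (ul (li "b"))) (li "c") (li "b"))

;; The -or-self variant can also return the starting node itself.
(define lists ((sxml:descendant-or-self (ntype?? 'ul)) doc))
```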

sxml:following test-pred?procedure

Returns a procedure that accepts a root node and returns a new procedure that accepts a node and returns all nodes following this node in the document source matching the predicate.

sxml:following-sibling test-pred?procedure

Like sxml:following, except only siblings (nodes at the same level under the same parent) are returned.

sxml:preceding test-pred?procedure

Returns a procedure that accepts a root node and returns a new procedure that accepts a node and returns all nodes preceding this node in the document source matching the predicate.

sxml:preceding-sibling test-pred?procedure

Like sxml:preceding, except only siblings (nodes at the same level under the same parent) are returned.
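The sibling axes follow the same root-node-then-node calling convention as sxml:following and sxml:preceding. A minimal sketch, assuming the procedures are imported from sxpath-lolevel and that the node is located by identity (so it is taken out of the tree); the names doc, middle, item?, after and before are illustrative only.

```scheme
;; Assumption: these procedures are exported by the sxpath-lolevel module.
(import sxpath-lolevel)

(define doc '(ul (li "a") (li "b") (li "c")))
(define middle (caddr doc))   ; the (li "b") node inside doc

(define item? (ntype?? 'li))

;; Siblings after (li "b") under the same parent.
(define after (((sxml:following-sibling item?) doc) middle))
;; => ((li "c"))

;; Siblings before (li "b") under the same parent.
(define before (((sxml:preceding-sibling item?) doc) middle))
;; => ((li "a"))
```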

sxml:namespace test-pred?procedure

Returns a procedure that accepts a nodeset and returns the namespace lists of the nodes matching test-pred?.

Examples

The SXML tutorial, though incomplete at the time of writing, contains a large section about sxpath and how to use it. This is your best bet for understanding it, aside from this eggdoc.

About this egg

Author

Oleg Kiselyov, Kirill Lisovsky, Dmitry Lizorkin.

Repository

This egg is hosted on the CHICKEN Subversion repository:

https://anonymous@code.call-cc.org/svn/chicken-eggs/release/5/sxpath

If you want to check out the source code repository of this egg and you are not familiar with Subversion, see this page.

Version history

1.0
Port to CHICKEN 5.
0.2.1
Remove warning about omitting a nonexisting identifier (define-macro) from the chicken module. Fixes #1316, thanks to Vasilij Schneidermann.
0.2
Add modules for DDO and contextual versions of sxpath. Split up xpath-parser low-level stuff into its own module.
0.1.3
Fix bug in normalize-space() and possibly other xpath primitives reported by Felix.
0.1.2
Fix problem with attribute selectors reported by Daishi Kato.
0.1.1
Use string-concatenate instead of (apply string-append ...)
0.1
Split up the old sxml-tools egg into sxpath

License

The sxml-tools are in the public domain.
