Using XPath Expressions
The NodeInfo Object
Building the Source Document
Event-based Processing: the Controller
Writing and Registering a Node Handler
This document describes how to use Saxon as a Java class library, without making any use of XSLT stylesheets. If you want to know how to control stylesheet processing from a Java application, see using-xsl.html.
The first section below describes Saxon's XPath API. This allows a Java application to execute XPath expressions against a source document, and manipulate the results. No stylesheet is involved.
Subsequent sections describe more intimate APIs that are available in Saxon. Some of these are historic, and are not necessarily stable. Saxon offers an event-based API that allows a source document to be processed using the same processing model as XSLT (that is, particular node handlers are fired to process individual nodes, selected according to the patterns that match each node).
In addition to the APIs described here, Saxon's JavaDoc documentation describes all the public classes and methods that are available for advanced users. These are all subject to change from one release to the next.
A new API has been introduced for executing XPath expressions. The API is loosely modelled
on the proposed DOM Level 3 API for XPath. For full documentation, see the Javadoc description of package
net.sf.saxon.xpath. A sample application using this API is available: it is called
and can be found in the
samples/java directory. To run this application, see the instructions
This API is based on the class
net.sf.saxon.xpath.XPathEvaluator. This class provides a few
simple configuration interfaces to set the source document, the static context, and the context node,
plus a number of methods for evaluating XPath expressions.
XPathEvaluator must be initialized with a source document, which can be supplied
as a JAXP
Source object. Any kind of
Source object recognized by Saxon is
allowed (including, for example, a JDOM source). This can be supplied either in the constructor for the
XPathEvaluator, or through the
setSource method. The setSource
method returns a
net.sf.saxon.om.DocumentInfo object representing the root of the
tree for the document: this is useful if you want to use some of the more advanced features of the
Saxon API, but you can ignore it if you don't need it.
There are two methods for direct evaluation of XPath expressions:
evaluate(), which returns a List containing the result of the expression (which in general is a sequence), and
evaluateSingle(), which returns the first item in the result (this is appropriate where it is known
that the result will be single-valued). The results are returned as
NodeInfo objects in the case of nodes,
or as objects of the most appropriate Java class in the case of atomic values: for example, Boolean, Double,
or String in the case of the traditional XPath 1.0 data types.
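The difference between the two evaluation styles can be sketched with the standard javax.xml.xpath API that ships with the JDK. This is only an analogue: it does not use Saxon's XPathEvaluator, it returns DOM Node objects rather than NodeInfo objects, and the class name XPathEvalSketch and its helper methods are hypothetical names chosen for this illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class; uses JDK classes only, not net.sf.saxon.xpath.
class XPathEvalSketch {

    // Parse an XML string into a DOM document (stands in for supplying a Source).
    static Document parse(String xml) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true);
        return dbf.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
    }

    // Analogue of evaluate(): return every node selected by the expression.
    static List<Node> evaluate(Document doc, String expr) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        NodeList nodes = (NodeList) xpath.evaluate(expr, doc, XPathConstants.NODESET);
        List<Node> result = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            result.add(nodes.item(i));
        }
        return result;
    }

    // Analogue of evaluateSingle(): return the first selected node, or null if none.
    static Node evaluateSingle(Document doc, String expr) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        return (Node) xpath.evaluate(expr, doc, XPathConstants.NODE);
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse("<books><book>A</book><book>B</book></books>");
        for (Node n : evaluate(doc, "/books/book")) {
            System.out.println(n.getNodeName() + ": " + n.getTextContent());
        }
        System.out.println(evaluateSingle(doc, "/books/book").getTextContent());
    }
}
```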
XPath itself provides no sorting capability. You can therefore specify a sort order in which you want
the results of an expression returned. This is done by nominating another expression, via the
method: this second expression is applied to each item in the result sequence, and its value determines
the position of that item in the sorted result order.
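The idea of sorting by a second expression can be illustrated as follows. This sketch again relies on the JDK's javax.xml.xpath classes rather than on Saxon, and it performs the sort in plain Java: the key expression is evaluated once with each selected node as context, and the items are ordered by the resulting string. The class and method names are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class; not part of Saxon.
class XPathSortSketch {

    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    // Select nodes with one expression, then order them by the string value
    // of a second expression evaluated relative to each selected node.
    static List<Node> selectSorted(Document doc, String select, String sortKey) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        NodeList selected = (NodeList) xpath.evaluate(select, doc, XPathConstants.NODESET);
        XPathExpression keyExpr = xpath.compile(sortKey);

        List<Node> items = new ArrayList<>();
        Map<Node, String> keys = new IdentityHashMap<>();
        for (int i = 0; i < selected.getLength(); i++) {
            Node item = selected.item(i);
            items.add(item);
            keys.put(item, keyExpr.evaluate(item)); // sort key computed once per item
        }
        items.sort(Comparator.comparing((Node n) -> keys.get(n)));
        return items;
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse("<books><book year='2003'>C</book>"
                + "<book year='1999'>A</book><book year='2001'>B</book></books>");
        for (Node n : selectSorted(doc, "/books/book", "@year")) {
            System.out.println(n.getTextContent());
        }
    }
}
```

Note that this sketch compares keys as strings; a numeric sort would need the key converted to a number first.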
You can call methods directly on the
NodeInfo object to get information about a node: for example,
getDisplayName() gets the name of the node in a form suitable for display, and
getStringValue() gets the string value of the node, as defined in the XPath data model. You
can also use the node as the context node for evaluation of subsequent expressions, by calling the
setContextNode on the
It is also possible to prepare an XPath expression for subsequent execution, using the
XPathEvaluator class. This is worthwhile where the same expression is to be executed repeatedly.
The compiled expression is represented by an instance of the class
and it can be executed repeatedly, with different context nodes. However, the compiled expression is bound
to one particular source document (this is to ensure that the same NamePool is used).
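A minimal sketch of the compile-once, evaluate-many pattern is shown below. It uses the JDK's javax.xml.xpath.XPathExpression as a stand-in for Saxon's compiled expression class, so the binding to a single source document described above does not apply to it. The class name and the sample document are assumptions made for the illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class; uses the JDK XPath API, not Saxon's.
class CompiledXPathSketch {

    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    // Compile "count(item)" once, then run it with each section as the context node.
    static List<Double> itemsPerSection(Document doc) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        XPathExpression countItems = xpath.compile("count(item)");
        NodeList sections = (NodeList) xpath.evaluate("/doc/section", doc, XPathConstants.NODESET);

        List<Double> counts = new ArrayList<>();
        for (int i = 0; i < sections.getLength(); i++) {
            counts.add((Double) countItems.evaluate(sections.item(i), XPathConstants.NUMBER));
        }
        return counts;
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse("<doc><section><item/><item/></section>"
                + "<section><item/></section></doc>");
        System.out.println(itemsPerSection(doc));
    }
}
```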
A compiled expression can reference XPath variables; the values of these variables must be supplied
before the expression is evaluated, and can be different each time it is evaluated. To do this you will
need access to the
StandaloneContext object used by the
can get this by calling
getStaticContext and casting the result to a
an expression that uses variables, the variables it uses must be declared using the
StandaloneContext class. This method returns a
Variable object, whose
setValue() method can be used to set a value for the variable before the expression is
StandaloneContext object is also needed if the XPath expression uses namespaces (which
it will need to, if the source document itself uses namespaces). Before compiling or evaluating an
XPath expression that uses namespace prefixes, the namespace must be declared. You can do this explicitly
declareNamespace() method on the
Alternatively, you can use the
setNamespaces() method, which declares all the namespaces
that are in-scope for a given node in the source document.
Certain namespaces are predeclared with their conventional prefixes: the XSLT namespace (xsl),
the XML namespace (xml), the XML Schema namespace (xs), and the Saxon namespace (saxon).
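The interplay of variables and namespace declarations can be sketched with the JDK's javax.xml.xpath API, where an XPathVariableResolver and a NamespaceContext play roughly the role that the StandaloneContext plays in Saxon. This is not Saxon's API: the class names, the inv prefix, and the urn:example:inventory namespace are all assumptions made for the example.

```java
import java.io.StringReader;
import java.util.Collections;
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper class; JDK analogue of declaring namespaces and variables.
class XPathContextSketch {
    static final String INV_NS = "urn:example:inventory"; // assumed namespace URI

    // Minimal NamespaceContext that knows a single prefix-to-URI binding.
    static class SinglePrefixContext implements NamespaceContext {
        private final String prefix;
        private final String uri;
        SinglePrefixContext(String prefix, String uri) { this.prefix = prefix; this.uri = uri; }
        public String getNamespaceURI(String p) {
            return prefix.equals(p) ? uri : XMLConstants.NULL_NS_URI;
        }
        public String getPrefix(String u) { return uri.equals(u) ? prefix : null; }
        public Iterator<String> getPrefixes(String u) {
            return uri.equals(u) ? Collections.singletonList(prefix).iterator()
                                 : Collections.<String>emptyIterator();
        }
    }

    static Document parse(String xml) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true); // required for prefixed names to match
        return dbf.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
    }

    // Compile one expression that uses $min, then evaluate it with several values.
    static double[] countAbove(Document doc, double... thresholds) throws Exception {
        Map<String, Object> variables = new HashMap<>();
        XPath xpath = XPathFactory.newInstance().newXPath();
        xpath.setNamespaceContext(new SinglePrefixContext("inv", INV_NS));
        xpath.setXPathVariableResolver(name -> variables.get(name.getLocalPart()));
        XPathExpression expr = xpath.compile("count(//inv:item[@price > $min])");

        double[] result = new double[thresholds.length];
        for (int i = 0; i < thresholds.length; i++) {
            variables.put("min", thresholds[i]); // value supplied before each evaluation
            result[i] = (Double) expr.evaluate(doc, XPathConstants.NUMBER);
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse("<inv:stock xmlns:inv='" + INV_NS + "'>"
                + "<inv:item price='5'/><inv:item price='20'/><inv:item price='80'/></inv:stock>");
        double[] counts = countAbove(doc, 10.0, 50.0);
        System.out.println(counts[0] + " " + counts[1]);
    }
}
```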
All the core XPath functions are available, with the exception of the
The XSLT-specific functions, such as
generate-id, are not available.
You can call Java extension functions by binding a namespace to the Java class (for example,
java:java.lang.Double). You can also call Saxon and EXSLT extension functions using their
normal namespace - with the exception of a small number of Saxon extension functions, such as
saxon:serialize, which work only in an XSLT context.
The design principle of this API is to minimize the number of Saxon classes that need to be used.
Apart from the
NodeInfo interface, which is needed when manipulating Saxon trees, only the four classes
XPathEvaluator, XPathExpression, StandaloneContext, and XPathException are needed.
For convenience, these classes are all in the
The NodeInfo object represents a node of an XML document. It has a subclass
represent the root node, but all other nodes are represented by
NodeInfo itself. These follow the
XPath data model closely.
In earlier releases, NodeInfo extended the DOM interface
Node. This is no longer the case;
it was changed to make it easier to integrate Saxon with other XML tree representations such as JDOM.
However, the main Saxon implementations of the
NodeInfo interface continue to also implement the DOM
Node interface, so you can still use DOM methods by casting the concrete node object to a DOM class.
The NodeInfo object provides the application with information about the node. The most commonly-used methods include:
|getItemType()||gets a short identifying the node type (for example, element or attribute). The values are
consistent with those used in the DOM, and are referenced by constants in the class
|getDisplayName(), getLocalName(), getPrefix(), getURI()||These methods get the name of the element, or its various parts. The getDisplayName() method returns the QName as used in the original source XML.|
|getAttributeValue()||get the value of a specified attribute, as a String.|
|getStringValue()||get the string value of a node, as defined in the XPath data model|
|getParent()||get the NodeInfo representing the parent element (which will be a
|getEnumeration()||returns a SequenceIterator object that can be used to iterate over the nodes on any of the
XPath axes. The first argument is an integer identifying the axis; the second is a
The first thing the application must do is to build the source document, in the form of a tree. The simplest approach is to use the sequence:
Alternatively, you can use the JAXP 1.1 interface. For example:
You can define the parser to be used by supplying a parser within the
supplied to the
Builder.build() method. If you don't supply a parser, Saxon will select one
using the JAXP mechanisms, specifically, the system property
If you want to use different parsers depending on the URI of the document being read,
you can achieve this by writing a
URIResolver that nominates the parser to be used for each
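The two parser-selection techniques described above can be sketched using standard JAXP classes only. The sketch pairs an explicitly created XMLReader with the input in a SAXSource, and shows a URIResolver that returns such a source for each URI it resolves. It builds a DOM tree through an identity Transformer rather than calling Saxon's Builder, and the class names are hypothetical.

```java
import java.io.StringReader;
import java.net.URI;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.Source;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.URIResolver;
import javax.xml.transform.dom.DOMResult;
import javax.xml.transform.sax.SAXSource;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;

// Hypothetical helper class; standard JAXP only, no Saxon classes.
class ParserChoiceSketch {

    // Create a namespace-aware SAX parser. A real application could return
    // a different XMLReader implementation here.
    static XMLReader newParser() throws Exception {
        SAXParserFactory spf = SAXParserFactory.newInstance();
        spf.setNamespaceAware(true);
        return spf.newSAXParser().getXMLReader();
    }

    // A URIResolver that pairs each resolved URI with an explicitly chosen parser.
    static class ParserChoosingResolver implements URIResolver {
        @Override
        public Source resolve(String href, String base) throws TransformerException {
            try {
                String systemId = (base == null)
                        ? href
                        : URI.create(base).resolve(href).toString();
                return new SAXSource(newParser(), new InputSource(systemId));
            } catch (Exception e) {
                throw new TransformerException(e);
            }
        }
    }

    // Build a tree from a SAXSource that carries its own parser.
    static Document build(String xml) throws Exception {
        SAXSource source = new SAXSource(newParser(), new InputSource(new StringReader(xml)));
        DOMResult result = new DOMResult();
        TransformerFactory.newInstance().newTransformer().transform(source, result);
        return (Document) result.getNode();
    }

    public static void main(String[] args) throws Exception {
        Document doc = build("<a><b/></a>");
        System.out.println(doc.getDocumentElement().getNodeName());
    }
}
```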
If you want to use Saxon's event-based processing from a Java application, or other XSLT-like features
such as keys, or if you want to produce a result tree and perhaps serialize it, then you will need
to create a
is Saxon's implementation of the JAXP
Some of the functions of this class are
relevant only to XSLT transformation, but most can also be used when Saxon is used purely from Java.
Each application run must instantiate a new
There are several classes used to define the kind of processing you want to perform. These are the RuleManager for registering template rules, the KeyManager for registering key definitions, the PreviewManager for registering preview elements, the Stripper for registering which elements are to have whitespace nodes stripped, and the DecimalFormatManager for registering named decimal formats. These classes can all be reused freely, and they are thread safe once the definitions have been set up. All of these objects are registered with the Controller using methods such as setRuleManager() and setKeyManager().
The Controller class is used to process a document tree by applying registered node handlers. Its main method is run(). The controller is responsible for navigating through the document and calling user-defined handlers which you associate with each element or other node type to define how it is to be processed. The controller can also be serially reused, but should not be used to process more than one document at a time. The Controller needs to know about the RuleManager to find the relevant node handlers to invoke. If keys are used it will need to know about the KeyManager, and if decimal formats are used it will need to know about the DecimalFormatManager. These classes can be registered with the Controller using setRuleManager(), setKeyManager(), and setDecimalFormatManager() respectively. If preview mode is used, the PreviewManager will need to know about the Controller, so it has a setController() method for this purpose.
You can register node handlers that will be called to process each node, in the same way as template rules are used in XSLT. The node handler can choose whether or not subsidiary elements should be processed (by calling applyTemplates()), and can dive off into a completely different part of the document tree before resuming. A user-written node handler must implement the NodeHandler interface.
To register a node handler, create a RuleManager, register the node handler with it using its setHandler() method, and register the RuleManager with the Controller by calling the Controller's setRuleManager() method.
Always remember that if you want child elements to be processed recursively, your node handler must call the applyTemplates() method.
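The processing model itself can be illustrated with a small self-contained sketch. It does not use Saxon's Controller, RuleManager, or NodeHandler classes, whose exact signatures are not given here. Instead it defines hypothetical Handler and Dispatcher types over a DOM tree, and matches handlers by element name rather than by pattern. The point it shows is that a handler which does not call applyTemplates() suppresses processing of its children.

```java
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical illustration of rule-based node processing; not Saxon's API.
class NodeHandlerSketch {

    // Stand-in for a node handler: called when a matching node is encountered.
    interface Handler {
        void start(Node node, Dispatcher controller);
    }

    // Stand-in for the controller plus rule manager, matching on element name only.
    static class Dispatcher {
        private final Map<String, Handler> rules = new HashMap<>();
        final StringBuilder out = new StringBuilder();

        void setHandler(String elementName, Handler handler) {
            rules.put(elementName, handler);
        }

        void run(Document doc) {
            applyTemplates(doc);
        }

        // Process the children of a node: text is copied, elements are dispatched.
        void applyTemplates(Node parent) {
            for (Node c = parent.getFirstChild(); c != null; c = c.getNextSibling()) {
                if (c.getNodeType() == Node.TEXT_NODE) {
                    out.append(c.getNodeValue());
                } else if (c.getNodeType() == Node.ELEMENT_NODE) {
                    Handler h = rules.get(c.getNodeName());
                    if (h != null) {
                        h.start(c, this);
                    } else {
                        applyTemplates(c); // default rule: just recurse
                    }
                }
            }
        }
    }

    static String render(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        Dispatcher d = new Dispatcher();
        // This handler processes its children by calling applyTemplates().
        d.setHandler("title", (node, ctl) -> {
            ctl.out.append("# ");
            ctl.applyTemplates(node);
            ctl.out.append("\n");
        });
        // This handler never calls applyTemplates(), so its subtree is skipped.
        d.setHandler("note", (node, ctl) -> { });
        d.run(doc);
        return d.out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.print(render("<doc><title>Intro</title><note>skip <b>me</b></note><p>Body</p></doc>"));
    }
}
```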
A node handler can write to the current output destination. The controller maintains a current outputter. Your node handler can switch output to a new destination by calling changeOutputDestination(), and can revert to the previous destination by calling resetOutputDestination(). This is useful both for splitting an input XML document into multiple XML documents, and for creating output fragments that can be reassembled in a different order for display. Details of the output format required must be set up in a Properties object, which is supplied as a parameter to changeOutputDestination().
The node handler is supplied with a NodeInfo object, which provides information about the current node, and with a Context object that gives access to a range of standard services, such as an Outputter object, which includes a write() method to produce output.
Normally you will write one node handler for each type of element, but it is quite possible to use the same handler for several different elements. You can also write completely general-purpose handlers. You define which elements will be handled by each element handler using a pattern, exactly as in XSLT.
You only need to provide one method for the selected node type. This is:
|start()||This is called when the node is encountered in the tree. The NodeInfo object
passed gives you information about the relevant node. You can save information
for later use if required, using one of several techniques:
Patterns are used in the setHandler() interface to define which nodes a particular handler applies to. Patterns used in the Saxon Java API have exactly the same form as in XSLT.
The detailed rules for patterns can be found in patterns.html.
Patterns are represented in the API by the class net.sf.saxon.pattern.Pattern. It operates in much the same way as the Expression class introduced earlier. There is a static method to create a Pattern from a String, and a method matches() that tests whether a particular node matches a pattern.
When you create a Pattern using the method Pattern.make() you must supply a StaticContext object. This object provides the information needed to interpret certain patterns: for example, it provides the ability to convert a namespace prefix within the expressions into a URI. In an XSLT stylesheet, the StaticContext provides information the expression can get from the rest of the stylesheet; in a Java application, this is not available, so you must provide the context yourself. If you don't supply a StaticContext object, a default context is used: this will prevent you using context-dependent constructs such as variables and namespace prefixes.
Michael H. Kay
10 February 2003