This page describes how to use Saxon as an XQuery processor, either from the command line, or from the Java API.
Contents
Using XQuery: Introduction | Running a Query from the Command Line | Embedding a Query in an Application | Result Format | Use Cases
Reference (see also): XPath Expressions | Function Library | Saxon Extensibility | XQuery Conformance
For details of how to set up collation URIs for use in XQuery, see Collation URIs
Saxon, from release 7.6 onwards, supports XQuery as well as XSLT.
The run-time code for both languages is identical, reflecting the fact that they have very similar semantics. The XQuery support in Saxon consists essentially of an XQuery parser (which is itself an extension of the XPath parser); the parser generates the same internal interpretable code as the XSLT processor.
The XQuery processor may be invoked either from the operating system command line, or via an API from a Java application. There is no graphical user interface provided.
Saxon is an in-memory processor. Whether you use XSLT or XQuery, Saxon is designed to process source documents that fit in memory. If you want to handle a large database, you need an XML database product such as Software AG's Tamino.
This release (7.8) introduces support for XQuery library modules, including the ability for an XSLT stylesheet to import the functions defined in an XQuery library module. At present the reverse capability, to call XSLT functions from XQuery, is not available.
The Java class net.sf.saxon.Query has a main program that may be used to run a query contained in a file. The form of the command is:
java net.sf.saxon.Query [options] query [ params...]
The options must come first, then the file name containing the query, then the params.
The options are as follows (in any order):
-ds | Use the classic tree model for source documents. See Choosing a Tree Model. |
-dt | Use the "tinytree" tree model for source documents. This is the default tree model. See Choosing a Tree Model. |
-noext | Prevents the query calling external Java functions. This is useful for safety if the query is untrusted. |
-o filename | Send output to named file. In the absence of this option, the results go to standard output. The output format depends on whether the -wrap option is present. |
-r classname | Use the specified URIResolver to process all URIs. The URIResolver is a user-defined class that implements the URIResolver interface defined in JAXP; its function is to take a URI supplied as a string and return a JAXP Source object. It is invoked to process URIs used in the doc() function, and (if -u is also specified) to process the URI of the source file provided on the command line. |
-s filename-or-URI | Take input from the specified file. If the -u option is specified, or if the name begins with "file://" or "http://", then the name is assumed to be a URI rather than a filename. This file must contain an XML document. The document node of the document is made available to the query as the context item. The source document can be specified as "-" to take the source from standard input. |
-strip | Strip all whitespace-only text nodes from source documents. |
-t | Display version and timing information to the standard error output. The output also traces the files that are read and written, and extension modules that are loaded. |
-TJ | Switches on tracing of the binding of calls to external Java methods. This is useful when analyzing why Saxon fails to find a Java method to match an extension function call in the query, or why it chooses one method over another when several are available. |
-u | Indicates that the name of the source document is a URI; otherwise it is taken as a filename, unless it starts with "http:" or "file:", in which case it is taken as a URL. |
-wrap | Wraps the result sequence in an element structure that indicates the type of each node or atomic value in the query result. |
-? | Display command syntax |
query | Identifies the file containing the query. Mandatory. The argument can be specified as "-" to read the query from standard input. The query can also be specified inline by enclosing it in curly braces (if it contains spaces, you will also need quotes outside the curly braces to keep the command line processor happy). For example, java net.sf.saxon.Query {doc('a.xml')//p[1]} selects elements within the file a.xml in the current directory. |
A param takes the form name=value, name being the name of the parameter, and value the value of the parameter. These parameters are accessible within the query as external variables, using the $name syntax, provided they are declared in the query prolog. If there is no such declaration, the supplied parameter value is silently ignored. Not yet tested.
A param preceded by a leading plus sign (+) is interpreted as a filename or directory. The content of the file is parsed as XML, and the resulting document node is passed to the query as the value of the parameter. If the parameter value is a directory, then all the immediately contained files are parsed as XML, and the resulting sequence of document nodes is passed as the value of the parameter. For example, +lookup=lookup.xml sets the value of the external variable lookup to the document node at the root of the tree representing the parsed contents of the file lookup.xml.
A param preceded by a leading exclamation mark is interpreted as a serialization parameter. For example, !indent=yes requests indented output, and !encoding=iso-8859-1 requests that the serialized output be in ISO 8859/1 encoding. This is equivalent to specifying the attribute indent="yes" or encoding="iso-8859-1" on an xsl:output declaration in an XSLT stylesheet.
Under Windows, and some other operating systems, it is possible to supply a value containing spaces by enclosing it in double quotes, for example name="John Smith". This is a feature of the operating system shell, not something Saxon does, so it may not work the same way under every operating system.
If the parameter name is in a non-null namespace, the parameter can be given a value using the syntax {uri}localname=value. Here uri is the namespace URI of the parameter's name, and localname is the local part of the name.
This applies also to output parameters. For example, you can set the indentation level to 4 by using the parameter !{http://saxon.sf.net/}indent-spaces=4. For the extended set of output parameters supported by Saxon, see extensions.html.
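The parameter conventions described above can be summarised in a small sketch. This is not Saxon's own argument-handling code: the class ParamSketch and its methods are hypothetical, and the sketch only classifies an argument by its leading character and splits off an optional {uri} part.

```java
/** Hypothetical helper, not part of Saxon: classifies a command-line param. */
final class ParamSketch {

    /** Returns "file", "serialization" or "value" depending on the leading character. */
    static String kind(String arg) {
        if (arg.startsWith("+")) return "file";           // +name=file-or-directory
        if (arg.startsWith("!")) return "serialization";  // !name=value
        return "value";                                   // name=value
    }

    /** Returns {namespace URI, local name, value}; the URI is "" when none is given. */
    static String[] split(String arg) {
        String s = arg;
        if (s.startsWith("+") || s.startsWith("!")) {
            s = s.substring(1);                           // drop the prefix character
        }
        String uri = "";
        if (s.startsWith("{")) {                          // {uri}localname=value
            int close = s.indexOf('}');
            if (close < 0) {
                throw new IllegalArgumentException("Missing '}' in " + arg);
            }
            uri = s.substring(1, close);
            s = s.substring(close + 1);
        }
        int eq = s.indexOf('=');
        if (eq < 1) {
            throw new IllegalArgumentException("Expected name=value in " + arg);
        }
        return new String[] { uri, s.substring(0, eq), s.substring(eq + 1) };
    }

    public static void main(String[] args) {
        String[] samples = {
            "start=h8", "+lookup=lookup.xml", "!{http://saxon.sf.net/}indent-spaces=4"
        };
        for (String a : samples) {
            String[] p = split(a);
            System.out.println(kind(a) + " uri=" + p[0] + " name=" + p[1] + " value=" + p[2]);
        }
    }
}
```

The namespace URI is removed before looking for the equals sign, so a URI that itself contains "=" does not confuse the split.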
Rather than using the query processor from the command line, you may want to include it in your own application, perhaps one that enables it to be used within an applet or servlet. If you run the processor repeatedly, this will always be much faster than running it each time from a command line, even if it handles a different query each time.
There is currently no standard API for XQuery, so I have invented one. It is fully described in the JavaDoc included in the download: look for the package net.sf.saxon.query. The starting point is the class QueryProcessor. What follows here is an overview. For an example of how the API can be used, take a look at the source code for the class net.sf.saxon.Query, which implements the command line interface.
The first thing you need to do is to create a net.sf.saxon.Configuration object. This holds values of all the system settings, corresponding to flags available on the command line. You don't need to set any properties in the Configuration object if you are happy with the default settings.
Then you need to create a net.sf.saxon.StaticQueryContext object. As the name implies, this holds information about the static (compile-time) context for a query. Most aspects of the static context can be defined in the Query Prolog, but this object allows you to initialize the static context from the application instead if you need to. Some of the facilities provided are very much for advanced users only, for example the ability to declare variables and functions, and the ability to specify a NamePool to be used. One aspect of the static context that you may need to use is the ability to declare collations. Using the method declareCollation you can create a mapping between a collation URI (which can then be used anywhere in the Query) and a Java Comparator object used to implement that collation.
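As an illustration of the kind of Comparator that could back a collation, the sketch below uses the JDK class java.text.Collator, which implements Comparator. The class name CollationSketch and the collation URI in the comment are made up for this example, and the declareCollation call appears only as a comment because its exact signature should be checked in the JavaDoc.

```java
import java.text.Collator;
import java.util.Comparator;
import java.util.Locale;

/** Hypothetical example class: builds a Comparator suitable for use as a collation. */
final class CollationSketch {

    /** English comparator at PRIMARY strength, which ignores case differences. */
    static Comparator<Object> caseBlindEnglish() {
        Collator collator = Collator.getInstance(Locale.ENGLISH);
        collator.setStrength(Collator.PRIMARY);
        return collator; // Collator implements Comparator<Object>
    }

    public static void main(String[] args) {
        Comparator<Object> cmp = caseBlindEnglish();
        // Assumed usage, not verified against the Saxon JavaDoc:
        //   staticContext.declareCollation("http://example.com/collation/caseblind", cmp);
        System.out.println(cmp.compare("hello", "HELLO"));
    }
}
```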
Having created, and possibly configured, the Configuration and StaticQueryContext objects, you can now create a QueryProcessor object with a call such as:
QueryProcessor qp = new QueryProcessor(config, staticContext);
The QueryProcessor object can now be used to compile a Query. The text of the Query can be supplied either as a String or as a Java Reader. There are thus two different compileQuery methods. Each of them returns the compiled query in the form of a QueryExpression. The QueryExpression, as you would expect, can be executed repeatedly, as often as you want, in the same or in different threads.
Before you run your query, you may want to build one or more trees representing XML documents that can be used as input to your query. You don't need to do this: if the query loads its source documents using the doc() function then this will be done automatically, but doing it yourself gives you more control. A document node at the root of a tree is represented in Saxon by the net.sf.saxon.DocumentInfo interface.
The QueryProcessor provides a convenience method, buildDocument(), that allows an instance of DocumentInfo to be constructed. The input parameter to this is defined by the class javax.xml.transform.Source, which is part of the standard Java JAXP API: the Source interface is an umbrella for different kinds of XML document source, including a StreamSource which parses raw XML from a byte or character stream, SAXSource which takes the input from a SAX parser (or an object that is simulating a SAX parser), and DOMSource which provides the input from a DOM. Saxon also provides a net.sf.saxon.jdom.DocumentWrapper which allows the input to be taken from a JDOM document.
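The three standard kinds of JAXP Source mentioned above can be constructed as follows. This sketch uses only JDK classes; the class name SourceSketch and the sample XML are invented for the example, and the call to buildDocument() is left out because its exact signature is not given here.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.Source;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.sax.SAXSource;
import javax.xml.transform.stream.StreamSource;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

/** Hypothetical example class: three ways of presenting the same XML as a JAXP Source. */
final class SourceSketch {

    static final String XML = "<doc><p>one</p><p>two</p></doc>";

    /** Raw XML read from a character stream. */
    static Source fromStream() {
        return new StreamSource(new StringReader(XML));
    }

    /** Input delivered through a SAX parser chosen by the consumer of the Source. */
    static Source fromSax() {
        return new SAXSource(new InputSource(new StringReader(XML)));
    }

    /** Input taken from an existing DOM tree. */
    static Source fromDom() {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document dom = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(XML)));
            return new DOMSource(dom);
        } catch (Exception e) {
            throw new IllegalStateException("Could not build the DOM", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(fromStream().getClass().getName());
        System.out.println(fromSax().getClass().getName());
        System.out.println(fromDom().getClass().getName());
    }
}
```

Any of these objects could then be passed wherever a javax.xml.transform.Source is expected.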
To execute your compiled query, you need to create a DynamicQueryContext object that holds the run-time context information. The main things you can set in the run-time context are:
Parameters (external variables), set using the setParameter() method. The mapping from Java classes to XQuery/XPath data types is the same as the mapping used for the returned values from an external Java method call, and is described under Result of an Extension Function.
The context node, set using setContextNode(). For some reason it isn't possible to set a context item other than a node.
You are now ready to evaluate the query. There are several methods on the QueryExpression object that you can use to achieve this. The evaluate() method returns the result sequence as a Java java.util.List. The evaluateSingle() method is suitable when you know that the result sequence will contain a single item: this returns this item as an Object, or returns null if the result is an empty sequence. There is also an iterator() method that returns an iterator over the results. This is a Saxon object of class net.sf.saxon.om.SequenceIterator: it is similar to the standard Java iterator, but not quite identical; for example, it can throw exceptions.
The evaluate() and evaluateSingle() methods return the result as a Java object of the most appropriate type: for example a String is returned as a java.lang.String, a boolean as a java.lang.Boolean. A node is returned using the Saxon representation of a node, net.sf.saxon.om.NodeInfo. With the standard and tinytree models, this object also implements the DOM Node interface (but any attempt to update the node throws an error).
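An application consuming these converted results typically has to dispatch on the Java type of each item. The sketch below shows one way of doing that; it does not call Saxon at all. The class ResultSketch is hypothetical, and the list in main is built by hand to stand in for the List that evaluate() would return.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical example class: inspects items of the kind evaluate() is described as returning. */
final class ResultSketch {

    /** Describes one result item; null stands for the empty sequence from evaluateSingle(). */
    static String describe(Object item) {
        if (item == null) {
            return "empty sequence";
        }
        if (item instanceof String) {
            return "string: " + item;
        }
        if (item instanceof Boolean) {
            return "boolean: " + item;
        }
        // Nodes (NodeInfo) and other atomic types would end up here in this sketch.
        return "other (" + item.getClass().getName() + "): " + item;
    }

    public static void main(String[] args) {
        // Hand-built stand-in for a query result, not produced by Saxon.
        List<Object> results = Arrays.<Object>asList("h8", Boolean.TRUE);
        for (Object item : results) {
            System.out.println(describe(item));
        }
    }
}
```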
The iterator() method, by contrast, does not do any conversion of the result. It is returned using its native Saxon representation, for example a String is returned as an instance of net.sf.saxon.value.StringValue. You can then use all the methods available on this class to process the returned value.
If you want to process the results of the query in your application, that's all there is to it. But you may want to output the results as serialized XML. Saxon provides two ways of doing this: you can produce wrapped output, or raw output. Raw output works only if the result consists of a single document or element node, and it outputs the subtree rooted at that element node in the form of a serialized XML document. Wrapped output works for any result sequence, for example a sequence of integers or a sequence of attribute and comment nodes; this works by wrapping each item in the result sequence as an XML element, with details of its type and value.
To produce wrapped output, you first wrap the result sequence as an XML tree, and then serialize the tree. To produce raw (unwrapped) output, you skip the wrapping stage and just call the serializer directly.
Both steps can be done using the QueryResult class. This class doesn't need to be instantiated, its methods are static. The method QueryResult.wrap takes as input the iterator produced by evaluating the query using the iterator() method, and produces as output a DocumentInfo object representing the results wrapped as an XML tree. The method QueryResult.serialize takes any document or element node as input, and writes it to a specified destination, using specified output properties. The destination is supplied as an object of class javax.xml.transform.Result. Like the Source, this is part of the JAXP API, and allows the destination to be specified as a StreamResult (representing a byte stream or character stream), a SAXResult (which wraps a SAX ContentHandler), or a DOMResult (which delivers the result as a DOM). The output properties are used only when writing to a StreamResult: they correspond to the properties available in the xsl:output element for XSLT. The property names are defined by constants in the JAXP javax.xml.transform.OutputKeys class (or net.sf.saxon.event.SaxonOutputKeys for Saxon extensions): for details of the values that are accepted, see the JavaDoc documentation or the JAXP specification.
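To show how a JAXP Result and a set of output properties fit together, the sketch below serializes a small document to a StreamResult using property names taken from OutputKeys. It uses the JDK's identity Transformer as a stand-in for QueryResult.serialize, whose exact signature is not shown here; the class name SerializeSketch is invented for the example.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.Properties;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

/** Hypothetical example class: output properties applied when writing to a StreamResult. */
final class SerializeSketch {

    /** Copies the XML to a character stream, applying xsl:output-style properties. */
    static String serialize(String xml) {
        Properties props = new Properties();
        props.setProperty(OutputKeys.METHOD, "xml");
        props.setProperty(OutputKeys.INDENT, "yes");
        props.setProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        try {
            // A Transformer created without a stylesheet performs an identity copy.
            Transformer identity = TransformerFactory.newInstance().newTransformer();
            identity.setOutputProperties(props);
            StringWriter out = new StringWriter();
            identity.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Serialization failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(serialize("<a><b>x</b></a>"));
    }
}
```

The same Properties object is the natural place for the serialization parameters that the command line accepts with a leading exclamation mark.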
The result of a query is a sequence of nodes and atomic values - which means it is not, in general, an XML document. This raises the question as to how the results should be output.
The Saxon command line processor for XQuery by default produces the output in unwrapped format. This outputs each item in the result sequence starting on a new line. If the item is a document node or an element node, then it is serialized as an XML document. If it is any other value, it is converted to a string and its string value is displayed.
The alternative is wrapped format, requested using the -wrap argument. This wraps the result sequence as an XML document, and then serializes the resulting document. Each item in the result sequence is wrapped in an element (such as result:element or result:atomic-value) according to its type. The sequence as a whole is wrapped in a result:sequence element.
The Saxon XQuery implementation allows you to call Java methods as external functions. The function does not need to be declared. Use a namespace declaration such as declare namespace math="java:java.lang.Math", and invoke the method as math:sqrt(2). More details of this mechanism are found in Writing Extension Functions; note however that for XQuery the only form of namespace URI accepted is java:full.class.Name.
The full library of Saxon and EXSLT functions described in extensions.html is also available, except for those (such as saxon:serialize) that have an intrinsic dependency on an XSLT stylesheet.
Saxon 7.7 runs all the XQuery Use Cases with the exception of the STRONG use case, which is designed to exercise schema-aware query processors.
The relevant queries (some of which have been corrected from those published by W3C) are included in the Saxon distribution (folder use-cases) together with a batch script for running them.
A few additional use cases have been added to show features that would otherwise not be exercised.
Also included in the distribution is a query samples\query\tour.xq. This is a query that generates a knight's tour of the chessboard. It is written as a demonstration of recursive functional programming in XQuery. It requires no input document. You need to supply a parameter on the command line indicating the square where the knight should start, for example start=h8. The output is an HTML document.
Michael H. Kay
12 November 2003