This document is an informal guide to the syntax of XPath 2.0 expressions, which are used in Saxon both within XSLT stylesheets, and in the Java API. For formal specifications, see the XPath 2.0 specification, except where differences are noted here.
XPath expressions may be used either in an XSLT stylesheet, or as a parameter to various Java interfaces. The syntax is the same in both cases. In the Java interface, expressions are handled using the net.sf.saxon.xpath.XPathEvaluator class, and are parsed using a call such as XPathEvaluator.createExpression("$a + $b"). This returns an object of class net.sf.saxon.xpath.XPathExpression, which provides two methods for evaluating the expression: evaluate(), which returns the value of the expression, and iterator(), which allows iteration over the items in the sequence returned by the expression. For further details of these methods, see the API documentation.
An important change in XPath 2.0 is that all values are now considered as sequences. A sequence consists of zero or more items; an item may be a node or a simple-value. Examples of simple-values are integers, strings, booleans, and dates. A single value such as a number is considered as a sequence of length 1. The empty sequence is written as (); a singleton sequence may be written as "a" or ("a"), and a general sequence is written as ("a", "b", "c").
The node-sets of XPath 1.0 are replaced in XPath 2.0 by sequences of nodes. Path expressions will return node sequences whose nodes are in document order with no duplicates, but other kinds of expression may return sequences of nodes in any order, with duplicates permitted.
This page summarizes the syntactic constructs and operators provided in XPath 2.0. The functions provided in the function library are listed separately: see functions.html.
String literals are written as "London" or 'Paris'. In each case you can use the opposite kind of quotation mark within the string: 'He said "Boo"', or "That's rubbish". In a stylesheet, XPath expressions always appear within XML attributes, so it is usual to use one kind of delimiter for the attribute and the other kind for the literal. Anything else can be written using XML character entities. In XPath 2.0, string delimiters can be doubled within the string to represent the delimiter itself: for example <xsl:value-of select='"He said, ""Go!"""'/>.
Numeric constants follow the Java rules for decimal literals: for example, 12 or 3.05; a negative number can be written as (say) -93.7, though technically the minus sign is not part of the literal. (Also, note that you may need a space before the minus sign to avoid it being treated as a hyphen within a preceding name.) The numeric literal is taken as a double-precision floating point number if it uses scientific notation (e.g. 1.0e7), as a fixed-point decimal if it includes a full stop, or as an integer otherwise. Decimal values in Saxon have unlimited precision; integers are limited to 64 bits. Note that a value such as 3.5 was handled as a double-precision floating point number in XPath 1.0, but as a decimal number in XPath 2.0: this may affect the precision of arithmetic results. Saxon implements decimal arithmetic using the Java class java.math.BigDecimal.
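The effect of the literal rules can be sketched in plain Java. This is only a rough model, not Saxon code: the class NumericLiterals and its methods are hypothetical names introduced for illustration. It classifies a literal by the rule given above and contrasts binary floating point addition with java.math.BigDecimal addition.

```java
import java.math.BigDecimal;

// Illustrative model only: NumericLiterals and its methods are hypothetical
// names, not part of Saxon.
final class NumericLiterals {

    // Classify a literal by the rule in the text: an exponent means double,
    // a full stop means decimal, anything else is an integer.
    static String classify(String literal) {
        if (literal.indexOf('e') >= 0 || literal.indexOf('E') >= 0) {
            return "double";
        }
        if (literal.indexOf('.') >= 0) {
            return "decimal";
        }
        return "integer";
    }

    // Add two literals using binary floating point (the XPath 1.0 treatment).
    static double addAsDouble(String a, String b) {
        return Double.parseDouble(a) + Double.parseDouble(b);
    }

    // Add two literals using exact decimal arithmetic (the XPath 2.0 treatment).
    static BigDecimal addAsDecimal(String a, String b) {
        return new BigDecimal(a).add(new BigDecimal(b));
    }

    public static void main(String[] args) {
        System.out.println(classify("1.0e7"));
        System.out.println(classify("3.05"));
        System.out.println(classify("12"));
        System.out.println(addAsDouble("0.1", "0.2"));
        System.out.println(addAsDecimal("0.1", "0.2"));
    }
}
```

The double sum of 0.1 and 0.2 is not exactly 0.3, whereas the decimal sum is; this is the kind of difference in precision that the change from XPath 1.0 can expose.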
There are no boolean constants as such: instead use the function calls true() and false().
Constants of other data types can be written using constructors, which look like function calls but require a string literal as their argument. For example, xs:float("10.7") produces a single-precision floating point number. Saxon implements constructors for many of the built-in data types defined in XML Schema Part 2: for a full list see conformance.html. An example for date and dateTime values: you can write constants for these data types as xs:date("2002-04-30") or xs:dateTime("1966-07-31T15:00:00Z").
The latest (November 2002) draft of XPath 2.0 allows the argument to a constructor to contain whitespace, as determined by the whitespace facet for the target data type. This feature is implemented in Saxon 7.4.
The value of a variable (local or global variable, local or global parameter) may be referred to using the construct $name, where name is the variable name. The variable is always evaluated at the textual place where the expression containing it appears; for example a variable used within an xsl:attribute-set must be in scope at the point where the attribute-set is defined, not the point where it is used.
A variable may take a value of any data type, and in general it is not possible to determine its data type statically.
It is an error to refer to a variable that has not been declared.
Starting with XPath 2.0, variables (known as range variables) may be declared within an XPath expression, not only using xsl:variable elements in the stylesheet. The expressions that declare variables are the for, some, and every expressions.
Saxon 7.4 does not allow two range variables within an expression to have the same name.
A function call in XPath 2.0 takes the form F ( arg1, arg2, ...). In general, the function name is a QName. A library of core functions is defined in the XPath 2.0 and XSLT 2.0 specifications. For details of these functions, including notes on their implementation in this Saxon release, see functions.html.
Additional functions are available (in a special namespace) as Saxon extensions:
these are listed in extensions.html. Further functions may be
implemented by the user, either as XSLT stylesheet functions (see xsl:function),
or as Java extension functions (see extensibility.html).
In Saxon 7.4, the core function library is in no namespace; the functions are referenced without using a namespace prefix.
Saxon 7.4 implements function calls using the XPath 2.0 function call rules. Essentially, this means that the supplied value is not implicitly cast to the required type unless (a) the supplied value is an untyped element or attribute node, or (b) backwards compatibility mode is set (by setting version="1.0") and the required type is string or number. In all other cases, casting must be done explicitly if required.
The basic primitive for accessing a source document is the axis step. Axis steps may be combined into path expressions using the path operators / and //, and they may be filtered using filter expressions in the same way as the result of any other expression.
An axis step has the basic form axis :: node-test, and selects nodes on a given axis that satisfy the node-test. The axes available are:
ancestor | Selects ancestor nodes starting with the parent of the current node and ending with the document node |
ancestor-or-self | Selects the current node plus all ancestor nodes |
attribute | Selects all attributes of the current node (if it is an element) |
child | Selects the children of the current node, in document order |
descendant | Selects the children of the current node and their children, recursively (in document order) |
descendant-or-self | Selects the current node plus all descendant nodes |
following | Selects the nodes that follow the current node in document order, other than its descendants |
following-sibling | Selects all subsequent child nodes of the same parent node |
parent | Selects the parent of the current node |
preceding | Selects the nodes that precede the current node in document order, other than its ancestors |
preceding-sibling | Selects all preceding child nodes of the same parent node |
self | Selects the current node |
When the child axis is used, child:: may be omitted, and when the attribute axis is used, attribute:: may be abbreviated to @. The expression parent::node() may be shortened to .. (two dots).
The expression . is no longer synonymous with self::node(), since it may now select items that are not nodes. If the context item is not a node, any use of a path expression will raise an error.
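The abbreviations can be checked with a short Java sketch. It assumes the JDK's built-in javax.xml.xpath engine, which implements XPath 1.0 rather than Saxon's XPathEvaluator, so it is limited to syntax that the two versions share. The class AxisSteps, its eval() helper and the sample document are hypothetical names introduced for this illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical demo class; it uses the JDK's XPath 1.0 engine, not Saxon.
final class AxisSteps {

    // Small sample document, invented for this example.
    static final String XML =
            "<book><chapter id='c1'><title>Intro</title>"
            + "<para>one</para><para>two</para></chapter></book>";

    // Evaluate an expression against the sample document and return
    // the result converted to a string.
    static String eval(String expression) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(XML)));
            return XPathFactory.newInstance().newXPath().evaluate(expression, doc);
        } catch (Exception e) {
            throw new IllegalStateException("XPath evaluation failed: " + expression, e);
        }
    }

    public static void main(String[] args) {
        // Full and abbreviated forms of the same steps.
        System.out.println(eval("/book/chapter/attribute::id"));
        System.out.println(eval("/book/chapter/@id"));
        System.out.println(eval("name(/book/chapter/title/parent::node())"));
        System.out.println(eval("name(/book/chapter/title/..)"));
    }
}
```

Each pair of expressions in main() selects the same node, once with the axis written out and once in abbreviated form.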
The node-test may be:
prefix:* to select nodes in a given namespace
*:localname to select nodes with a given local name, regardless of namespace
text() (to select text nodes)
node() (to select any node)
processing-instruction() (to select any processing instruction)
processing-instruction('literal') to select processing instructions with the given name (target)
comment() to select comment nodes
Saxon 7.4 allows the constructs @node(), @text(), etc, which were allowed in XPath 1.0 but are not allowed in the current XPath 2.0 draft.
In general an expression may be enclosed in parentheses without changing its meaning.
If parentheses are not used, operator precedence follows the sequence below, starting with the operators that bind most tightly. Within each group the operators are evaluated left-to-right.
Operator | Meaning |
[] | predicate |
/, // | path operator |
cast as, treat as | type conversion |
except, intersect | set difference and intersection |
|, union | union operation on sets |
unary - | unary minus |
*, div, idiv, mod | multiply, divide, integer divide, modulo |
+, - | plus, minus |
to | range expression |
=, !=, is, isnot, <, <=, >, >=, eq, ne, lt, le, gt, ge | comparisons |
instance of, castable as | type tests |
if | conditional expressions |
some, every | quantified expressions |
for | iteration (mapping) over a sequence |
and | Boolean and |
or | Boolean or |
, (comma) | Sequence concatenation |
The latest (November 2002) drafts of XPath 2.0 and XSLT 2.0 allow a, b, c as a top-level expression. This is not yet implemented in Saxon 7.4. Saxon allows the comma operator only within parentheses.
The various operators are described, in this order, in the sections that follow.
The notation E[P] is used to select items from the sequence obtained by evaluating E. If the predicate P is numeric, the predicate selects an item if its position (counting from 1) is equal to P; otherwise, the effective boolean value of P determines whether an item is selected or not. The effective boolean value of a sequence is false if the sequence is empty, or if it contains a single item that is one of: the boolean value false, the zero-length string, or a numeric zero or NaN value. Otherwise, the effective boolean value is true.
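The effective boolean value rule can be written down as a small Java function. This is a sketch of the rule exactly as stated above, not Saxon's implementation: the class EffectiveBooleanValue is a hypothetical name, and a sequence is modelled simply as a java.util.List whose items are Boolean, String or Number objects (any other object stands in for a node).

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical model of the rule described in the text; not Saxon code.
final class EffectiveBooleanValue {

    // Returns the effective boolean value of a sequence modelled as a List.
    static boolean ebv(List<?> sequence) {
        if (sequence.isEmpty()) {
            return false;                       // empty sequence
        }
        if (sequence.size() > 1) {
            return true;                        // two or more items
        }
        Object item = sequence.get(0);
        if (item instanceof Boolean) {
            return (Boolean) item;              // boolean false is false
        }
        if (item instanceof String) {
            return !((String) item).isEmpty();  // zero-length string is false
        }
        if (item instanceof Number) {
            double d = ((Number) item).doubleValue();
            return !(d == 0.0 || Double.isNaN(d)); // zero and NaN are false
        }
        return true;                            // anything else, e.g. a node
    }

    public static void main(String[] args) {
        System.out.println(ebv(Collections.emptyList()));
        System.out.println(ebv(Arrays.asList("")));
        System.out.println(ebv(Arrays.asList(0)));
        System.out.println(ebv(Arrays.asList("London")));
    }
}
```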
In XPath 2.0, E may be any sequence; it is not restricted to a node sequence. Within the predicate, the expression . (dot) refers to the context item, that is, the item currently being tested. The XPath 1.0 concept of context node has thus been generalized: for example, . can refer to a string or a number.
Generally the order of items in the result preserves the order of items in E. As a special case, however, if E is a step using a reverse axis (e.g. preceding-sibling), the position of nodes for the purpose of evaluating the predicate is in reverse document order, but the result of the filter expression is in forwards document order.
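The reverse-axis special case is easiest to see on a concrete document. The sketch below again assumes the JDK's javax.xml.xpath engine (XPath 1.0), which applies the same positional rule; the class ReverseAxisPredicate, its eval() helper and the sample document are hypothetical names used only here.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical demo class; it uses the JDK's XPath 1.0 engine, not Saxon.
final class ReverseAxisPredicate {

    // Four sibling elements in document order: a, b, c, d.
    static final String XML = "<r><a/><b/><c/><d/></r>";

    // Evaluate an expression against the sample document and return
    // the result converted to a string.
    static String eval(String expression) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(XML)));
            return XPathFactory.newInstance().newXPath().evaluate(expression, doc);
        } catch (Exception e) {
            throw new IllegalStateException("XPath evaluation failed: " + expression, e);
        }
    }

    public static void main(String[] args) {
        // Predicate on the reverse-axis step: position 1 is the nearest sibling.
        System.out.println(eval("name(/r/d/preceding-sibling::*[1])"));
        // Predicate on the parenthesized result: position 1 is first in document order.
        System.out.println(eval("name((/r/d/preceding-sibling::*)[1])"));
    }
}
```

Within the step, position 1 is the sibling nearest to d; once the step's result is wrapped in parentheses, positions are counted in forwards document order.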
A path expression is a sequence of steps separated by the / or // operator. For example, ../@desc selects the desc attribute of the parent of the context node.
In XPath 2.0, path expressions have been generalized so that any expression can be used as an operand of / (both on the left and the right), so long as its value is a sequence of nodes. For example, it is possible to use a union expression (in parentheses) or a call to the id() or key() functions. The right-hand operand is evaluated once for each node in the sequence that results from evaluating the left-hand operand, with that node as the context item. In the result of the path expression, nodes are sorted in document order, and duplicates are eliminated.
In practice, it only makes sense to use expressions on the right of / if they depend on the context item. It is legal to write $x/$y provided both $x and $y are sequences of nodes, but the result is exactly the same as writing ./$y.
Note that the expressions ./$X or $X/. can be used to remove duplicates from $X and sort the results into document order. The same effect can be achieved by writing $X|().
The operator // is an abbreviation for /descendant-or-self::node()/.
An expression of the form /E is shorthand for root(.)/E, and the expression / on its own is shorthand for root(.).
The expression cast as T (E) converts the value of expression E to type T. Since T must currently be a built-in schema-defined simple type, the effect is exactly the same as using the constructor function T (E).
Saxon implements most of the conversions defined in the XPath 2.0 specifications, for the data types that it supports, but the way individual conversions are performed may differ in detail from the specification, which is still evolving in this area.
The expression treat as T (E) is designed for environments that perform static type checking. Saxon doesn't do static type checking, so this expression has very little use, except to document an assertion that the expression E is of a particular type. A run-time failure will be reported if the value of E is not of type T; no attempt is made to convert the value to this type.
These operators are new in XPath 2.0.
The expression E1 except E2 selects all nodes that are in E1 unless they are also in E2. Both expressions must return sequences of nodes. The results are returned in document order. For example, @* except @note returns all attributes except the note attribute.
The expression E1 intersect E2 selects all nodes that are in both E1 and E2. Both expressions must return sequences of nodes. The results are returned in document order. For example, preceding::fig intersect ancestor::chapter//fig returns all preceding fig elements within the current chapter.
The | operator was available in XPath 1.0; the keyword union has been added in XPath 2.0 as a synonym, because it is familiar to SQL users.
The expression E1 union E2 selects all nodes that are in either E1 or E2 or both. Both expressions must return sequences of nodes. The results are returned in document order. For example, /book/(chapter | appendix)/section returns all section elements within a chapter or appendix of the selected book element.
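The duplicate elimination and document ordering performed by the union operator can be observed with the | spelling, which XPath 1.0 already supports. The sketch below assumes the JDK's javax.xml.xpath engine rather than Saxon, so it cannot show the union keyword, except, intersect, or a parenthesized step such as (chapter | appendix); the class SetOperators, its eval() helper and the sample document are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical demo class; it uses the JDK's XPath 1.0 engine, not Saxon.
final class SetOperators {

    // Sample document: an appendix sits between two chapters.
    static final String XML =
            "<book><chapter><section>s1</section></chapter>"
            + "<appendix><section>s2</section></appendix>"
            + "<chapter><section>s3</section></chapter></book>";

    // Evaluate an expression against the sample document and return
    // the result converted to a string.
    static String eval(String expression) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(XML)));
            return XPathFactory.newInstance().newXPath().evaluate(expression, doc);
        } catch (Exception e) {
            throw new IllegalStateException("XPath evaluation failed: " + expression, e);
        }
    }

    public static void main(String[] args) {
        // XPath 1.0 spelling of /book/(chapter | appendix)/section.
        System.out.println(eval("count(/book/chapter/section | /book/appendix/section)"));
        // The same nodes on both sides: duplicates are eliminated.
        System.out.println(eval("count(/book/chapter | /book/chapter)"));
        // Operand order does not matter: the first node is first in document order.
        System.out.println(eval("string((/book/appendix/section | /book/chapter/section)[1])"));
    }
}
```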
The unary minus operator changes the sign of a number. For example -1 is minus one, and -0e0 is the double value negative zero.
The operator * multiplies two numbers. If the operands are of different types, one of them is promoted to the type of the other (for example, an integer is promoted to a decimal, a decimal to a double). The result is the same type as the operands after promotion.
The operator div divides two numbers. Dividing two integers produces a double; in other cases the result is the same type as the operands, after promotion. In the case of decimal division, the precision is the sum of the precisions of the two operands, plus six.
The operator idiv performs integer division. For example, the result of 10 idiv 3 is 3.
The mod operator returns the modulus (or remainder) after division. See the XPath 2.0 specification for details of the way that negative numbers are handled.
The operators * and div may also be used to multiply or divide a duration by a number. For example, fn:dayTimeDuration('PT12H') * 4 returns the duration two days.
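Some of these rules have direct counterparts in plain Java, which gives a quick way to check one's expectations. The sketch below is a model only, not Saxon code: the class ArithmeticModel and its methods are hypothetical names, integers are modelled as long, and java.time.Duration stands in for a dayTimeDuration value.

```java
import java.time.Duration;

// Hypothetical model of a few arithmetic rules from the text; not Saxon code.
final class ArithmeticModel {

    // div on two integers produces a double, as described above.
    static double div(long a, long b) {
        return (double) a / b;
    }

    // idiv on two integers produces an integer.
    static long idiv(long a, long b) {
        return a / b;
    }

    // Multiply a duration, given in ISO 8601 form such as PT12H, by a number.
    static Duration times(String isoDuration, long factor) {
        return Duration.parse(isoDuration).multipliedBy(factor);
    }

    public static void main(String[] args) {
        System.out.println(div(10, 4));
        System.out.println(idiv(10, 3));
        System.out.println(times("PT12H", 4));
    }
}
```

Note that java.time.Duration prints 48 hours as PT48H rather than P2D; the two denote the same length of time.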
The operators + and - perform addition and subtraction of numbers, in the usual way. If the operands are of different types, one of them is promoted, and the result is the same type as the operands after promotion. For example, adding two integers produces an integer; adding an integer to a double produces a double.
Note that the - operator may need to be preceded by a space to prevent it being parsed as part of the preceding name.
XPath 2.0 also allows these operators to be used for adding durations to dates and times, but this is not yet implemented in Saxon. However, Saxon 7.4 does allow durations to be added to (or subtracted from) durations.
The expression E1 to E2 returns a sequence of integers. For example, 1 to 5 returns the sequence 1, 2, 3, 4, 5. This is useful in for expressions; for example, the first five nodes of a node sequence can be processed by writing for $i in 1 to 5 return (//x)[$i].
The simplest comparison operators are eq, ne, lt, le, gt, ge. These compare two atomic values of the same type, for example two integers, two dates, or two strings. In the case of strings, the default collation is used (see saxon:collation). If the operands are not atomic values, an error is raised.
The operators =, !=, <, <=, >, and >= can compare arbitrary sequences. The result is true if any pair of items from the two sequences has the specified relationship, for example $A = $B is true if there is an item in $A that is equal to some item in $B. If an argument is a node, Saxon currently uses its string value in the comparison, not its typed value as required by the XPath 2.0 specification.
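The "any pair" rule has consequences that are easy to overlook, for instance that $A = $B and $A != $B can both be true at once. A minimal Java model of the rule follows; the class GeneralComparison is a hypothetical name, sequences are modelled as java.util.List, and item equality is approximated by Object.equals(), which ignores the type-checking and collation rules described elsewhere in this document.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical model of the existential comparison rule; not Saxon code.
final class GeneralComparison {

    // Models $A = $B: true if some item of a equals some item of b.
    static boolean generalEquals(List<?> a, List<?> b) {
        for (Object x : a) {
            for (Object y : b) {
                if (x.equals(y)) {
                    return true;
                }
            }
        }
        return false;
    }

    // Models $A != $B: true if some item of a differs from some item of b.
    static boolean generalNotEquals(List<?> a, List<?> b) {
        for (Object x : a) {
            for (Object y : b) {
                if (!x.equals(y)) {
                    return true;
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<Integer> a = Arrays.asList(1, 2);
        List<Integer> b = Arrays.asList(2, 3);
        System.out.println(generalEquals(a, b));
        System.out.println(generalNotEquals(a, b));
    }
}
```

If either sequence is empty there is no pair to compare, so both comparisons are false.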
Saxon 7.4 implements the stricter rules of XPath 2.0 for type-checking the operands of a comparison. Comparing a string to an integer is now an error: one of the values must be explicitly cast to the type of the other. This is true even in backwards compatibility mode. However, if one of the values is an untyped node, its value will be converted to the type of the other operand; if both values are untyped, they will be compared as strings.
The operators is and isnot test whether the operands represent the same (identical) node. For example, title[1] is *[@note][1] is true if the first title child is the first child element that has a @note attribute. If either operand is an empty sequence the result is an empty sequence (which will usually be treated as false).
The operators << and >> test whether one node precedes or follows another in document order.
The expression E instance of T tests whether the value of expression E is an instance of type T, or of a subtype of T. For example, $p instance of attribute+ is true if the value of $p is a sequence of one or more attribute nodes. It returns false if the sequence is empty or if it contains an item that is not an attribute node. The detailed rules for defining types, and for matching values against a type, are given in the XPath 2.0 specification.
Saxon 7.3 implements only a subset of this syntax. It allows testing of a value against any built-in simple type defined in XML Schema, except that some of the types are not yet implemented: see conformance.html. The type can also be a node-kind such as element, attribute, etc; or it can be one of the keywords item or node. The type can be optionally followed by the occurrence indicator *, +, or ?.
Saxon also allows testing of the type annotation of an element or attribute node using tests of the form element of type T, attribute of type T. This is of limited value at this release, however, since the only way a node can acquire a type annotation is (a) if the node is part of a temporary tree created within the stylesheet itself, or (b) if the node is an attribute with a DTD-based type, for example ID.
The expression E castable as T tests whether the expression cast as T (E) would succeed. It is useful, for example, for testing whether a string contains a valid date before attempting to cast it to a date. This is because XPath and XSLT currently provide no way of trapping the error if the cast is attempted and fails.
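The "test before you cast" pattern can be illustrated in Java, where the equivalent check has to be built by catching the parse failure. This is only a rough analogy: the class CastableModel is hypothetical, and java.time.LocalDate.parse() is used as a stand-in for casting to xs:date (it accepts the plain YYYY-MM-DD form but not, for example, a timezone suffix).

```java
import java.time.LocalDate;
import java.time.format.DateTimeParseException;

// Hypothetical analogy for "castable as xs:date"; not Saxon code.
final class CastableModel {

    // Returns true if the string can be parsed as an ISO date (YYYY-MM-DD).
    static boolean castableAsDate(String value) {
        try {
            LocalDate.parse(value);
            return true;
        } catch (DateTimeParseException e) {
            return false;
        }
    }

    // Test first, then cast; fall back to a default when the test fails.
    static LocalDate castOrDefault(String value, LocalDate fallback) {
        return castableAsDate(value) ? LocalDate.parse(value) : fallback;
    }

    public static void main(String[] args) {
        System.out.println(castableAsDate("2002-04-30"));
        System.out.println(castableAsDate("2002-02-30"));
        System.out.println(castOrDefault("not a date", LocalDate.of(2000, 1, 1)));
    }
}
```

In XPath the same shape is obtained by combining the test with a conditional expression, so that the cast is only attempted when it is known to succeed.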
XPath 2.0 allows a conditional expression of the form if ( E1 ) then E2 else E3. For example, if (@discount) then @discount else 0 returns the value of the discount attribute if it is present, or zero otherwise.
The expression some $x in E1 satisfies E2 returns true if there is an item in the sequence E1 for which the effective boolean value of E2 is true. Note that E2 must use the range variable $x to refer to the item being tested; it does not become the context item. For example, some $x in @* satisfies $x eq "" is true if the context item is an element that has at least one zero-length attribute value.
Similarly, the expression every $x in E1 satisfies E2 returns true if every item in the sequence given by E1 satisfies the condition.
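Readers who know the Java stream API may find it helpful to think of some and every as anyMatch and allMatch, with the range variable playing the part of the lambda parameter. The sketch below is a model under that assumption; the class Quantified and its methods are hypothetical names, not part of Saxon.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical model of quantified expressions; not Saxon code.
final class Quantified {

    // Models: some $x in e1 satisfies e2
    static <T> boolean some(List<T> e1, Predicate<? super T> e2) {
        return e1.stream().anyMatch(e2);
    }

    // Models: every $x in e1 satisfies e2
    static <T> boolean every(List<T> e1, Predicate<? super T> e2) {
        return e1.stream().allMatch(e2);
    }

    public static void main(String[] args) {
        // Attribute values of an imagined element; one of them is zero-length.
        List<String> attributeValues = Arrays.asList("a", "", "c");
        System.out.println(some(attributeValues, x -> x.isEmpty()));
        System.out.println(every(attributeValues, x -> x.isEmpty()));
    }
}
```

As in XPath, some over an empty sequence is false and every over an empty sequence is true.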
The expression for $x in E1 return E2 returns the sequence that results from evaluating E2 once for every item in the sequence E1. Note that E2 must use the range variable $x to refer to the item being tested; it does not become the context item. For example, sum(for $v in order-item return $v/price * $v/quantity) returns the total value of (price times quantity) for all the selected order-item elements.
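The order-item example corresponds to a map followed by a sum. A Java sketch of that correspondence follows; the class ForExpression, the OrderItem type and the sample figures are all invented for illustration, and prices are held as java.math.BigDecimal to mirror XPath 2.0 decimal arithmetic.

```java
import java.math.BigDecimal;
import java.util.Arrays;
import java.util.List;

// Hypothetical model of sum(for $v in order-item return $v/price * $v/quantity).
final class ForExpression {

    // Stand-in for an order-item element with price and quantity children.
    static final class OrderItem {
        final BigDecimal price;
        final int quantity;

        OrderItem(String price, int quantity) {
            this.price = new BigDecimal(price);
            this.quantity = quantity;
        }
    }

    // Evaluate price * quantity once per item, then add up the results.
    static BigDecimal total(List<OrderItem> items) {
        return items.stream()
                .map(v -> v.price.multiply(BigDecimal.valueOf(v.quantity)))
                .reduce(BigDecimal.ZERO, BigDecimal::add);
    }

    // Sample data invented for this example.
    static BigDecimal sampleTotal() {
        return total(Arrays.asList(new OrderItem("9.99", 2), new OrderItem("0.01", 3)));
    }

    public static void main(String[] args) {
        System.out.println(sampleTotal());
    }
}
```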
The expression E1 and E2 returns true if the effective boolean values of E1 and E2 are both true.
The expression E1 or E2 returns true if the effective boolean values of either or both of E1 and E2 are true.
The expression E1 , E2 returns the sequence obtained by concatenating the sequences E1 and E2. For example, $x = ("London", "Paris", "Tokyo") returns true if the value of $x is one of the strings listed.
Saxon 7.4 does not allow this operator to appear at the top level: the comma operator may only appear inside a parenthesized expression.
Michael H. Kay
14 February 2003