XPath 2.0 Expression Syntax

Contents
Introduction
Constants
Variable References
Function Calls
Axis Steps
Parentheses and operator precedence
Filter Expressions
Path expressions
Cast as, Treat as
Set difference and intersection
Set union
Unary minus
Multiplication and Division
Addition and Subtraction
Range Expressions
Comparisons
Instance of, Castable as
Conditional Expressions
Quantified Expressions
For Expressions
And Expressions
Or Expressions
Sequence Expressions

Introduction

This document is an informal guide to the syntax of XPath 2.0 expressions, which are used in Saxon both within XSLT stylesheets and in the Java API. For the formal definition, see the XPath 2.0 specification; where Saxon differs from it, the differences are noted here.

XPath expressions may be used either in an XSLT stylesheet or as a parameter to various Java interfaces. The syntax is the same in both cases. In the Java interface, expressions are handled using the net.sf.saxon.xpath.XPathEvaluator class, and are parsed using a call such as XPathEvaluator.createExpression("$a + $b"). This returns an object of class net.sf.saxon.xpath.XPathExpression, which provides two methods for evaluating the expression: evaluate(), which returns the value of the expression, and iterator(), which allows iteration over the items in the sequence returned by the expression. For further details of these methods, see the API documentation.

An important change in XPath 2.0 is that all values are now considered as sequences. A sequence consists of zero or more items; an item may be a node or a simple value. Examples of simple values are integers, strings, booleans, and dates. A single value such as a number is considered as a sequence of length 1. The empty sequence is written as (); a singleton sequence may be written as "a" or ("a"), and a general sequence is written as ("a", "b", "c").

The node-sets of XPath 1.0 are replaced in XPath 2.0 by sequences of nodes. Path expressions will return node sequences whose nodes are in document order with no duplicates, but other kinds of expression may return sequences of nodes in any order, with duplicates permitted.

This page summarizes the syntactic constructs and operators provided in XPath 2.0. The functions provided in the function library are listed separately: see functions.html.

Constants

String literals are written as "London" or 'Paris'. In each case you can use the opposite kind of quotation mark within the string: 'He said "Boo"', or "That's rubbish". In a stylesheet, XPath expressions always appear within XML attributes, so it is usual to use one kind of delimiter for the attribute and the other kind for the literal. Anything else can be written using XML character or entity references (for example &quot;). In XPath 2.0, a delimiter can also be doubled within the string to represent the delimiter itself: for example, <xsl:value-of select='"He said, ""Go!"""'/> outputs He said, "Go!".
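To make the doubling rule concrete, here is a minimal Python sketch of how the value of an XPath 2.0 string literal is obtained. It is an illustration only, not Saxon's parser. The function name parse_string_literal is hypothetical. The sketch deliberately ignores the separate layer of XML escaping that applies when the expression sits inside a stylesheet attribute.

```python
def parse_string_literal(text):
    """Return the string value of an XPath 2.0 string literal.

    The literal is delimited by matching " or ' characters; inside it, the
    delimiter character may appear only as a doubled pair, which stands for
    one occurrence of that character.
    """
    if len(text) < 2 or text[0] not in "\"'" or text[-1] != text[0]:
        raise ValueError(f"not a string literal: {text!r}")
    quote = text[0]
    body = text[1:-1]
    # After removing every doubled delimiter, no lone delimiter may remain.
    if quote in body.replace(quote * 2, ""):
        raise ValueError(f"unescaped {quote} inside literal: {text!r}")
    return body.replace(quote * 2, quote)


print(parse_string_literal('"That\'s rubbish"'))     # That's rubbish
print(parse_string_literal('\'He said "Boo"\''))     # He said "Boo"
print(parse_string_literal('"He said, ""Go!"""'))    # He said, "Go!"
```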

Numeric constants follow the Java rules for decimal literals: for example, 12 or 3.05. A negative number can be written as (say) -93.7, though technically the minus sign is not part of the literal (note that you may need a space before the minus sign to prevent it being treated as a hyphen within a preceding name). A numeric literal is taken as a double-precision floating point number if it uses scientific notation (e.g. 1.0e7), as a fixed-point decimal if it includes a decimal point, or as an integer otherwise. Decimal values in Saxon have unlimited precision; integers are limited to 64 bits. Note that a value such as 3.5 was handled as a double-precision floating point number in XPath 1.0, but is a decimal number in XPath 2.0: this may affect the precision of arithmetic results. Saxon implements decimal arithmetic using the Java class java.math.BigDecimal.
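The typing rule for numeric literals, and its effect on precision, can be sketched in Python. In this model, float, decimal.Decimal and int stand in for xs:double, xs:decimal and xs:integer. This is an assumption made for illustration; Saxon itself uses Java types such as BigDecimal. The function parse_numeric_literal is hypothetical, and Saxon's 64-bit limit on integers is not modelled.

```python
import re
from decimal import Decimal

DOUBLE = re.compile(r"(\d+(\.\d*)?|\.\d+)[eE][+-]?\d+")
DECIMAL = re.compile(r"\d+\.\d*|\.\d+")
INTEGER = re.compile(r"\d+")


def parse_numeric_literal(text):
    """Type an unsigned numeric literal using the rules described above.

    Python float, Decimal and int stand in for xs:double, xs:decimal and
    xs:integer. The 64-bit limit Saxon places on integers is not checked.
    """
    if DOUBLE.fullmatch(text):
        return float(text)        # scientific notation: double
    if DECIMAL.fullmatch(text):
        return Decimal(text)      # contains a decimal point: decimal
    if INTEGER.fullmatch(text):
        return int(text)          # otherwise: integer
    # A leading minus sign is an operator, not part of the literal.
    raise ValueError(f"not a numeric literal: {text!r}")


# Decimal arithmetic avoids the binary rounding seen with doubles:
print(parse_numeric_literal("0.1") + parse_numeric_literal("0.2"))      # 0.3
print(parse_numeric_literal("0.1e0") + parse_numeric_literal("0.2e0"))  # 0.30000000000000004
```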

There are no boolean constants as such: instead use the function calls true() and false().

Constants of other data types can be written using constructors, which look like function calls but require a string literal as their argument. For example, xs:float("10.7") produces a single-precision floating point number. Saxon implements constructors for many of the built-in data types defined in XML Schema Part 2: for a full list see conformance.html.

For example, constants of type date and dateTime can be written as xs:date("2002-04-30") or xs:dateTime("1966-07-31T15:00:00Z").

The latest (November 2002) draft of XPath 2.0 allows the argument to a constructor to contain whitespace, as determined by the whitespace facet for the target data type. This feature is implemented in Saxon 7.4.

Variable References

The value of a variable (local or global variable, local or global parameter) may be referred to using the construct $name, where name is the variable name.

A variable reference is always resolved at the textual place where the expression containing it appears. For example, a variable used within an xsl:attribute-set must be in scope at the point where the attribute-set is defined, not the point where it is used.

A variable may take a value of any data type, and in general it is not possible to determine its data type statically.

It is an error to refer to a variable that has not been declared.

Starting with XPath 2.0, variables (known as range variables) may be declared within an XPath expression, not only using xsl:variable elements in the stylesheet. The expressions that declare variables are the for, some, and every expressions.

Saxon 7.4 does not allow two range variables within an expression to have the same name.

Function Calls

A function call in XPath 2.0 takes the form F ( arg1, arg2, ...) . In general, the function name is a QName. A library of core functions is defined in the XPath 2.0 and XSLT 2.0 specifications. For details of these functions, including notes on their implementation in this Saxon release, see functions.html. Additional functions are available (in a special namespace) as Saxon extensions: these are listed in extensions.html. Further functions may be implemented by the user, either as XSLT stylesheet functions (see xsl:function), or as Java extension functions (see extensibility.html).

In Saxon 7.4, the core function library is in no namespace; the functions are referenced without using a namespace prefix.

Saxon 7.4 implements function calls using the XPath 2.0 function call rules. Essentially, this means that the supplied value is not implicitly cast to the required type unless (a) the supplied value is an untyped element or attribute node, or (b) backwards compatibility mode is set (by setting version="1.0") and the required type is string or number. In all other cases, casting must be done explicitly if required.

Axis steps

The basic primitive for accessing a source document is the axis step. Axis steps may be combined into path expressions using the path operators / and //, and they may be filtered using filter expressions in the same way as the result of any other expression.

An axis step has the basic form axis :: node-test, and selects nodes on a given axis that satisfy the node-test. The axes available are:
ancestor	Selects the ancestors of the current node, starting with its parent and ending with the document node
ancestor-or-self	Selects the current node plus all ancestor nodes
attribute	Selects all attributes of the current node (if it is an element)
child	Selects the children of the current node, in document order
descendant	Selects the children of the current node and their children, recursively (in document order)
descendant-or-self	Selects the current node plus all descendant nodes
following	Selects the nodes that follow the current node in document order, other than its descendants
following-sibling	Selects all subsequent child nodes of the same parent node
parent	Selects the parent of the current node
preceding	Selects the nodes that precede the current node in document order, other than its ancestors
preceding-sibling	Selects all preceding child nodes of the same parent node
self	Selects the current node
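The axis definitions can be checked against a small model. In this hypothetical Python sketch, the nodes of a six-node tree are numbered in document order. Node 0 is the root, whose children are 1 and 4; node 1 has children 2 and 3; node 4 has child 5. Attributes are omitted. Each axis is returned as a list, and the reverse axes are listed nearest-first. The function names are illustrative and are not part of any Saxon API.

```python
# Hypothetical tree; integers are document-order positions:
#   0
#   +-- 1
#   |   +-- 2
#   |   +-- 3
#   +-- 4
#       +-- 5
PARENT = {1: 0, 2: 1, 3: 1, 4: 0, 5: 4}
NODES = sorted({0, *PARENT})


def parent(n):
    return [PARENT[n]] if n in PARENT else []


def child(n):
    return [m for m in NODES if PARENT.get(m) == n]


def ancestor(n):
    result = []
    while n in PARENT:          # walk up until the root is reached
        n = PARENT[n]
        result.append(n)
    return result               # nearest first: parent, ..., root


def descendant(n):
    result = []
    for c in child(n):          # each child in document order, followed
        result.append(c)        # by its own descendants
        result.extend(descendant(c))
    return result


def following_sibling(n):
    return [m for m in NODES
            if n in PARENT and m > n and PARENT.get(m) == PARENT[n]]


def preceding_sibling(n):
    return [m for m in reversed(NODES)
            if n in PARENT and m < n and PARENT.get(m) == PARENT[n]]


def following(n):
    return [m for m in NODES if m > n and m not in descendant(n)]


def preceding(n):
    return [m for m in reversed(NODES) if m < n and m not in ancestor(n)]


def ancestor_or_self(n):
    return [n] + ancestor(n)


def descendant_or_self(n):
    return [n] + descendant(n)


def self_axis(n):
    return [n]
```

For example, following(1) gives [4, 5], because nodes 2 and 3 are descendants of node 1. Similarly, preceding(3) gives [2], because nodes 1 and 0 are its ancestors.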

When the child axis is used, child:: may be omitted, and when the attribute axis is used, attribute:: may be abbreviated to @. The expression parent::node() may be shortened to .. (two dots).

The expression . is no longer synonymous with self::node(), since it may now select items that are not nodes. If the context item is not a node, any use of a path expression will raise an error.

The node-test may be:

a node name
prefix:* to select nodes in a given namespace
*:localname to select nodes with a given local name, regardless of namespace
text() (to select text nodes)
node() (to select any node)
processing-instruction() (to select any processing instruction)
processing-instruction('literal') to select processing instructions with the given name (target)
comment() to select comment nodes

Saxon 7.4 allows the constructs @node(), @text(), etc., which were allowed in XPath 1.0 but are not allowed in the current XPath 2.0 draft.

Parentheses and operator precedence

In general an expression may be enclosed in parentheses without changing its meaning.

If parentheses are not used, operator precedence follows the sequence below, starting with the operators that bind most tightly. Within each group the operators are evaluated left-to-right.

Operator	Meaning
[]	predicate
/, //	path operator
cast as, treat as	type conversion
except, intersect	set difference and intersection
|, union	union operation on sets
unary -	unary minus
*, div, idiv, mod	multiply, divide, integer divide, modulo
+, -	plus, minus
to	range expression
=, !=, is, isnot, <<, >>, <, <=, >, >=, eq, ne, lt, le, gt, ge	comparisons
instance of, castable as	type tests
if	conditional expressions
some, every	quantified expressions
for	iteration (mapping) over a sequence
and	Boolean and
or	Boolean or
, (comma)	Sequence concatenation

The latest (November 2002) drafts of XPath 2.0 and XSLT 2.0 allow a, b, c as a top-level expression. This is not yet implemented in Saxon 7.4. Saxon allows the comma operator only within parentheses.

The various operators are described, in this order, in the sections that follow.

Filter expressions

The notation E[P] is used to select items from the sequence obtained by evaluating E. If the predicate P is numeric, the predicate selects an item if its position (counting from 1) is equal to P; otherwise, the effective boolean value of P determines whether an item is selected or not. The effective boolean value of a sequence is false if the sequence is empty, or if it contains a single item that is one of: the boolean value false, the zero-length string, or a numeric zero or NaN value. Otherwise, the effective boolean value is true.
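The selection rule for E[P] can be modelled in a few lines of Python. This is a sketch of the rules exactly as stated in this section, not of Saxon's code. Sequences are Python lists. The predicate is a hypothetical callable that receives the item and its position and returns the value of P as a list.

```python
import math
from decimal import Decimal


def effective_boolean_value(seq):
    """Effective boolean value as described above: false for the empty
    sequence, or for a single false, zero-length string, zero or NaN."""
    if not seq:
        return False
    if len(seq) == 1:
        item = seq[0]
        if isinstance(item, bool):          # test bool first: bool is an int subclass
            return item
        if isinstance(item, str):
            return item != ""
        if isinstance(item, (int, Decimal, float)):
            return not (item == 0 or (isinstance(item, float) and math.isnan(item)))
    return True


def is_numeric(seq):
    return (len(seq) == 1 and isinstance(seq[0], (int, Decimal, float))
            and not isinstance(seq[0], bool))


def filter_expr(seq, predicate):
    """E[P]: predicate(item, position) returns the value of P as a list."""
    result = []
    for position, item in enumerate(seq, start=1):   # positions count from 1
        p = predicate(item, position)
        keep = p[0] == position if is_numeric(p) else effective_boolean_value(p)
        if keep:
            result.append(item)
    return result


print(filter_expr(["a", "b", "c"], lambda item, pos: [2]))       # ['b']
print(filter_expr(["x", "", "y"], lambda item, pos: [item]))     # ['x', 'y']
```

One consequence is easy to overlook. For a sequence of numbers, (1, 0, 3)[.] is a positional test, not a truth test. In this model, filter_expr([1, 0, 3], lambda item, pos: [item]) keeps 1 and 3, because each of them equals its own position.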

In XPath 2.0, E may be any sequence; it is not restricted to a node sequence. Within the predicate, the expression . (dot) refers to the context item, that is, the item currently being tested. The XPath 1.0 concept of context node has thus been generalized: for example, . can refer to a string or a number.

Generally the order of items in the result preserves the order of items in E. As a special case, however, if E is a step using a reverse axis (e.g. preceding-sibling), the position of nodes for the purpose of evaluating the predicate is in reverse document order, but the result of the filter expression is in forwards document order.

Path expressions

A path expression is a sequence of steps separated by the / or // operator. For example, ../@desc selects the desc attribute of the parent of the context node.

In XPath 2.0, path expressions have been generalized so that any expression can be used as either operand of / (on the left or on the right), provided its value is a sequence of nodes. For example, it is possible to use a union expression (in parentheses) or a call to the id() or key() functions. The right-hand operand is evaluated once for each node in the sequence that results from evaluating the left-hand operand, with that node as the context item. In the result of the path expression, nodes are sorted in document order, and duplicates are eliminated.

In practice, it only makes sense to use an expression on the right of / if it depends on the context item. It is legal to write $x/$y provided both $x and $y are sequences of nodes. However, the result is then simply the nodes of $y in document order with duplicates removed, or the empty sequence if $x is empty.

Note that the expressions ./$X or $X/. can be used to remove duplicates from $X and sort the results into document order. The same effect can be achieved by writing $X|().
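The evaluation rule for E1/E2 can be sketched in Python as a "map, then sort and deduplicate" operation. In this hypothetical model, nodes are integers numbered in document order, so sorting a set of them yields document order. The step functions child_step and self_step are illustrative stand-ins for child::node() and the context item expression. This is not Saxon's implementation.

```python
# Hypothetical tree: nodes are integers numbered in document order.
CHILDREN = {0: [1, 4], 1: [2, 3], 4: [5]}


def child_step(node):
    return CHILDREN.get(node, [])


def self_step(node):
    return [node]


def path(left, step):
    """Evaluate left/step: apply step to each node of left (as the context
    node), then return the combined result in document order with duplicates
    removed."""
    result = set()
    for node in left:
        result.update(step(node))
    return sorted(result)


print(path([0], child_step))                     # [1, 4]
print(path(path([0], child_step), child_step))   # [2, 3, 5]
print(path([3, 1, 3], self_step))                # [1, 3], the effect of $X/.
```

The last call shows why $X/. sorts and deduplicates $X. Also, if the left operand is empty, the step is never evaluated, so the result is empty.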

The operator // is an abbreviation for /descendant-or-self::node()/. An expression of the form /E is shorthand for root(.)/E, and the expression / on its own is shorthand for root(.).

Cast as, Treat as

The expression cast as T (E) converts the value of expression E to type T. Since T must currently be a built-in schema-defined simple type, the effect is exactly the same as using the constructor function T (E).

Saxon implements most of the conversions defined in the XPath 2.0 specification for the data types that it supports, but some conversions may differ in detail from the specification, which is still evolving in this area.

The expression treat as T (E) is designed for environments that perform static type checking. Saxon doesn't do static type checking, so this expression has very little use, except to document an assertion that the expression E is of a particular type. A run-time failure will be reported if the value of E is not of type T; no attempt is made to convert the value to this type.

Set difference and intersection

These operators are new in XPath 2.0.

The expression E1 except E2 selects all nodes that are in E1 unless they are also in E2. Both expressions must return sequences of nodes. The results are returned in document order. For example, @* except @note returns all attributes except the note attribute.

The expression E1 intersect E2 selects all nodes that are in both E1 and E2. Both expressions must return sequences of nodes. The results are returned in document order. For example, preceding::fig intersect ancestor::chapter//fig returns all preceding fig elements within the current chapter.

Union

The | operator was available in XPath 1.0; the keyword union has been added in XPath 2.0 as a synonym, because it is familiar to SQL users.

The expression E1 union E2 selects all nodes that are in either E1 or E2 or both. Both expressions must return sequences of nodes. The results are returned in document order. For example, /book/(chapter | appendix)/section returns all section elements within a chapter or appendix of the selected book element.
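The three node-set operators (union here, and except and intersect in the previous section) can be modelled as set operations followed by a sort into document order. The following Python sketch assumes that nodes are represented by integers numbered in document order, so node identity is integer equality. The numbers used are hypothetical.

```python
# Nodes are represented by hypothetical integers numbered in document order,
# so node identity is integer equality and sorting gives document order.

def union(e1, e2):
    return sorted(set(e1) | set(e2))


def intersect(e1, e2):
    return sorted(set(e1) & set(e2))


def except_(e1, e2):            # "except" is a reserved word in Python
    return sorted(set(e1) - set(e2))


attributes = [7, 3, 5]          # say, @*, with 5 standing for @note
print(except_(attributes, [5]))             # [3, 7]
print(intersect([9, 2, 4], [4, 9, 11]))     # [4, 9]
print(union([6, 2], [2, 1]))                # [1, 2, 6]
```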

Unary minus

The unary minus operator changes the sign of a number. For example -1 is minus one, and -0e0 is the double value negative zero.

Multiplication and division

The operator * multiplies two numbers. If the operands are of different types, one of them is promoted to the type of the other (for example, an integer is promoted to a decimal, a decimal to a double). The result is the same type as the operands after promotion.

The operator div divides two numbers. Dividing two integers produces a double; in other cases the result is the same type as the operands, after promotion. In the case of decimal division, the precision is the sum of the precisions of the two operands, plus six.

The operator idiv performs integer division. For example, the result of 10 idiv 3 is 3.

The mod operator returns the modulus (or remainder) after division. See the XPath 2.0 specification for details of the way that negative numbers are handled.
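Python's own // and % operators round toward negative infinity, so they do not match XPath for negative operands. The sketch below assumes the rules of the final XPath 2.0 Recommendation: idiv truncates toward zero, and the result of mod takes the sign of the dividend, as in XPath 1.0 and in Java's % operator. The November 2002 draft implemented by Saxon 7.4 may differ in detail. The function names idiv and mod are illustrative.

```python
def idiv(a, b):
    """Integer division, truncating toward zero."""
    if b == 0:
        raise ZeroDivisionError("integer division by zero")
    q = abs(a) // abs(b)
    return q if (a < 0) == (b < 0) else -q


def mod(a, b):
    """Remainder satisfying a == idiv(a, b) * b + mod(a, b)."""
    return a - idiv(a, b) * b


print(idiv(10, 3), mod(10, 3))      # 3 1
print(idiv(-10, 3), mod(-10, 3))    # -3 -1
print(-10 // 3, -10 % 3)            # -4 2   (Python's floor-based operators)
```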

The operators * and div may also be used to multiply or divide a duration by a number. For example, fn:dayTimeDuration('PT12H') * 4 returns a duration of two days.
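As an illustration only, Python's datetime.timedelta can stand in for a day-time duration, since it supports multiplication and division by a number. The helper to_day_time_duration is hypothetical. It writes a non-negative timedelta in the duration lexical form and ignores fractional seconds.

```python
from datetime import timedelta


def to_day_time_duration(td):
    """Format a non-negative timedelta as e.g. 'P2D' or 'PT12H30M'."""
    hours, rest = divmod(td.seconds, 3600)
    minutes, seconds = divmod(rest, 60)
    date_part = f"{td.days}D" if td.days else ""
    time_part = "".join(f"{value}{unit}"
                        for value, unit in ((hours, "H"), (minutes, "M"), (seconds, "S"))
                        if value)
    if not date_part and not time_part:
        return "PT0S"
    return "P" + date_part + ("T" + time_part if time_part else "")


pt12h = timedelta(hours=12)
print(to_day_time_duration(pt12h * 4))    # P2D
print(to_day_time_duration(pt12h / 4))    # PT3H
```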

Addition and subtraction

The operators + and - perform addition and subtraction of numbers, in the usual way. If the operands are of different types, one of them is promoted, and the result is the same type as the operands after promotion. For example, adding two integers produces an integer; adding an integer to a double produces a double.
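The promotion rule shared by +, - and * can be sketched in Python. In this model, int, decimal.Decimal and float stand in for integer, decimal and double, promoted in that order as described under Multiplication and division; xs:float is left out. Promotion must be explicit here, because Python itself refuses to add a Decimal to a float. The function names are hypothetical.

```python
from decimal import Decimal

# Stand-ins: int ~ xs:integer, Decimal ~ xs:decimal, float ~ xs:double.
# xs:float is left out of this sketch.


def promote(a, b):
    """Promote both operands to the wider of their two types."""
    if isinstance(a, float) or isinstance(b, float):
        return float(a), float(b)
    if isinstance(a, Decimal) or isinstance(b, Decimal):
        return Decimal(a), Decimal(b)
    return a, b


def add(a, b):
    a, b = promote(a, b)
    return a + b


def subtract(a, b):
    a, b = promote(a, b)
    return a - b


def multiply(a, b):
    a, b = promote(a, b)
    return a * b


print(repr(add(2, 3)))                       # 5
print(repr(add(1, Decimal("0.5"))))          # Decimal('1.5')
print(repr(multiply(Decimal("3.5"), 2.0)))   # 7.0
```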

Note that the - operator may need to be preceded by a space to prevent it being parsed as part of the preceding name.

XPath 2.0 also allows these operators to be used for adding durations to dates and times, but this is not yet implemented in Saxon. However, Saxon 7.4 does allow durations to be added to (or subtracted from) durations.

Range expressions

The expression E1 to E2 returns a sequence of integers. For example, 1 to 5 returns the sequence 1, 2, 3, 4, 5. This is useful in for expressions: for example, the first five nodes of a node sequence can be processed by writing for $i in 1 to 5 return (//x)[$i].
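A Python sketch of the range expression and of the example above follows. It assumes the behaviour of the final XPath 2.0 Recommendation, in which E1 > E2 yields the empty sequence; the draft implemented by Saxon 7.4 may differ. The list xs is a hypothetical stand-in for (//x) that holds only three nodes, so positions 4 and 5 select nothing.

```python
def range_expr(start, end):
    """E1 to E2: the integers start, start+1, ..., end."""
    # Assumption: as in the XPath 2.0 Recommendation, start > end gives ().
    return list(range(start, end + 1))


xs = ["x1", "x2", "x3"]            # stands in for (//x), here only three nodes
# for $i in 1 to 5 return (//x)[$i]: positions 4 and 5 select nothing
first_five = [item for i in range_expr(1, 5) for item in xs[i - 1:i]]
print(range_expr(1, 5))   # [1, 2, 3, 4, 5]
print(first_five)         # ['x1', 'x2', 'x3']
```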

Comparisons

The simplest comparison operators are eq, ne, lt, le, gt, and ge. These compare two atomic values of the same type, for example two integers, two dates, or two strings. In the case of strings, the default collation is used (see saxon:collation). If the operands are not atomic values, an error is raised.

The operators =, !=, <, <=, >, and >= can compare arbitrary sequences. The result is true if any pair of items from the two sequences has the specified relationship, for example $A = $B is true if there is an item in $A that is equal to some item in $B. If an argument is a node, Saxon currently uses its string value in the comparison, not its typed value as required by the XPath 2.0 specification.
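The existential meaning of the general comparison operators can be captured in one line of Python. This sketch models only the "any pair" rule. It does not model the type-checking and untyped-node conversion rules described in the next paragraph. The function name general_compare is hypothetical.

```python
import operator


def general_compare(op, seq_a, seq_b):
    """True if some item of seq_a and some item of seq_b satisfy op."""
    return any(op(a, b) for a in seq_a for b in seq_b)


cities = ["London", "Paris", "Tokyo"]
print(general_compare(operator.eq, ["Paris"], cities))   # True
print(general_compare(operator.ne, [1, 2], [1, 2]))      # True: 1 != 2
print(general_compare(operator.eq, [], cities))          # False
```

As the second call shows, != is also existential. As a result, $A != $B can be true even when the two sequences contain exactly the same items. Comparing against an empty sequence is always false.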

Saxon 7.4 implements the stricter rules of XPath 2.0 for type-checking the operands of a comparison. Comparing a string to an integer is now an error: one of the values must be explicitly cast to the type of the other. This is true even in backwards compatibility mode. However, if one of the values is an untyped node, its value will be converted to the type of the other operand; if both values are untyped, they will be compared as strings.

The operators is and isnot test whether the operands represent the same (identical) node. For example, title[1] is *[@note][1] is true if the first title child is the first child element that has a note attribute. If either operand is an empty sequence the result is an empty sequence (which will usually be treated as false).

The operators << and >> test whether one node precedes or follows another in document order.

Instance of and Castable as

The expression E instance of T tests whether the value of expression E is an instance of type T, or of a subtype of T. For example, $p instance of attribute+ is true if the value of $p is a sequence of one or more attribute nodes. It returns false if the sequence is empty or if it contains an item that is not an attribute node. The detailed rules for defining types, and for matching values against a type, are given in the XPath 2.0 specification.

Saxon 7.4 implements only a subset of this syntax. It allows testing of a value against any built-in simple type defined in XML Schema, except that some of the types are not yet implemented: see conformance.html. The type can also be a node-kind such as element, attribute, etc.; or it can be one of the keywords item or node. The type can be optionally followed by the occurrence indicator *, +, or ?.

Saxon also allows testing of the type annotation of an element or attribute node using tests of the form element of type T, attribute of type T. This is of limited value at this release, however, since the only way a node can acquire a type annotation is (a) if the node is part of a temporary tree created within the stylesheet itself, or (b) if the node is an attribute with a DTD-based type, for example ID.

The expression E castable as T tests whether the expression cast as T (E) would succeed. It is useful, for example, for testing whether a string contains a valid date before attempting to cast it to a date. This is because XPath and XSLT currently provide no way of trapping the error if the cast is attempted and fails.
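The "test first, then cast" pattern that castable as enables can be sketched in Python for dates. This is a simplified model, not Saxon's validation. It accepts only YYYY-MM-DD, with no timezone and no negative or five-digit years, and it strips surrounding whitespace as the whitespace facet for dates allows. The function names castable_as_date and date_or_none are hypothetical.

```python
import re
from datetime import datetime

# Simplified xs:date lexical form: no timezone, no negative or 5-digit years.
DATE_PATTERN = re.compile(r"\d{4}-\d{2}-\d{2}")


def castable_as_date(s):
    s = s.strip()                          # leading/trailing whitespace is ignored
    if not DATE_PATTERN.fullmatch(s):
        return False
    try:
        datetime.strptime(s, "%Y-%m-%d")   # rejects impossible dates such as 2002-02-30
    except ValueError:
        return False
    return True


def date_or_none(s):
    """Cast only if castable; None plays the role of the empty sequence."""
    if castable_as_date(s):
        return datetime.strptime(s.strip(), "%Y-%m-%d").date()
    return None
```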

Conditional Expressions

XPath 2.0 allows a conditional expression of the form if ( E1 ) then E2 else E3. For example, if (@discount) then @discount else 0 returns the value of the discount attribute if it is present, or zero otherwise.

Quantified Expressions

The expression some $x in E1 satisfies E2 returns true if there is an item in the sequence E1 for which the effective boolean value of E2 is true. Note that E2 must use the range variable $x to refer to the item being tested; it does not become the context item. For example, some $x in @* satisfies $x eq "" is true if the context item is an element that has at least one zero-length attribute value.

Similarly, the expression every $x in E1 satisfies E2 returns true if every item in the sequence given by E1 satisfies the condition.
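In Python, some and every correspond directly to any and all over the sequence. In this sketch, the test function returns a Python bool that stands in for the effective boolean value of E2. The function names and the attribute values are hypothetical.

```python
def some_satisfies(seq, test):
    """some $x in seq satisfies test($x)"""
    return any(test(x) for x in seq)


def every_satisfies(seq, test):
    """every $x in seq satisfies test($x)"""
    return all(test(x) for x in seq)


attribute_values = ["red", ""]          # hypothetical string values of @*
print(some_satisfies(attribute_values, lambda x: x == ""))    # True
print(every_satisfies(attribute_values, lambda x: x != ""))   # False
print(some_satisfies([], lambda x: True), every_satisfies([], lambda x: False))  # False True
```

The last line shows the behaviour on an empty sequence: some is false and every is true, whatever the condition.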

For Expressions

The expression for $x in E1 return E2 returns the sequence that results from evaluating E2 once for every item in the sequence E1. Note that E2 must use the range variable $x to refer to the current item; it does not become the context item. For example, sum(for $v in order-item return $v/price * $v/quantity) returns the total value of (price times quantity) for all the selected order-item elements.
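The for expression behaves like a "flat map": each evaluation of E2 yields a sequence, and the results are concatenated in order. The following Python sketch reproduces the order-item example. The element data is hypothetical, with each order-item reduced to a price and a quantity, and decimal.Decimal stands in for xs:decimal.

```python
from decimal import Decimal


def for_return(seq, body):
    """for $x in seq return body($x): each result is itself a sequence, and
    the results are concatenated in order into one flat sequence."""
    result = []
    for x in seq:
        result.extend(body(x))
    return result


# Hypothetical order-item elements, reduced to their price and quantity.
order_items = [{"price": Decimal("2.50"), "quantity": 4},
               {"price": Decimal("10"), "quantity": 1}]
total = sum(for_return(order_items, lambda v: [v["price"] * v["quantity"]]))
print(total)                                      # 20.00
print(for_return([1, 2], lambda x: [x, x * 10]))  # [1, 10, 2, 20]
```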

And expressions

The expression E1 and E2 returns true if the effective boolean values of E1 and E2 are both true.

Or expressions

The expression E1 or E2 returns true if the effective boolean values of either or both of E1 and E2 are true.

Sequence expressions

The expression E1 , E2 returns the sequence obtained by concatenating the sequences E1 and E2.

For example, $x = ("London", "Paris", "Tokyo") returns true if the value of $x is one of the strings listed.

Saxon 7.4 does not allow this operator to appear at the top level: the comma operator may only appear inside a parenthesized expression.
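Sequences never nest: concatenating a sequence contributes its items rather than the sequence itself, and () simply disappears. The following Python sketch models this behaviour, using lists for sequences and treating any other value as a single item. The function name sequence is hypothetical.

```python
def sequence(*parts):
    """(E1, E2, ...): concatenate the operands. A list stands for a sequence
    and contributes its items; any other value is a single item."""
    result = []
    for part in parts:
        if isinstance(part, list):
            result.extend(part)
        else:
            result.append(part)
    return result


print(sequence(1, sequence(2, 3), sequence()))   # [1, 2, 3]
x = "Paris"
print(x in sequence("London", "Paris", "Tokyo"))  # True, like $x = (...)
```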

Michael H. Kay
14 February 2003
