XPath Expression Syntax

Contents

Introduction
Constants
Variable References
Parentheses and operator precedence
String Expressions
Boolean Expressions
Numeric Expressions
NodeSet expressions

Introduction

This document is an informal guide to the syntax of XPath expressions, which SAXON uses both within XSLT stylesheets and in its Java API. For the formal definitions, see the XSLT and XPath specifications; where SAXON differs from them, the differences are noted here.

We can classify expressions according to the data type of their result: string, number, boolean, node-set, and result tree fragment. The first four categories are examined in the following sections; result tree fragments are covered where they are converted to other types.

The syntax is the same whether an expression appears in an XSL stylesheet or is passed as a parameter to one of the Java interfaces. In the Java interface, expressions are encapsulated by the com.icl.saxon.Expression class, and are parsed using a call such as Expression.make("$a + $b"). To exploit the full power of XPath expressions in the Java API, you will need to supply some support classes, for example to resolve namespace prefixes: this cannot be done automatically because there is no stylesheet to use as a reference point.
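SAXON's com.icl.saxon.Expression interface is not reproduced here. As a comparable sketch, the code below uses the standard JAXP javax.xml.xpath API bundled with the JDK (a separate XPath 1.0 engine, not SAXON) to show the kind of support object the paragraph above describes: a NamespaceContext that binds the prefixes used in an expression to namespace URIs. The class name NamespaceDemo, the prefix bk, and the URI urn:example:books are illustrative assumptions.

```java
import java.io.StringReader;
import java.util.Collections;
import java.util.Iterator;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

final class NamespaceDemo {
    // Hypothetical namespace URI, used only for this illustration.
    static final String BOOKS_NS = "urn:example:books";

    // Binds the single prefix "bk" to BOOKS_NS; every other prefix is unbound.
    static final NamespaceContext CONTEXT = new NamespaceContext() {
        @Override
        public String getNamespaceURI(String prefix) {
            return "bk".equals(prefix) ? BOOKS_NS : XMLConstants.NULL_NS_URI;
        }

        @Override
        public String getPrefix(String namespaceURI) {
            return BOOKS_NS.equals(namespaceURI) ? "bk" : null;
        }

        @Override
        public Iterator<String> getPrefixes(String namespaceURI) {
            return BOOKS_NS.equals(namespaceURI)
                    ? Collections.singletonList("bk").iterator()
                    : Collections.<String>emptyIterator();
        }
    };

    // Parses xml with namespace support and returns the string value of expr.
    static String evaluate(String expr, String xml) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);
            Document doc = dbf.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            xpath.setNamespaceContext(CONTEXT);
            return xpath.evaluate(expr, doc);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // The document uses the prefix "b" and the expression uses "bk": only the URIs must match.
        String xml = "<b:book xmlns:b=\"urn:example:books\"><b:title>XPath</b:title></b:book>";
        System.out.println(evaluate("/bk:book/bk:title", xml));
    }
}
```

The point of the design is that prefixes in an expression are resolved by the host program, not copied from the source document, which is exactly why a standalone API needs this kind of helper.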

Constants

String literals are written as "London" or 'Paris'. In each case you can use the opposite kind of quotation mark within the string: 'He said "Boo"', or "That's rubbish". In a stylesheet, expressions always appear within XML attributes, so it is usual to use one kind of delimiter for the attribute and the other kind for the literal. Anything else can be written using XML entity references such as &quot; and &apos;.

Numeric constants follow the Java rules for decimal literals: for example, 12 or 3.05; a negative number can be written as (say) -93.7, though technically the minus sign is not part of the literal. (Also note that you may need a space before the minus sign to avoid it being treated as a hyphen within a preceding name.)
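The quoting and minus-sign rules can be checked from Java. This sketch uses the JDK's standard javax.xml.xpath engine (an XPath 1.0 implementation, not SAXON), so treat it as an illustration of the rules rather than of SAXON itself; the class name LiteralDemo and the sample document, which deliberately contains an element named price-1, are made up.

```java
import java.io.StringReader;
import javax.xml.namespace.QName;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

final class LiteralDemo {
    // Sample document: "price-1" is a legal element name because '-' is a name character.
    static final String XML = "<r><price>10</price><price-1>4</price-1></r>";
    private static final XPath XPATH = XPathFactory.newInstance().newXPath();

    private static Object eval(String expr, QName type) {
        try {
            return XPATH.evaluate(expr, new InputSource(new StringReader(XML)), type);
        } catch (XPathExpressionException e) {
            throw new IllegalStateException(e);
        }
    }

    static String string(String expr) {
        return (String) eval(expr, XPathConstants.STRING);
    }

    static double number(String expr) {
        return (Double) eval(expr, XPathConstants.NUMBER);
    }

    public static void main(String[] args) {
        System.out.println(string("'He said \"Boo\"'"));  // He said "Boo"
        System.out.println(string("\"That's rubbish\"")); // That's rubbish
        System.out.println(number("-93.7"));              // -93.7
        System.out.println(number("/r/price - 1"));       // 9.0 (subtraction)
        System.out.println(number("/r/price-1"));         // 4.0 (the element named price-1)
    }
}
```

Because a hyphen is a legal name character, "/r/price-1" reads the element named price-1, while the spaced form subtracts.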

There are no boolean constants as such: instead use the function calls true() and false().

Variable References

The value of a variable (local or global variable, local or global parameter) may be referred to using the construct $name, where name is the variable name.

A variable reference is always resolved at the place where the expression containing it appears textually; for example, a variable used within an xsl:attribute-set must be in scope at the point where the attribute-set is defined, not the point where it is used.

A variable may take a value of any data type (string, number, boolean, node-set, or result-tree-fragment), and in general it is not possible to determine its data type statically.

It is an error to refer to a variable that has not been declared.
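In the Java API the host program has to supply variable values. SAXON's own mechanism for this is not shown here; as an analogous sketch, the standard JAXP API lets you install an XPathVariableResolver, which the engine consults for each $name in an expression. The class name VariableDemo and the map-based lookup are assumptions made for illustration.

```java
import java.io.StringReader;
import java.util.Map;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

final class VariableDemo {
    // Evaluates expr as a number; each $name is looked up by its local name in `variables`.
    static double evaluate(String expr, Map<String, ?> variables) {
        XPath xpath = XPathFactory.newInstance().newXPath();
        // An unknown name yields null here; how the engine then reports the error is implementation-specific.
        xpath.setXPathVariableResolver(name -> variables.get(name.getLocalPart()));
        try {
            return (Double) xpath.evaluate(expr, new InputSource(new StringReader("<r/>")), XPathConstants.NUMBER);
        } catch (XPathExpressionException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(evaluate("$a + $b", Map.of("a", 2.0, "b", 3.5)));
        // A string value is converted to a number where arithmetic needs one.
        System.out.println(evaluate("$a * 2", Map.of("a", "21")));
    }
}
```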

Parentheses and operator precedence

In general an expression may be enclosed in parentheses without changing its meaning. (There are places where parentheses cannot be used within a path-expression, however.)

If parentheses are not used, operator precedence follows the sequence below, starting with the operators that bind most tightly. Within each group the operators are evaluated left-to-right.

Operator	Meaning
[]	predicate
/, //	child nodes, descendant nodes
|	union
- (unary)	unary minus
*, div, mod	multiply, divide, modulo
+, -	plus, minus
<, <=, >, >=	Less-than, less-or-equal, greater-than, greater-or-equal
=, !=	equals, not-equals
and	Boolean and
or	Boolean or
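A few expressions make the precedence rules concrete. The sketch evaluates them with the JDK's javax.xml.xpath engine (standard XPath 1.0, not SAXON); the class name PrecedenceDemo is hypothetical.

```java
import java.io.StringReader;
import javax.xml.namespace.QName;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

final class PrecedenceDemo {
    private static final XPath XPATH = XPathFactory.newInstance().newXPath();

    // Evaluates expr against a trivial document and returns the requested result type.
    private static Object eval(String expr, QName type) {
        try {
            return XPATH.evaluate(expr, new InputSource(new StringReader("<r/>")), type);
        } catch (XPathExpressionException e) {
            throw new IllegalStateException(e);
        }
    }

    static double number(String expr) {
        return (Double) eval(expr, XPathConstants.NUMBER);
    }

    static boolean bool(String expr) {
        return (Boolean) eval(expr, XPathConstants.BOOLEAN);
    }

    public static void main(String[] args) {
        System.out.println(number("1 + 2 * 3"));                    // 7.0: * binds tighter than +
        System.out.println(number("(1 + 2) * 3"));                  // 9.0: parentheses override
        System.out.println(number("8 - 3 - 2"));                    // 3.0: (8 - 3) - 2
        System.out.println(number("12 div 2 div 3"));               // 2.0: (12 div 2) div 3
        System.out.println(bool("true() or false() and false()"));  // true: and binds tighter than or
        System.out.println(bool("1 + 2 = 3"));                      // true: + binds tighter than =
    }
}
```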

String Expressions

There are some constructs that are specifically string expressions, but in addition any other kind of expression can be used in a context where a string expression is required:

A numeric expression is converted to a string by giving its conventional decimal representation; for example, the value -3.5 is displayed as "-3.5", and 2.0 is displayed as "2".
A boolean expression is displayed as one of the strings "true" or "false".
When a node-set expression is used in a string context, only the first node of the node-set (in document order) is used: the value of this node is converted to a string. The value of a text node is its character content, and the value of an attribute is the attribute value; the value of an element (or of the document root) is the concatenation of all its descendant text nodes.
A result tree fragment is technically converted to a string in the same way as a node-set; but since the corresponding node-set always contains a single node, the effect is to concatenate all the descendant text nodes, ignoring all element tags.
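The conversions above can be observed by wrapping expressions in string(). This sketch assumes the JDK's javax.xml.xpath engine rather than SAXON, and a made-up two-item list; the second item's value includes the text inside its b child. StringConversionDemo is a hypothetical class name.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

final class StringConversionDemo {
    static final String XML = "<list><item>first</item><item>second <b>bold</b> text</item></list>";
    private static final XPath XPATH = XPathFactory.newInstance().newXPath();

    // Returns the string value of expr evaluated against XML.
    static String string(String expr) {
        try {
            return XPATH.evaluate(expr, new InputSource(new StringReader(XML)));
        } catch (XPathExpressionException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(string("string(2.0)"));            // 2
        System.out.println(string("string(-3.5)"));           // -3.5
        System.out.println(string("string(1 = 1)"));          // true
        System.out.println(string("string(/list/item)"));     // first: only the first node counts
        System.out.println(string("string(/list/item[2])"));  // second bold text: tags are ignored
    }
}
```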

The specific string expressions are as follows:

Construct	Meaning
string(expression)	This performs an explicit type conversion to a string, which will always give the same result as the implicit conversion described above. The main case where explicit conversion is useful is when assigning a value to a variable.
concat(expression1, expression2 {,expression3}*)	This concatenates the string values of the arguments. There may be any number of arguments (two or more).
substring(expression1, expression2 [,expression3])	This extracts a substring of the string value of expression1. Expression2 gives the start position (starting at 1), and expression3 gives the length: if it is omitted, the rest of the string is used. For example, substring("Michael", 2, 4) is "icha".
substring-before(expression1, expression2)	This returns the substring of expression1 that precedes the first occurrence of expression2. If expression1 does not contain expression2, it returns the empty string. For example, substring-before("c:\dir", ":\") returns "c".
substring-after(expression1, expression2)	This returns the substring of expression1 that follows the first occurrence of expression2. If expression1 does not contain expression2, it returns the empty string. For example, substring-after("c:\dir", ":\") returns "dir".
normalize-space(expression1)	This removes leading and trailing white space, and converts all other sequences of white space to a single space character. For example, normalize-space("  Mike   Kay  ") returns "Mike Kay".
translate(expression1, expression2, expression3)	This replaces any character in expression1 that also occurs in expression2 with the corresponding character from expression3. For example, translate("ABBA", "ABC", "123") returns "1221". If there is no corresponding character in expression3 (because it is shorter than expression2), the character is removed from the string.
name(nodeset-expression)	Returns the name of the first node in the nodeset-expression, or of the current node if the argument is omitted. The name here is the "display name"; it will use the same namespace prefix as in the original source document.
localpart(nodeset-expression)	Returns the local part (after the colon) of the name of the first node in the nodeset-expression, or of the current node if the argument is omitted.
namespace-uri(nodeset-expression)	Returns the URI of the namespace of the name of the first node in the nodeset-expression, or of the current node if the argument is omitted.
unparsed-entity-uri(string-expression)	Returns the URI of the unparsed entity with the given name in the current document, if there is one; otherwise the empty string.
generate-id(nodeset-expression)	Returns a system-generated identifier for the first node in the nodeset-expression, or for the current node if the argument is omitted. The generated identifiers are always alphanumeric (except for the document node, where the identifier is the empty string), and have three useful properties beyond those required by the XSLT specification: (1) the alphabetic order of identifiers is the same as the document order of nodes; (2) if generate-id(A) is a leading substring of generate-id(B), then A is an ancestor node of B; (3) the identifier is unique not only within the document, but within all documents opened during the run.
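The worked examples in this table can be replayed with any XPath 1.0 engine. The sketch below assumes the JDK's javax.xml.xpath implementation (not SAXON); StringFunctionDemo is a hypothetical class name, and the last call adds one extra case showing a character being removed by translate().

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

final class StringFunctionDemo {
    private static final XPath XPATH = XPathFactory.newInstance().newXPath();

    // Returns the string value of expr; the context document is irrelevant for these calls.
    static String string(String expr) {
        try {
            return XPATH.evaluate(expr, new InputSource(new StringReader("<r/>")));
        } catch (XPathExpressionException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(string("concat('a', 'b', 'c')"));               // abc
        System.out.println(string("substring('Michael', 2, 4)"));          // icha
        // Java needs "\\" for one backslash; XPath itself treats \ as an ordinary character.
        System.out.println(string("substring-before('c:\\dir', ':\\')"));  // c
        System.out.println(string("substring-after('c:\\dir', ':\\')"));   // dir
        System.out.println(string("normalize-space('  Mike   Kay  ')"));   // Mike Kay
        System.out.println(string("translate('ABBA', 'ABC', '123')"));     // 1221
        System.out.println(string("translate('ABBA', 'ABC', '1')"));       // 11: B has no partner, so it is removed
    }
}
```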

Numeric Expressions

There are some constructs that are specifically numeric expressions, but in addition any string whose value is convertible to a number can be used as a number. (A string that does not represent any number is treated as zero).

A boolean is converted to a number by treating false as 0 and true as 1.

The specific numeric expressions are as follows:

Construct	Meaning
number(expression)	This performs an explicit type conversion to a number, which will always give the same result as the implicit conversion described above. Explicit conversion can be useful when assigning a value to a variable. It is also useful when writing a predicate in a nodeset expression, since a numeric predicate selects by position, whereas any other value is treated as a boolean.
count(node-set-expression)	This returns the number of nodes in the node-set.
sum(node-set-expression)	This converts the value of each node in the node-set to a number, and totals the result.
string-length(expression)	This returns the number of characters in the string value of expression. Characters are counted using the Java String.length() method, which counts UTF-16 code units; the answer can therefore differ from the XPath rules for characters outside the Basic Multilingual Plane, each of which Java stores as two surrogate code units.
numeric-expression1 op numeric-expression2	This performs an arithmetic operation on the two values. The operators are + (plus), - (minus), * (multiply), div (divide), mod (modulo), and quo (quotient). Note that div does a floating-point division; quo returns the result of div truncated to an integer; and n mod m returns n - ((n quo m) * m).
- numeric-expression	Unary minus: this subtracts the value from zero.
floor(numeric-expression1)	This returns the largest integer that is <= the argument.
ceiling(numeric-expression1)	This returns the smallest integer that is >= the argument.
round(numeric-expression1)	This returns the closest integer to the argument. The rounding rules follow Java conventions, which are not quite the same as the XSL rules.
position()	This returns the position of the current node in the current node list. Positions are numbered from one.
last()	This returns the number of nodes in the current node list.
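This sketch exercises several of these functions with the JDK's javax.xml.xpath engine against a made-up order document. Standard XPath 1.0 has no quo operator, so only div and mod appear; the class name NumericDemo is hypothetical.

```java
import java.io.StringReader;
import javax.xml.namespace.QName;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

final class NumericDemo {
    // Made-up order with three priced items.
    static final String XML =
            "<order><item price=\"1.5\"/><item price=\"2\"/><item price=\"3.25\"/></order>";
    private static final XPath XPATH = XPathFactory.newInstance().newXPath();

    private static Object eval(String expr, QName type) {
        try {
            return XPATH.evaluate(expr, new InputSource(new StringReader(XML)), type);
        } catch (XPathExpressionException e) {
            throw new IllegalStateException(e);
        }
    }

    static double number(String expr) {
        return (Double) eval(expr, XPathConstants.NUMBER);
    }

    static String string(String expr) {
        return (String) eval(expr, XPathConstants.STRING);
    }

    public static void main(String[] args) {
        System.out.println(number("count(/order/item)"));                  // 3.0
        System.out.println(number("sum(/order/item/@price)"));             // 6.75
        System.out.println(number("7 div 2"));                             // 3.5: floating-point division
        System.out.println(number("-7 mod 3"));                            // -1.0: sign follows the dividend
        System.out.println(number("floor(-3.5)"));                         // -4.0
        System.out.println(number("ceiling(-3.5)"));                       // -3.0
        System.out.println(number("round(2.5)"));                          // 3.0
        System.out.println(string("/order/item[last()]/@price"));          // 3.25
        System.out.println(number("count(/order/item[position() > 1])"));  // 2.0
    }
}
```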

Boolean Expressions

Expressions of other types are converted to booleans as follows:

Numeric values: 0 is treated as false, everything else as true.
String values: the zero-length string is treated as false, everything else as true.
Node-sets: the empty node set is treated as false, everything else as true.
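One consequence of these rules is easy to trip over: every non-empty string is true, including '0' and 'false'. The sketch checks the rules with the JDK's javax.xml.xpath engine (not SAXON); BooleanConversionDemo and its tiny sample document are hypothetical.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

final class BooleanConversionDemo {
    private static final XPath XPATH = XPathFactory.newInstance().newXPath();

    // Evaluates expr against a document containing a single <a/> element.
    static boolean bool(String expr) {
        try {
            return (Boolean) XPATH.evaluate(
                    expr, new InputSource(new StringReader("<r><a/></r>")), XPathConstants.BOOLEAN);
        } catch (XPathExpressionException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(bool("boolean(0)"));        // false
        System.out.println(bool("boolean(3)"));        // true
        System.out.println(bool("boolean('')"));       // false
        System.out.println(bool("boolean('false')"));  // true: a non-empty string
        System.out.println(bool("boolean(/r/none)"));  // false: empty node set
        System.out.println(bool("boolean(/r/a)"));     // true
    }
}
```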

The specific boolean expressions are as follows:

Construct	Meaning
boolean(expression)	This performs an explicit type conversion to a boolean, which will always give the same result as the implicit conversion described above. The main case where explicit conversion is useful is when assigning a value to a variable.
false(), true()	These function calls return false and true respectively.
not(boolean-expression1)	This returns the logical negation of the argument.
expression1 ( "=" | "!=" ) expression2	This tests whether the two values are equal (or not equal). An operand that is a result tree fragment is treated as if it were a node set containing a single node that acts as the root of the result tree fragment. If both operands are node sets, it tests whether there is a value in the first node set that is equal (or not equal) to some value in the second node set, treating the values as strings. Note that if either or both node sets are empty, the result will be false (regardless of whether the operator is "=" or "!="). If one operand is a node set and the other is a string or number, it tests whether there is a value in the node set that is equal (or not equal) to the other operand; if the node set is empty, the result will be false. If one operand is a node set and the other is a boolean, it converts the node set to a boolean and compares the result: an empty node set is thus equal to false, while a non-empty one is equal to true. Otherwise, if one operand is a boolean, both operands are converted to booleans and compared. Otherwise, if one operand is a number, both are converted to numbers and compared. Otherwise, both are converted to strings and compared; two strings are equal if they contain exactly the same characters.
numeric-expression1 op numeric-expression2	This performs a numeric comparison of the two values. If both expressions are node sets, the result is true if there is a pair of values from the two node sets that satisfies the comparison. If one expression is a node set, the result is true if there is a value in that node set that satisfies the comparison with the other operand. The operators are < (less-than), <= (less-or-equal), > (greater-than), >= (greater-or-equal). When used in an XSL stylesheet, the "<" character must be written as the XML entity reference "&lt;" (so "<=" becomes "&lt;=").
lang(string-expression)	This returns true if the xml:lang attribute on (or inherited by) the current node is equal to the argument, or consists of the argument followed by a suffix starting with "-" (so lang("en") is true when xml:lang is "en-GB"), ignoring case in both comparisons.
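The node-set rules for = and != are the least intuitive part of this table: a node set can be both equal and not equal to the same string, and an empty node set is neither. The sketch below checks this, together with numeric < and lang(), using the JDK's javax.xml.xpath engine rather than SAXON. The sample document and the class name ComparisonDemo are illustrative; the parser is made namespace-aware on the assumption that the engine looks up xml:lang by its namespace URI.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

final class ComparisonDemo {
    static final String XML = "<r xml:lang=\"en-GB\"><a>x</a><a>y</a></r>";
    private static final XPath XPATH = XPathFactory.newInstance().newXPath();
    private static final Document DOC = parse(XML);

    private static Document parse(String xml) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);
            return dbf.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Evaluates expr with the document root as context and returns a boolean.
    static boolean bool(String expr) {
        try {
            return (Boolean) XPATH.evaluate(expr, DOC, XPathConstants.BOOLEAN);
        } catch (XPathExpressionException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Both comparisons hold, because /r/a contains "x" and also "y".
        System.out.println(bool("/r/a = 'x'"));
        System.out.println(bool("/r/a != 'x'"));
        // An empty node set makes both = and != false.
        System.out.println(bool("/r/none = 'x'"));
        System.out.println(bool("/r/none != 'x'"));
        // The relational operators always compare numerically.
        System.out.println(bool("2 < '10'"));
        // xml:lang="en-GB" on the parent is inherited by each <a>.
        System.out.println(bool("boolean(/r/a[lang('en')])"));
    }
}
```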

NodeSet Expressions

NodeSet expressions can be written as follows:

Construct

Meaning

nodeset-expression1 | nodeset-expression2

This forms the union of the two nodesets

nodeset-expression1 [ predicate ]

This returns the set of all nodes in nodeset-expression1 that satisfy the predicate. The predicate may be a boolean expression (which is evaluated with the particular node as current node, and the full node set as the current node set); or it may be a numeric expression, which is a shorthand for the boolean expression position()=predicate. The nodeset-expression may of course itself have one or more predicates, so a chain of filters can be set up.

nodeset-expression1 / relative-path

This follows the given path for each node in nodeset-expression1 (the "original nodes"), and returns all the nodes reached (the "target nodes"). The relative-path may be one of the following:

name - Select all the element children of the original nodes with the given element name
prefix:* - Select all the element children of the original nodes whose names are in the namespace bound to the given prefix
* - Select all the element children of the original nodes regardless of element name
@name - Select all the attributes of the original nodes with the given attribute name
@prefix:* - Select all the attributes of the original nodes whose names are in the namespace bound to the given prefix
@* - Select all the attributes of the original nodes regardless of attribute name
text() - Select all the text node children of the original nodes
.. - Select the parents of the original nodes
node() - Select all the children of the original nodes

axis-name :: node-test optional-predicates - a generalised construct for navigating in any direction. The axis-name may be any of the following:

ancestor	Selects the ancestor nodes, starting with the parent of the current node and ending with the document node
ancestor-or-self	Selects the current node plus all ancestor nodes
attribute	Selects all attributes of the current node (if it is an element)
child	Selects the children of the current node, in document order
descendant	Selects the children of the current node and their children, recursively (in document order)
descendant-or-self	Selects the current node plus all descendant nodes
following	Selects the nodes that follow the current node in document order, other than its descendants
following-sibling	Selects all subsequent child nodes of the same parent node
parent	Selects the parent of the current node
preceding	Selects the nodes that precede the current node in document order, other than its ancestors
preceding-sibling	Selects all preceding child nodes of the same parent node
self	Selects the current node

The node-test may be:

a node name
"prefix:*" to select nodes whose names are in the namespace bound to the given prefix
"text()" (to select text nodes)
"node()" (to select any node)
"processing-instruction()" (to select any processing instruction)
"processing-instruction('literal')" to select processing instructions with the given name (target)
"comment()" (to select comments)

The optional-predicates is a sequence of zero-or-more predicates, each enclosed in square brackets, each being either a boolean expression or a numeric expression (as a shorthand for testing position()).

nodeset-expression1 // relative-path

This is a shorthand for nodeset-expression1/descendant-or-self::node()/relative-path
In effect "//" selects descendants, where "/" selects immediate children: but where predicates are used, the expansion above defines the precise meaning.

.

This selects the current node.

/

This selects the document root node. Note that this nodeset-expression cannot be followed by the "/" or "//" operator or by a predicate.

/ relative-path

This is a shorthand for "root()/relative-path" where root() is an imaginary designation of the document root node.

// relative-path

This is a shorthand for "root()//relative-path" where root() is an imaginary designation of the document root node.

document(expression1, expression2?)

The first argument is a URL (as a string), or a nodeset containing a set of URLs; the function returns the nodeset consisting of the root nodes of the documents referenced (which must be XML documents). The optional second argument is a node-set used to provide a base URL for resolving relative URLs: the default is the URL of the document containing the relative URL, which may be either a source document or a stylesheet document. Saxon allows the first argument to contain a fragment identifier, e.g. "my.xml#xyz", or simply "#xyz", in which case "xyz" must be the value of an ID attribute of an element within the referenced document. The effect is to retrieve a tree rooted at this element.

id(expression)

This returns the node (or nodes), if any, that have an ID attribute equal to the given value, and that are in the same document as the current node. To use ID attributes, there must be a DTD that defines the attribute as being of type ID, and you must use a SAX parser that notifies ID attributes to the application. If the argument is a nodeset, the function returns the set of nodes that have an ID attribute equal to a value held in any of the nodes in the nodeset: each node in the nodeset is converted to a string and treated as a white-space-separated list of ID values. If the argument is of any other type, its value is converted to a string and treated as a white-space-separated list of ID values.

key(string-expression1, expression2)

The first argument is a key name; the function returns the set of nodes in the current document that have a key with this name, with the key value given by the second argument. If this is a nodeset, the key values are the values of the nodes in the nodeset; otherwise, the key value is the string value of the argument. Note that keys must be registered using the xsl:key element.
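Three of the constructs above are easy to misread: numeric versus non-numeric predicates, the expansion of "//", and id(). The following sketch checks them with the JDK's built-in javax.xml.xpath engine (not SAXON, so SAXON's own results could differ in detail). The sample document, its internal DTD (needed so that id is declared as an ID attribute), and the class name NodeSetDemo are illustrative.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

final class NodeSetDemo {
    // The ATTLIST declaration makes "id" an ID attribute; without it id() would find nothing.
    static final String XML = "<!DOCTYPE r [<!ATTLIST item id ID #IMPLIED>]>"
            + "<r><g><item id=\"a\">first</item><item id=\"b\">second</item></g>"
            + "<g><item id=\"c\">third</item></g></r>";
    private static final XPath XPATH = XPathFactory.newInstance().newXPath();
    private static final Document DOC = parse(XML);

    private static Document parse(String xml) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);
            return dbf.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Returns the string value of expr, evaluated with the document root as context.
    static String string(String expr) {
        try {
            return XPATH.evaluate(expr, DOC);
        } catch (XPathExpressionException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // A numeric predicate is shorthand for position() = 2.
        System.out.println(string("/r/g[1]/item[2]"));
        // A string predicate is converted to a boolean: '2' is non-empty, so every item passes.
        System.out.println(string("count(/r/g[1]/item['2'])"));
        System.out.println(string("/r/g[1]/item[number('2')]"));
        // "//item[1]" means "first item child of each parent", not "first item in the document".
        System.out.println(string("count(//item[1])"));
        System.out.println(string("count(/descendant::item[1])"));
        // id() accepts a white-space-separated list of ID values.
        System.out.println(string("count(id('a c'))"));
    }
}
```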

Some examples of NodeSet Expressions are listed below:

Expression	Meaning
XXX	Selects all immediate child elements with tag XXX
*	Selects all immediate child elements (but not character data within the element)
../TITLE	Selects the TITLE children of the parent element
XXX[@AAA]	Selects all XXX child elements having an attribute named AAA
*[last()]	Selects the last child element of the current element
*/ZZZ	Selects all grandchild ZZZ elements
XXX[ZZZ]	Selects all child XXX elements that have a child ZZZ
XXX[@WIDTH and not(@WIDTH="20")]	Selects all child XXX elements that have a WIDTH attribute whose value is not "20"
/*	Selects the outermost element of the document
//TITLE	Selects all TITLE elements anywhere in the document
ancestor::SECTION[1]	Selects the innermost containing SECTION element
ancestor::SECTION[1]/@TITLE	Selects the TITLE attribute of the innermost containing SECTION element
./@*	Selects all attributes of the current element
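Several of these examples can be run against a small made-up document using the JDK's javax.xml.xpath engine (not SAXON). Each call first picks a context node, because most of the examples are relative to the current node; ExamplesDemo and the sample document are assumptions.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

final class ExamplesDemo {
    // Made-up document: an outer SECTION holding three XXX elements and a nested SECTION.
    static final String XML = "<BOOK><SECTION TITLE=\"Outer\">"
            + "<XXX WIDTH=\"20\"/><XXX WIDTH=\"30\"/><XXX/>"
            + "<SECTION TITLE=\"Inner\"><PARA/></SECTION>"
            + "</SECTION></BOOK>";
    private static final XPath XPATH = XPathFactory.newInstance().newXPath();
    private static final Document DOC = parse(XML);

    private static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Evaluates expr with the first node selected by contextPath as the current node.
    static String string(String contextPath, String expr) {
        try {
            Node context = (Node) XPATH.evaluate(contextPath, DOC, XPathConstants.NODE);
            return XPATH.evaluate(expr, context);
        } catch (XPathExpressionException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(string("/", "name(/*)"));
        System.out.println(string("/BOOK/SECTION", "count(XXX[@WIDTH and not(@WIDTH=\"20\")])"));
        System.out.println(string("/BOOK/SECTION", "name(*[last()])"));
        System.out.println(string("//PARA", "count(ancestor::SECTION)"));
        System.out.println(string("//PARA", "ancestor::SECTION[1]/@TITLE"));
    }
}
```

On its own, ancestor::SECTION selects both sections; a predicate on the ancestor axis counts in reverse document order, so [1] picks the nearest one.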

Michael H. Kay
13 April 1999