This page describes the extensions and implementation-defined features provided with the Saxon product.
If you want to implement your own extensions, see extensibility.html.
These extensions have been provided because there are things that are difficult to achieve, or inefficient, using standard XSLT facilities alone. As always, it is best to stick to the standard if you possibly can: and most things are possible, even if it's not obvious at first sight.
Before using a Saxon extension, check whether there is an equivalent EXSLT extension available. EXSLT extensions are more likely to be portable across XSLT processors.
Saxon also provides a set of extension elements providing access to SQL databases. These are described here.
EXSLT is an initiative to define a standardized set of extension functions and extension elements that can be used across different XSLT processors.
Saxon now supports the EXSLT modules Common, Math, Sets, DatesAndTimes, and Functions. The full list of EXSLT extension functions implemented is:
These have considerable overlap with extension functions and elements that were previously provided in the Saxon namespace. In most cases the Saxon versions of the functions remain available for the time being, but the EXSLT versions are preferred.
There are some known restrictions:
An extension attribute is an extra attribute on an XSLT-defined element. Following the rules of XSLT, such attributes must be in a non-default namespace. For Saxon extension attributes, the namespace must be the Saxon namespace URI "http://saxon.sf.net/".
For example, the saxon:assignable attribute can be set as follows:
<xsl:variable name="counter" saxon:assignable="yes"
xmlns:saxon="http://saxon.sf.net/">
The extension attributes supplied with the Saxon product are as follows:
saxon:allow-avt |
This attribute may be set on the xsl:call-template element.
If set to the value "yes", it causes the name attribute of xsl:call-template to be
interpreted as an attribute value template. This allows the selection of the called template
to be decided at run-time. Typical usage is: <xsl:call-template name="{$tname}" saxon:allow-avt="yes"> |
saxon:assignable |
This attribute may be set on the xsl:variable element. The permitted values
are "yes" and "no". If the variable is the subject of a
saxon:assign instruction, it must be set to the value "yes".
Setting this value to "yes" also ensures that the variable is actually evaluated, which is
useful if the select expression calls extension functions with side-effects;
without this, a variable that is never referenced may never be evaluated. |
saxon:explain |
This attribute may be set on any instruction in the stylesheet, including a literal result element. The permitted values are "yes" and "no". If the value is "yes", then at compile time Saxon outputs (to the standard error output) an analysis of all XPath expressions appearing on attributes of that instruction. The analysis includes the static type of the expression, and a representation of the expression tree that results from Saxon's parsing and static optimization phases of processing. The tree is represented by indentation. For example, writing:
produces the output:
(Here 550 is the internal code allocated to |
saxon:memo-function |
This attribute may be set on the xsl:function element. The permitted values
are "yes" and "no". Specifying "yes" indicates that Saxon should remember the
results of calling the function in a cache, and if the function is called again
with the same arguments, the result is retrieved from the cache rather than being
recalculated. Don't use this if the function has side-effects (for example, if
it calls saxon:assign , or an extension function with side-effects).
Don't use it if the function accesses context information such as the context node
or position() or last() . And be careful if the function constructs and returns a
temporary tree: the effect will be that the function returns the same tree each
time, rather than a copy of the tree (this difference will only show up if you
compare the identity of nodes in the two trees). |
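As an illustration of saxon:memo-function, the following sketch caches the results of a recursive function. The my namespace and the my:fib function are hypothetical, and the function body uses the syntax of the final XSLT 2.0 Recommendation (xsl:sequence and the as attribute), which may differ from the working-draft syntax accepted by a particular Saxon release.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:saxon="http://saxon.sf.net/"
    xmlns:my="http://example.com/my-functions"
    exclude-result-prefixes="xs saxon my">

  <!-- Hypothetical function: it has no side-effects and does not use the
       context node, position() or last(), so caching its results is safe. -->
  <xsl:function name="my:fib" as="xs:integer" saxon:memo-function="yes">
    <xsl:param name="n" as="xs:integer"/>
    <xsl:sequence select="if ($n lt 2)
                          then $n
                          else my:fib($n - 1) + my:fib($n - 2)"/>
  </xsl:function>

  <xsl:template match="/">
    <result><xsl:value-of select="my:fib(30)"/></result>
  </xsl:template>

</xsl:stylesheet>
```

Without the cache, this naive recursion evaluates the same arguments many times; with it, each distinct argument should be computed only once.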
A number of additional attributes, or attribute values, are allowed on the xsl:output element, beyond those defined in the XSLT 2.0 specification.
The method attribute of xsl:output can take the standard values "xml", "html", "xhtml", or "text", or a QName. If a QName is specified, the local name may be the fully-qualified name of a class that implements either the org.xml.sax.ContentHandler interface or the net.sf.saxon.event.Receiver interface. If such a value is specified, output is directed to a newly-created instance of the user-supplied class. You can pass additional information to this class by means of extra user-defined attributes on the xsl:output element.
The prefix of the QName must correspond to a valid namespace URI. It is recommended to use the Saxon URI "http://saxon.sf.net/", but this is not enforced.
When the output is XML or HTML with indent="yes", the saxon:indent-spaces attribute may be used to control the amount of indentation. The value must be an integer.
The saxon:character-representation attribute allows greater control over how non-ASCII characters will be represented on output.
With method="xml", two values are supported: "decimal" and "hex". These control whether numeric character references are output in decimal or hexadecimal when the character is not available in the selected encoding.
With HTML, the value may hold two strings, separated by a semicolon. The first string defines how non-ASCII characters within the character encoding will be represented, the values being "native", "entity", "decimal", or "hex". The second string defines how characters outside the encoding will be represented, the values being "entity", "decimal", or "hex". Here "native" means output the character as itself; "entity" means use a defined entity reference (such as "&eacute;") if known; "decimal" and "hex" refer to numeric character references. For example "entity;decimal" (the default) means that with encoding="iso-8859-1", characters in the range 160-255 will be represented using standard HTML entity references, while Unicode characters above 255 will be represented as decimal character references.
The saxon:byte-order-mark attribute may take the values "yes" or "no": the default is "no". If set to "yes", a byte order mark (Unicode xFEFF) is output at the start of the output file. This option is available with all output methods and all encodings, though it is most useful when producing XML encoded in UTF-8. Under the rules of XML 1.0 Second Edition, XML parsers are required to accept a byte order mark at the start of the file; however, some parsers written before the Second Edition was published may reject it. In particular, the Crimson parser included as the default XML parser in JDK 1.4 rejects byte order marks.
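The serialization attributes described above might be combined on a single xsl:output declaration as in the sketch below. The attribute names saxon:character-representation and saxon:byte-order-mark are assumed here to be the names of the two attributes just described; check them against the release you are using.

```xml
<!-- A sketch only: HTML output in iso-8859-1, indented by two spaces.
     Characters inside the encoding are written as entity references,
     characters outside it as decimal character references, and no
     byte order mark is written. -->
<xsl:output method="html"
            encoding="iso-8859-1"
            indent="yes"
            saxon:indent-spaces="2"
            saxon:character-representation="entity;decimal"
            saxon:byte-order-mark="no"
            xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
            xmlns:saxon="http://saxon.sf.net/"/>
```

In a real stylesheet the two namespace declarations would normally sit on the xsl:stylesheet element rather than on xsl:output.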
The saxon:next-in-chain attribute is used to direct the output to another stylesheet. The value is the URL of a stylesheet that should be used to process the output stream. In this case the output stream must always be pure XML, and attributes that control the format of the output (e.g. method, cdata-section-elements, etc) will have no effect. The output of the second stylesheet will be directed to the destination that would have been used for the first stylesheet if no saxon:next-in-chain attribute were present.
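A minimal sketch of chaining two stylesheets follows. The file name phase2.xsl and the intermediate element are hypothetical; the first stylesheet simply wraps its input, and its result is handed to the second stylesheet instead of being serialized.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:saxon="http://saxon.sf.net/">

  <!-- phase2.xsl is a hypothetical URL for the second stylesheet.
       Formatting attributes such as method would have no effect here. -->
  <xsl:output saxon:next-in-chain="phase2.xsl"/>

  <xsl:template match="/">
    <intermediate>
      <xsl:copy-of select="*"/>
    </intermediate>
  </xsl:template>

</xsl:stylesheet>
```

The final destination (file, stream, or result tree) is whatever would have received the output of this stylesheet had the attribute been absent.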
The attribute saxon:require-well-formed is available, with values "yes" or "no". The default is "no". If the value is set to "yes", and a user-written ContentHandler is supplied to receive the results of the transformation, then Saxon will report an error rather than sending a non-well-formed stream of SAX events to the ContentHandler. This attribute is useful when the output of the stylesheet is sent to a component (for example an XSL-FO rendering engine) that is not designed to accept non-well-formed XML result trees.
Any number of user-defined attributes may be defined on xsl:output. These attributes must have names in a non-null namespace, which must not be either the XSLT or the Saxon namespace. These attributes are interpreted as attribute value templates. The value of each attribute is inserted into the Properties object made available to the Emitter handling the output; the attributes will be ignored by the standard output methods, but can supply arbitrary information to a user-defined output method. The name of the property will be the expanded name of the attribute in JAXP format, for example "{http://my-namespace/uri}local-name", and the value will be the value as given, after evaluation as an attribute value template.
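The sketch below shows how a user-defined attribute might be passed to a user-supplied output class. It assumes that the local part of the method QName names the class; com.example.MyHandler, the my namespace, the log-level attribute, and the level parameter are all hypothetical.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:saxon="http://saxon.sf.net/"
    xmlns:my="http://example.com/my-output">

  <!-- Hypothetical stylesheet parameter used in the attribute value template -->
  <xsl:param name="level" select="'debug'"/>

  <!-- com.example.MyHandler is a hypothetical class implementing
       org.xml.sax.ContentHandler. The handler would see a property named
       {http://example.com/my-output}log-level holding the value of $level. -->
  <xsl:output method="saxon:com.example.MyHandler"
              my:log-level="{$level}"/>

  <xsl:template match="/">
    <xsl:copy-of select="."/>
  </xsl:template>

</xsl:stylesheet>
```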
A Saxon extension function is invoked using a name such as saxon:localname(). The saxon prefix (or whatever prefix you choose to use) must be associated with the Saxon namespace URI http://saxon.sf.net/.
For example, to invoke the saxon:evaluate() function, write:
<xsl:variable name="expression"
select="concat('child::', $param, '[', $index, ']')"/>
..
<xsl:copy-of select="saxon:evaluate($expression)"
xmlns:saxon="http://saxon.sf.net/"/>
The extension functions supplied with the Saxon product are as follows:
dayTimeDuration-from-seconds(number) | This returns an instance of |
distinct(node-set-1, [stored-expression]) | This returns a node-set obtained by eliminating nodes in node-set-1 that have duplicate values for the supplied stored expression, evaluated as a string. A stored expression may be obtained as the result of calling the saxon:expression() function. If no stored expression is supplied, the default is expression('.'), that is, the string-value of the node. If several nodes produce the same string value, the one that is first in document order will be retained. The stored expression is evaluated for each node in node-set-1 in turn, with that node as the context node, with the context position equal to the position of that node in node-set-1, and with the context size equal to the size of node-set-1. Example: <xsl:for-each select="saxon:distinct(surname, saxon:expression('substring(.,1,1)'))"> will process the first surname starting with each letter of the alphabet in turn. Note: for the single-argument version, the EXSLT distinct() function should be used in preference, for portability reasons. |
eval(stored-expression) | This returns the result of evaluating the supplied stored expression. A stored expression may be obtained as the result of calling the saxon:expression() function. The stored expression is evaluated in the current context, that is, the context node is the current node, and the context position and context size are the same as the result of calling position() or last() respectively. The second and subsequent arguments to |
evaluate(string) | The supplied string must contain an XPath expression. The result of the function is the result of evaluating the XPath expression. This is useful where an expression needs to be constructed at run-time or passed to the stylesheet as a parameter, for example where the sort key is determined dynamically. The function |
expression(string) | The supplied string must contain an XPath expression. The result of the function is a stored
expression, which may be supplied as an argument to other extension functions such as
saxon:eval(), saxon:sum() and saxon:distinct(). The result of
the expression will usually depend on the current node. The context for the expression includes the namespaces in scope at this point in the
stylesheet. The expression may contain references to the nine variables The stored expression (if it is to be evaluated using For example, following |
get-pseudo-attribute(string) | This function parses the contents of a processing instruction whose content follows the conventional attribute="value" structure (as defined for the <?xml-stylesheet?> processing instruction). The context node should be a processing instruction; the function returns the value of the pseudo-attribute named in the first argument if it is present, or an empty string otherwise. |
has-same-nodes(node-set-1, node-set-2) | This returns a boolean that is true if and only if node-set-1 and node-set-2 contain the same set of nodes. Note this is quite different from the "=" operator, which tests whether there is a pair of nodes with the same string-value. |
highest(node-set-1 [, stored-expression]) | This returns (as a node-set) the node from node-set-1 that has the highest value of the supplied stored expression, evaluated as a number. If the stored expression is omitted, the expression "number(.)" is evaluated: that is, the string value of the node, converted to a number. A stored expression may be obtained as the result of calling the saxon:expression() function. The stored expression is evaluated for each node in node-set-1 in turn, with that node as the context node, with the context position equal to the position of that node in node-set-1, and with the context size equal to the size of node-set-1. Any NaN values are ignored. If the node-set is empty, the result is an empty node-set. If several nodes have the highest value, the result node-set contains the one that is first in document order. This differs from the EXSLT highest() function, which returns all the nodes that have the maximum value. Example: saxon:highest(sale, saxon:expression('@price * @qty')) will evaluate price times quantity for each child <sale> element, and return the node for which this has the highest value. |
leading(node-set-1, stored-expression) | This returns a node-set containing all those nodes from node-set-1 up to and excluding the first one (in document order) for which the stored-expression evaluates to false. A stored expression may be obtained as the result of calling the saxon:expression() function. The stored expression is evaluated for each node in node-set-1 in turn, with that node as the context node, with the context position equal to the position of that node in node-set-1 (taken in document order), and with the context size equal to the size of node-set-1. Example: saxon:leading(following-sibling::*, saxon:expression('self::para')) will return the <para> elements following the current node, stopping at the first element that is not a <para>. Note: this function is quite different from the EXSLT leading() function, though both fulfil a similar purpose. |
line-number() | This returns the line number of the context node in the source document within the entity that contains it. There are no arguments. If line numbers are not maintained for the current document, the function returns -1. (To ensure that line numbers are maintained, use the -l option on the command line.) |
lowest(node-set-1 [, stored-expression]) | This returns (as a node-set) the node from node-set-1 that has the lowest value of the supplied stored expression, evaluated as a number. If the stored expression is omitted, the expression "number(.)" is evaluated: that is, the string value of the node, converted to a number. A stored expression may be obtained as the result of calling the saxon:expression() function. The stored expression is evaluated for each node in node-set-1 in turn, with that node as the context node, with the context position equal to the position of that node in node-set-1, and with the context size equal to the size of node-set-1. Any NaN values are ignored. If the node-set is empty, the result is an empty node-set. If several nodes have the lowest value, the result node-set contains the one that is first in document order. This differs from the EXSLT lowest() function, which returns all the nodes that have the minimum value. Example: saxon:lowest(sale, saxon:expression('@price * @qty')) will evaluate price times quantity for each child <sale> element, and return the node for which this has the lowest value. |
max(node-set-1 , stored-expression) | This returns the maximum value of a numeric expression resulting from evaluating the supplied stored expression for each node in node-set-1 in turn, as a number. A stored expression may be obtained as the result of calling the saxon:expression() function. The stored expression is evaluated for each node in node-set-1 in turn, with that node as the context node, with the context position equal to the position of that node in node-set-1, and with the context size equal to the size of node-set-1. Any NaN values are ignored. If the node-set is empty, the result is negative infinity. For the single-argument version of this function, use the XPath 2.0 max() function instead, for portability. Example: saxon:max(sale, saxon:expression('@price * @qty')) will evaluate price times quantity for each child <sale> element, and return the maximum amount. |
min(node-set-1 , stored-expression) | This returns the minimum value of a numeric expression resulting from evaluating the supplied stored expression for each node in node-set-1 in turn, as a number. A stored expression may be obtained as the result of calling the saxon:expression() function. The stored expression is evaluated for each node in node-set-1 in turn, with that node as the context node, with the context position equal to the position of that node in node-set-1, and with the context size equal to the size of node-set-1. Any NaN values are ignored. If the node-set is empty, the result is positive infinity. For the single-argument version of this function, use the XPath 2.0 min() function instead, for portability. Example: saxon:min(sale, saxon:expression('@price * @qty')) will evaluate price times quantity for each child <sale> element, and return the minimum amount. |
parse(string) | This function takes a single argument, a string containing the source text of a well-formed XML document. It returns the document node (root node) that results from parsing this text. It throws an error if the text is not well-formed XML. Applications should not rely on the identity of the returned document node (at present, if the function is called twice with the same arguments, it returns a new document node each time, but this may change in future). This function is useful where one XML document is embedded inside another using CDATA, or as an alternative way of passing an XML document as a parameter to a stylesheet. |
path() | This takes no arguments. It returns a string whose value is an XPath expression identifying the context node in the source tree. This can be useful for diagnostics, or to create an XPointer value, or when generating another stylesheet to process the same document. The resulting string can be used as input to the evaluate() function, provided that any namespace prefixes it uses are declared. |
serialize(node, format) | This function takes two arguments: the first is a node (generally a document or element node) to be serialized. The second is the name of an <xsl:output> element in the stylesheet. The second argument must be known at compile time (it will typically be supplied as a string literal.) The function serializes the specified document, or the subtree rooted at the specified element, according to the parameters specified in the named <xsl:output> element, and returns the serialized document as a string. This function is useful where the XSLT stylesheet wants to manipulate the serialized output, for example by embedding it as CDATA inside another XML document, or prefixing it with a DOCTYPE declaration, or inserting it into a non-XML output file. |
sum(node-set-1, stored-expression) | This returns the total resulting from evaluating the supplied stored expression for each node in node-set-1 in turn, as a number. If the result is NaN for any node, the total will be NaN. A stored expression may be obtained as the result of calling the saxon:expression() function. The stored expression is evaluated for each node in node-set-1 in turn, with that node as the context node, with the context position equal to the position of that node in node-set-1, and with the context size equal to the size of node-set-1. Example: saxon:sum(sale, saxon:expression('@price * @qty')) will evaluate price times quantity for each child <sale> element, and return the total amount. |
systemId() | This returns the system identifier (URI) of the entity in the source document that contains the context node. There are no arguments. |
tokenize(string-1, string-2?) | The first argument is converted to a string and is treated as a list of separated tokens. If the second argument is present, any character in string-2 is taken as a delimiter character, and any sequence of delimiter characters is taken as a token separator. If the second argument is omitted, any sequence of whitespace is taken as a token separator: or to put it another way, the default for string-2 is '&#x9;&#xA;&#xD;&#x20;' (tab, newline, carriage return, and space). A new sequence is constructed containing one string for each token; if the string is empty or contains only separators then the result will be empty. For example tokenize("a cup of tea") generates a sequence of four strings: ("a", "cup", "of", "tea"). |
yearMonthDuration-from-months(integer) | This returns an instance of |
The source code of these methods, which in most cases is extremely simple, can be used as an example for writing other user extension functions. It is found in class net.sf.saxon.functions.Extensions.
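To show how the stored-expression functions in the table above fit together, here is a minimal sketch. The sales, sale, @price, @qty and @regions names are hypothetical input; the function calls follow the signatures listed above.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:saxon="http://saxon.sf.net/"
    exclude-result-prefixes="saxon">

  <!-- Assumed input: <sales regions="north,south"><sale price=".." qty=".."/>...</sales> -->
  <xsl:template match="sales">
    <!-- Build the stored expression once and reuse it -->
    <xsl:variable name="amount" select="saxon:expression('@price * @qty')"/>
    <report>
      <!-- Sum of price times quantity over all child sale elements -->
      <total><xsl:value-of select="saxon:sum(sale, $amount)"/></total>
      <!-- The first sale element (in document order) with the highest amount -->
      <largest><xsl:copy-of select="saxon:highest(sale, $amount)"/></largest>
      <!-- One region element per comma-separated token -->
      <xsl:for-each select="saxon:tokenize(@regions, ',')">
        <region><xsl:value-of select="."/></region>
      </xsl:for-each>
    </report>
  </xsl:template>

</xsl:stylesheet>
```

Where an EXSLT or XPath 2.0 equivalent exists for the single-argument form of a function, that form remains the more portable choice.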
A Saxon extension instruction is invoked using a name such as <saxon:localname>. The saxon prefix (or whatever prefix you choose to use) must be associated with the Saxon namespace URI http://saxon.sf.net/. The prefix must also be designated as an extension element prefix by including it in the extension-element-prefixes attribute on the xsl:stylesheet element, or the xsl:extension-element-prefixes attribute on any enclosing literal result element or extension element.
However, top-level elements such as saxon:collation and saxon:script can be used without designating the prefix as an extension element prefix.
The saxon:assign element is used to change the value of a local or global variable that has previously been declared using xsl:variable (or xsl:param). The variable or parameter must be marked as assignable by including the extra attribute saxon:assignable="yes".
As with xsl:variable, the name of the variable is given in the mandatory name attribute, and the new value may be given either by an expression in the select attribute, or by expanding the content of the saxon:assign element.
If the xsl:variable element has a type attribute, then the value is converted to the required type of the variable in the usual way.
Example:
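A minimal sketch of saxon:assign, keeping a running total in a global variable. The order, item and @price names are hypothetical input, and the result relies on the instructions being executed sequentially.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:saxon="http://saxon.sf.net/"
    extension-element-prefixes="saxon">

  <!-- The variable must be declared assignable before it can be updated -->
  <xsl:variable name="total" select="0" saxon:assignable="yes"/>

  <!-- Assumed input: <order><item price="..."/>...</order> -->
  <xsl:template match="order">
    <xsl:for-each select="item">
      <!-- Replace the current value of $total with the new sum -->
      <saxon:assign name="total" select="$total + @price"/>
    </xsl:for-each>
    <total><xsl:value-of select="$total"/></total>
  </xsl:template>

</xsl:stylesheet>
```

Because the variable is global, the total carries over from one order element to the next. The same result can be obtained without side-effects by writing sum(item/@price), which is the preferable form.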
Note: Using saxon:assign is cheating. XSLT is designed as a language that is free of side-effects, which is why variables are not assignable. Once assignment to variables is allowed, certain optimizations become impossible. At present this doesn't affect Saxon, which generally executes the stylesheet sequentially. However, there are some circumstances in which the order of execution may not be quite what you expect, in which case saxon:assign may show anomalous behavior. In principle the saxon:assignable attribute is designed to stop Saxon doing optimizations that cause such anomalies, but you can't always rely on this.
It is also possible to specify a collation directly by using a URI of the form http://saxon.sf.net/collation?keyword=value;keyword=value;... For details see Collation URIs.
The saxon:collation element is a top-level element used to define collating sequences that may be used in sort keys and in functions such as compare(). The collation name is a URI (though actually any string can be used), and is defined in the mandatory name attribute. The other attributes control how the collation is defined. There are three ways of setting up a collation:
The class attribute is used to specify the fully qualified name of a Java class that implements the java.util.Comparator interface. Note that if the collation is to be used in functions such as contains() and starts-with(), this class must also be a java.text.RuleBasedCollator. This approach allows a user-defined collation to be implemented in Java.
The rules attribute is used to specify details of the ordering required, using the syntax of the Java RuleBasedCollator. A simplified example is rules="A < B < C".
The lang attribute follows the rules of the xml:lang attribute: for example, specify "en-US" for US English. This is used to find the collation appropriate to a Java locale. The strength attribute sets the strength of the collator. Values are "primary", "secondary", "tertiary", and "identical". The decomposition attribute determines how the collator handles Unicode composed characters. Values are "none", "standard", and "full". See the JDK documentation for full details of these attributes.
Sorting and comparison according to Unicode codepoints can be achieved by setting up a collator as:
<saxon:collation name="unicode" class="net.sf.saxon.sort.CodepointCollator"/>
Note that a stylesheet containing a saxon:collation declaration cannot be compiled at this release, because the underlying Java classes are not serializable.
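As a sketch of the locale-based approach, the fragment below declares a German collation and uses it in a sort key. The collation URI and the people, person and surname names are hypothetical; the collation attribute on xsl:sort is the XSLT 2.0 mechanism for selecting a named collation.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:saxon="http://saxon.sf.net/">

  <!-- Top-level declaration: no extension-element-prefixes entry is needed.
       strength="primary" ignores case and accent differences. -->
  <saxon:collation name="http://example.com/collation/german"
                   lang="de-DE"
                   strength="primary"/>

  <!-- Assumed input: <people><person><surname>..</surname></person>...</people> -->
  <xsl:template match="people">
    <sorted>
      <xsl:for-each select="person">
        <xsl:sort select="surname"
                  collation="http://example.com/collation/german"/>
        <xsl:copy-of select="."/>
      </xsl:for-each>
    </sorted>
  </xsl:template>

</xsl:stylesheet>
```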
The saxon:doctype instruction is used to insert a document type declaration into the current output file. It should be instantiated before the first element in the output file is written. The saxon:doctype instruction takes no attributes. The content of the element is a template-body that is instantiated to create an XML document that represents the DTD to be generated; this XML document is then serialized using a special output method that produces DTD syntax rather than XML syntax.
If this element is present the doctype-system and doctype-public attributes of xsl:output should not be present.
The generated XML document uses the following elements, where the namespace prefix "dtd" is used for the namespace URI "http://saxon.sf.net/dtd":
dtd:doctype | Represents the document type declaration. This is always the top-level element. The element may contain dtd:element, dtd:attlist, dtd:entity, and dtd:notation elements. It may have the following attributes: name (mandatory), the name of the document type; system, the system ID; public, the public ID. |
dtd:element | Represents an element type declaration. This is always a child of dtd:doctype. The element is always empty. It may have the following attributes: name (mandatory), the name of the element type; content (mandatory), the content model, exactly as it appears in a DTD, for example content="(#PCDATA)" or content="( a | b | c)*". |
dtd:attlist | Represents an attribute list declaration. This is always a child of dtd:doctype. The element will generally have one or more dtd:attribute children. It may have the following attribute: element (mandatory), the name of the element type. |
dtd:attribute | Represents an attribute declaration within an attribute list. This is always a child of dtd:attlist. The element will always be empty. It may have the following attributes: name (mandatory), the name of the attribute; type (mandatory), the type of the attribute, exactly as it appears in a DTD, for example type="ID" or type="( red | green | blue)"; value (mandatory), the default value of the attribute, exactly as it appears in a DTD, for example value="#REQUIRED" or value="#FIXED 'blue'". |
dtd:entity | Represents an entity declaration. This is always a child of dtd:doctype. The element may be empty, or it may have content. The content is a template body, which is instantiated to define the value of an internal parsed entity. Note that this value includes the delimiting quotes. The dtd:entity element may have the following attributes: name (mandatory), the name of the entity; system, the system identifier; public, the public identifier; parameter, set to "yes" for a parameter entity; notation, the name of a notation, for an unparsed entity. |
dtd:notation | Represents a notation declaration. This is always a child of dtd:doctype. The element will always be empty. It may have the following attributes: name (mandatory), the name of the notation; system, the system identifier; public, the public identifier. |
Note that Saxon will perform only minimal validation on the DTD being generated; it will output the components requested but will not check that this generates well-formed XML, let alone that the output document instance is valid according to this DTD.
Example:
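A minimal sketch of saxon:doctype follows, built only from the dtd elements and attributes listed above. The booklist vocabulary and the entity value are hypothetical.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:saxon="http://saxon.sf.net/"
    xmlns:dtd="http://saxon.sf.net/dtd"
    extension-element-prefixes="saxon"
    exclude-result-prefixes="dtd">

  <xsl:template match="/">
    <!-- Instantiated before the first output element is written -->
    <saxon:doctype>
      <dtd:doctype name="booklist">
        <dtd:element name="booklist" content="(book)*"/>
        <dtd:element name="book" content="(#PCDATA)"/>
        <dtd:attlist element="book">
          <dtd:attribute name="isbn" type="ID" value="#REQUIRED"/>
          <dtd:attribute name="binding" type="( hard | soft )" value="#IMPLIED"/>
        </dtd:attlist>
        <!-- The entity value includes its own delimiting quotes -->
        <dtd:entity name="publisher">'Example Press'</dtd:entity>
      </dtd:doctype>
    </saxon:doctype>
    <booklist>
      <book isbn="b1">Sample title</book>
    </booklist>
  </xsl:template>

</xsl:stylesheet>
```

The output should begin with a DOCTYPE declaration for booklist carrying these declarations in its internal subset, followed by the booklist element.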
Although not shown in this example, there is nothing to stop the DTD being generated as the output of a transformation, using instructions such as xsl:value-of and xsl:call-template. It is also possible to use xsl:text with disable-output-escaping="yes" to output DTD constructs not covered by this syntax, for example conditional sections and references to parameter entities.
The saxon:entity-ref element is useful to generate entities such as &nbsp; in HTML output. To do this, write:
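A minimal sketch, assuming the entity is named by a name attribute on saxon:entity-ref; the surrounding template and the price element are hypothetical.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:saxon="http://saxon.sf.net/"
    extension-element-prefixes="saxon">

  <xsl:output method="html"/>

  <!-- Assumed input: <price>12.50</price> -->
  <xsl:template match="price">
    <!-- Intended to write the entity reference &nbsp; between the two words -->
    <td>EUR<saxon:entity-ref name="nbsp"/><xsl:value-of select="."/></td>
  </xsl:template>

</xsl:stylesheet>
```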
Note: the preferred way to produce a non-breaking space character in the output is simply to write &#xa0; or &#160; in the stylesheet. By default, with HTML output, this will be serialized as &nbsp;, though the way it is serialized doesn't actually matter as far as the HTML browser is concerned.
The saxon:script element is a top-level element. It is used to define an implementation for an extension function that will be used by Saxon. With other processors, a different implementation of the same function can be selected, using mechanisms defined by that processor (for example, xalan:script).
The attributes for saxon:script are the same as the attributes of the xsl:script element defined in the (now withdrawn) XSLT 1.1 working draft.
The language attribute is mandatory. Saxon acts on the element only when the value is "java"; the values "javascript", "ecmascript", or a QName are also permitted, but in that case Saxon ignores the saxon:script element.
The implements-prefix attribute is mandatory. Its value must be a namespace prefix that maps to the same namespace URI as the prefix used in the extension function call.
The src attribute is mandatory for language="java". Its value must take the form "java:fully.qualified.class.Name", for example "java:java.util.Date". It defines the class containing the implementation of extension functions that use this prefix.
The archive attribute is optional. Its value is a space-separated list of URLs of folders or JAR files that will be searched to find the named class. If the attribute is omitted, the class is sought on the classpath.
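The sketch below binds a prefix to java.util.Date, the class used as an example above. The namespace URI bound to the date prefix is hypothetical, and the calling convention (new() for the constructor, with the instance passed as the first argument of an instance method) is an assumption about Saxon's Java binding; see extensibility.html for the actual rules.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:saxon="http://saxon.sf.net/"
    xmlns:date="http://example.com/java/date"
    exclude-result-prefixes="saxon date">

  <!-- Top-level declaration: functions called with the date prefix
       are implemented by the class java.util.Date -->
  <saxon:script language="java"
                implements-prefix="date"
                src="java:java.util.Date"/>

  <xsl:template match="/">
    <!-- Assumed convention: date:new() constructs a Date,
         date:toString($d) calls toString() on it -->
    <generated at="{date:toString(date:new())}"/>
  </xsl:template>

</xsl:stylesheet>
```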
The saxon:while element is used to iterate while some condition is true. The condition is given as a boolean expression in the mandatory test attribute. Because this expression must change its value if the loop is to terminate, the condition will generally reference a variable that is updated somewhere in the loop using a saxon:assign element. Alternatively, it may test a condition that is changed by means of a call on an extension function that has side-effects.
Example:
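A minimal sketch of saxon:while, using an assignable variable as the loop counter. The counter name i and the output elements are hypothetical.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:saxon="http://saxon.sf.net/"
    extension-element-prefixes="saxon">

  <!-- Loop counter: must be assignable so that saxon:assign can update it -->
  <xsl:variable name="i" select="0" saxon:assignable="yes"/>

  <xsl:template match="/">
    <rows>
      <!-- The test refers to $i, which the loop body changes on each pass -->
      <saxon:while test="$i &lt; 3">
        <row n="{$i}"/>
        <saxon:assign name="i" select="$i + 1"/>
      </saxon:while>
    </rows>
  </xsl:template>

</xsl:stylesheet>
```

If execution is sequential, as described for saxon:assign, this should produce three row elements numbered 0, 1 and 2.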
Michael H. Kay
2 May 2003