Saxon Extensions


This page describes the extension functions and extension elements supplied with the Saxon product.

If you want to implement your own extensions, see extensibility.html.

These extension functions and elements have been provided because some things are difficult to achieve, or inefficient, using standard XSLT facilities alone. As always, it is best to stick to the standard if you possibly can; most things are possible, even if this is not obvious at first sight.

Before using a Saxon extension, check whether there is an equivalent EXSLT extension available. EXSLT extensions are more likely to be portable across XSLT processors.

Contents

Extension attributes
saxon:allow-avt
saxon:assignable
saxon:memo-function
additional xsl:output attributes
Extension functions
saxon:closure()
saxon:distinct()
saxon:evaluate()
saxon:expression()
saxon:getPseudoAttribute()
saxon:getUserData()
saxon:hasSameNodes()
saxon:highest()
saxon:ifNull()
saxon:leading()
saxon:lineNumber()
saxon:lowest()
saxon:max()
saxon:min()
saxon:parse()
saxon:path()
saxon:serialize()
saxon:setUserData()
saxon:string-to-unicode()
saxon:sum()
saxon:systemId()
saxon:tokenize()
saxon:unicode-to-string()
Extension instructions
saxon:assign
saxon:collation
saxon:doctype
saxon:entity-ref
saxon:preview
saxon:script
saxon:while

Saxon also provides a set of extension elements providing access to SQL databases. These are described on a separate page.

EXSLT

EXSLT is an initiative to define a standardized set of extension functions and extension elements that can be used across different XSLT processors.

Saxon now supports the EXSLT modules Common, Math, Sets, DatesAndTimes, and Functions. The full list of EXSLT extension functions implemented is:

common: node-set(), object-type()

math: abs(), acos(), asin(), atan(), atan2(), constant(), cos(), exp(), highest(), log(), lowest(), max(), min(), power(), random(), sin(), sqrt(), tan().

set: difference(), intersection(), distinct(), leading(), trailing(), has-same-node()

dates-and-times: date-time(), date(), time(), year(), leap-year(), month-in-year(), month-name(), month-abbreviation(), week-in-year(), week-in-month(), day-in-year(), day-in-month(), day-of-week-in-month(), day-in-week(), day-abbreviation(), hour-in-day(), minute-in-hour(), second-in-minute().

These have considerable overlap with extension functions and elements that have previously been provided in the Saxon namespace. In most cases the Saxon versions of the functions remain available, for the time being, but the EXSLT versions are preferred.

Extension attributes

An extension attribute is an extra attribute on an XSL-defined element. Following the rules of XSLT, such attributes must be in a non-default namespace. For Saxon extension attributes, the namespace must be the Saxon namespace URI "http://saxon.sf.net/".

For example, the saxon:assignable attribute can be set as follows:
<xsl:variable name="counter" saxon:assignable="yes" xmlns:saxon="http://saxon.sf.net/">

The extension attributes supplied with the Saxon product are as follows:

saxon:allow-avt This attribute may be set on the xsl:call-template element. If set to the value "yes", it causes the name attribute of xsl:call-template to be interpreted as an attribute value template. This allows the selection of the called template to be decided at run-time. Typical usage is:
<xsl:call-template name="{$tname}" saxon:allow-avt="yes">

saxon:assignable This attribute may be set on the xsl:variable element. The permitted values are "yes" and "no". If the variable is the subject of a saxon:assign instruction, it must be set to the value "yes".

saxon:memo-function This attribute may be set on the xsl:function element. The permitted values are "yes" and "no". Specifying "yes" indicates that Saxon should remember the results of calling the function in a cache, and if the function is called again with the same arguments, the result is retrieved from the cache rather than being recalculated. Don't use this if the function has side-effects (for example, if it calls saxon:assign, or an extension function with side-effects). Don't use it if the function accesses context information such as the context node or position() or last(). And be careful if the function constructs and returns a temporary tree: the effect will be that the function returns the same tree each time, rather than a copy of the tree (this difference will only show up if you compare the identity of nodes in the two trees).
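
The caveat about temporary trees can be illustrated outside XSLT. The following Python sketch is only an analogy: functools.lru_cache stands in for the memo cache, build_tree is a hypothetical function, and none of this is Saxon code. It shows that a memoized function hands back the same object on every call with the same arguments, a difference that is visible only when identity is compared.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def build_tree(name):
    # Stands in for a memoized function that constructs a temporary tree.
    return {"element": name, "children": []}


def build_tree_uncached(name):
    # The same function without the cache: a new tree on every call.
    return {"element": name, "children": []}


first = build_tree("book")
second = build_tree("book")
same_object = first is second  # the cached result is the identical object

fresh_a = build_tree_uncached("book")
fresh_b = build_tree_uncached("book")
equal_but_distinct = fresh_a == fresh_b and fresh_a is not fresh_b
```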

Additional attributes for xsl:output

A number of additional attributes, or attribute values, are allowed on the xsl:output element, beyond those defined in the XSLT 2.0 specification.

Like the standard attributes of xsl:output, these are all interpreted as attribute value templates.

The method attribute

The method attribute of xsl:output and xsl:document can take the standard values "xml", "html", "xhtml", or "text", or a QName.

If a QName is specified, the local name may be:

the value "fop", which directs output to Apache's FOP processor (which must be installed separately from www.apache.org)

the fully-qualified class name of a class that implements either the SAX org.xml.sax.DocumentHandler interface, or the SAX2 org.xml.sax.ContentHandler interface, or that is a subclass of the net.sf.saxon.output.Emitter class. If such a value is specified, output is directed to a newly-created instance of the user-supplied class. You can pass additional information to this class by means of extra user-defined attributes on the xsl:output element.

The prefix of the QName must correspond to a valid namespace URI. It is recommended to use the Saxon URI "http://saxon.sf.net/", but this is not enforced.

Two additional attributes are available on the xsl:output and xsl:document elements, for use when method="saxon:fop". (Note: these are not fully tested.)

fop:renderer specifies the name of a FOP Renderer class, for example fop:renderer="org.apache.fop.render.pdf.PDFRenderer".

fop:configuration specifies the name of a FOP user configuration file, for example fop:configuration="c:\config\fop.xml".

Here fop: is the prefix of a namespace whose URI must be http://saxon.sf.net/fop.

The saxon:indent-spaces attribute

When the output is XML or HTML with indent="yes", the saxon:indent-spaces attribute may be used to control the amount of indentation. The value must be an integer.

The saxon:character-representation attribute

This attribute allows greater control over how non-ASCII characters will be represented on output.

With method="xml", two values are supported: "decimal" and "hex". These control whether numeric character references are output in decimal or hexadecimal when the character is not available in the selected encoding.

With HTML, the value may hold two strings, separated by a semicolon. The first string defines how non-ASCII characters within the character encoding will be represented, the values being "native", "entity", "decimal", or "hex". The second string defines how characters outside the encoding will be represented, the values being "entity", "decimal", or "hex". Here "native" means output the character as itself; "entity" means use a defined entity reference (such as "&eacute;") if known; "decimal" and "hex" refer to numeric character references. For example "entity;decimal" (the default) means that with encoding="iso-8859-1", characters in the range 160-255 will be represented using standard HTML entity references, while Unicode characters above 255 will be represented as decimal character references.
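
The two-part HTML setting can be modelled as follows. This is a Python sketch of the rule as described above, not Saxon's serializer: the function names are hypothetical, markup escaping of < and & is left out, and falling back to a decimal reference when "entity" is requested but no entity name is known is an assumption.

```python
from html.entities import codepoint2name


def represent(ch, style):
    """Render one non-ASCII character according to a single style keyword."""
    cp = ord(ch)
    if style == "native":
        return ch
    if style == "entity" and cp in codepoint2name:
        return "&" + codepoint2name[cp] + ";"
    if style == "hex":
        return "&#x%x;" % cp
    # "decimal", and the assumed fallback for "entity" with no known name.
    return "&#%d;" % cp


def serialize_html_text(text, encoding="iso-8859-1", setting="entity;decimal"):
    inside, outside = setting.split(";")
    out = []
    for ch in text:
        if ord(ch) < 128:
            out.append(ch)  # ASCII is always written as itself
            continue
        try:
            ch.encode(encoding)
            out.append(represent(ch, inside))   # present in the encoding
        except UnicodeEncodeError:
            out.append(represent(ch, outside))  # outside the encoding
    return "".join(out)
```

With the default setting, serialize_html_text("caf\u00e9 \u20ac") writes the e-acute as an entity reference and the euro sign, which iso-8859-1 cannot encode, as a decimal character reference.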

The saxon:omit-meta-tag attribute

This attribute may be set on the xsl:output element when method="html". The normal action of the HTML output method, as specified in the XSLT standard, is to generate a <META> tag immediately after the <HEAD> tag, containing details of the media type and character encoding. Setting this attribute to "yes" causes this output to be suppressed. Typical usage is

<xsl:output method="html" saxon:omit-meta-tag="yes">

The saxon:next-in-chain attribute

The saxon:next-in-chain attribute is used to direct the output to another stylesheet. The value is the URL of a stylesheet that should be used to process the output stream. In this case the output stream must always be pure XML, and attributes that control the format of the output (e.g. method, cdata-section-elements, etc) will have no effect. The output of the second stylesheet will be directed to the destination that would have been used for the first stylesheet if no saxon:next-in-chain attribute were present: for xsl:output, this means the original transformation result destination; for xsl:document, it means the file specified by the href attribute.

User defined attributes

Any number of user-defined attributes may be defined on both xsl:output and xsl:document. These attributes must have names in a non-null namespace, which must not be either the XSLT or the Saxon namespace. These attributes are interpreted as attribute value templates. The value of the attribute is inserted into the Properties object made available to the Emitter handling the output; they will be ignored by the standard output methods, but can supply arbitrary information to a user-defined output method. The name of the property will be the expanded name of the attribute in JAXP format, for example "{http://my-namespace/uri}local-name", and the value will be the value as given, after evaluation as an attribute value template.

Extension functions

A Saxon extension function is invoked using a name such as saxon:localname().

The saxon prefix (or whatever prefix you choose to use) must be associated with the Saxon namespace URI "http://saxon.sf.net/".

For example, to invoke the saxon:evaluate() function, write:

<xsl:variable name="expression" select="concat('child::', $param, '[', $index, ']')"/>
..
<xsl:copy-of select="saxon:evaluate($expression)" xmlns:saxon="http://saxon.sf.net/"/>

The extension functions supplied with the Saxon product are as follows:

closure(node-set, expression) Not available at this release. This returns a node-set obtained as the transitive closure of applying the given expression to each node in the supplied node-set. For example, saxon:closure(., saxon:expression('*')) returns all the descendant elements of the context node, and saxon:closure(., saxon:expression('id(@idref)')) returns all the elements that can be reached by following the @idref attribute, treating it as the ID of another element. The function does not detect cycles: if cycles are present in the data, it will recurse indefinitely until it runs out of stack space. To allow expressions such as "*[@father=current()/@name]", each time the expression is evaluated the current node is set to be the same as the context node.
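
The idea of a transitive closure can be sketched in Python. This is an illustration only: the step function and the dictionary data are hypothetical, the result is in breadth-first rather than document order, and the "seen" set is an addition that makes the sketch terminate on cyclic data, which the function described above does not do.

```python
def closure(start_nodes, step):
    """Collect every node reachable by applying step repeatedly."""
    result, seen = [], set()
    pending = list(start_nodes)
    while pending:
        node = pending.pop(0)
        for nxt in step(node):
            if nxt not in seen:  # cycle guard, not part of the description
                seen.add(nxt)
                result.append(nxt)
                pending.append(nxt)
    return result


# Child relationships of a small tree, standing in for the '*' expression.
children = {"a": ["b", "c"], "b": ["d"], "c": [], "d": []}
descendants = closure(["a"], lambda n: children[n])

# A cyclic reference graph, standing in for following @idref.
refs = {"x": ["y"], "y": ["x"]}
reachable = closure(["x"], lambda n: refs[n])
```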

distinct(node-set-1, [stored-expression])
This returns a node-set obtained by eliminating nodes in node-set-1 that have duplicate values for the supplied stored expression, evaluated as a string. A stored expression may be obtained as the result of calling the saxon:expression() function. If no stored expression is supplied, the default is expression('.'), that is, the string-value of the node. If several nodes produce the same string value, the one that is first in document order will be retained.

The stored expression is evaluated for each node in node-set-1 in turn, with that node as the context node, with the context position equal to the position of that node in node-set-1, and with the context size equal to the size of node-set-1.

Example: <xsl:for-each select="saxon:distinct(surname, saxon:expression('substring(.,1,1)'))"> will process the first surname starting with each letter of the alphabet in turn.

Note: for the single-argument version, the EXSLT distinct() function should be used in preference, for portability reasons.
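
The de-duplication rule (compare by the string produced for each node, keep the first node in document order) can be sketched in Python. The list of surnames and the function name are illustrative assumptions; the input list is taken to be in document order already.

```python
def distinct(nodes, key=str):
    """Keep the first node for each distinct key value."""
    seen, kept = set(), []
    for node in nodes:
        k = key(node)
        if k not in seen:
            seen.add(k)
            kept.append(node)
    return kept


surnames = ["Adams", "Baker", "Allen", "Brown", "Clark"]
# s[:1] plays the role of substring(., 1, 1) in the example above.
first_per_letter = distinct(surnames, key=lambda s: s[:1])
```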

eval(stored-expression)
This returns the result of evaluating the supplied stored expression. A stored expression may be obtained as the result of calling the saxon:expression() function.

The stored expression is evaluated in the current context, that is, the context node is the current node, and the context position and context size are the same as the result of calling position() or last() respectively.

Example: saxon:eval(saxon:expression(concat(2, $op, 2)))

evaluate(string) The supplied string must contain an XPath expression. The result of the function is the result of evaluating the XPath expression. This is useful where an expression needs to be constructed at run-time or passed to the stylesheet as a parameter, for example where the sort key is determined dynamically. The context for the expression (e.g. which variables and namespaces are available) is exactly the same as if the expression were written explicitly at this point in the stylesheet. The function saxon:evaluate(string) is shorthand for saxon:eval(saxon:expression(string)).

expression(string) The supplied string must contain an XPath expression. The result of the function is a stored expression, which may be supplied as an argument to other extension functions such as saxon:eval(), saxon:sum() and saxon:distinct(). The result of the expression will usually depend on the current node. The expression may contain references to variables that are in scope at the point where saxon:expression() is called: these variables will be replaced in the stored expression with the values they take at the time saxon:expression() is called, not the values of the variables at the time the stored expression is evaluated. Similarly, if the expression contains namespace prefixes, these are interpreted in terms of the namespace declarations in scope at the point where the saxon:expression() function is called, not those in scope where the stored expression is evaluated.
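
The point about variables being fixed when saxon:expression() is called, rather than when the stored expression is evaluated, resembles taking a snapshot of the variable bindings. The Python sketch below is an analogy with hypothetical names; it does not model namespace handling or XPath parsing.

```python
def expression(compute, variables):
    """Return a stored computation bound to the current variable values."""
    snapshot = dict(variables)  # copy the values as they are right now
    return lambda node: compute(node, snapshot)


variables = {"rate": 2}
stored = expression(lambda node, v: node * v["rate"], variables)

variables["rate"] = 10  # a later change is not seen by the stored expression
result = stored(5)
```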

get-pseudo-attribute(string) This function parses the contents of a processing instruction whose content follows the conventional attribute="value" structure (as defined for the <?xml-stylesheet?> processing instruction). The context node should be a processing instruction; the function returns the value of the pseudo-attribute named in the first argument if it is present, or an empty string otherwise.
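
A rough model of pseudo-attribute lookup, written in Python. The regular expression and function name are assumptions made for illustration; the sketch takes the processing instruction's data as a string and, as a simplification, does not expand character or entity references inside values.

```python
import re

# name = "value" or name = 'value'
_PSEUDO_ATTR = re.compile(r"""([^\s=]+)\s*=\s*(?:"([^"]*)"|'([^']*)')""")


def get_pseudo_attribute(pi_data, name):
    """Return the named pseudo-attribute's value, or '' if it is absent."""
    for match in _PSEUDO_ATTR.finditer(pi_data):
        if match.group(1) == name:
            value = match.group(2)
            return value if value is not None else match.group(3)
    return ""


pi_data = 'href="style.xsl" type="text/xsl"'
```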

get-user-data(string) This returns user data associated with the context node in the source document. The user data must be set up previously using the saxon:set-user-data() function.

has-same-nodes(node-set-1, node-set-2) This returns a boolean that is true if and only if node-set-1 and node-set-2 contain the same set of nodes. Note this is quite different from the "=" operator, which tests whether there is a pair of nodes with the same string-value.
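
The contrast with "=" can be made concrete with a small Python model. Node is a hypothetical class holding only a string-value; identity comparison stands in for node identity, and general_equals models the existential comparison performed by "=".

```python
class Node:
    def __init__(self, value):
        self.value = value  # the node's string-value


def has_same_nodes(set1, set2):
    # True only if both collections contain exactly the same node objects.
    return {id(n) for n in set1} == {id(n) for n in set2}


def general_equals(set1, set2):
    # True if any pair of nodes has the same string-value.
    return any(a.value == b.value for a in set1 for b in set2)


a, b, c = Node("x"), Node("x"), Node("y")
```

Here a and b are different nodes with equal string-values, so general_equals([a], [b]) is true while has_same_nodes([a], [b]) is false.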

highest(node-set-1 [, stored-expression])
This returns (as a node-set) the node from node-set-1 that has the highest value of the supplied stored expression, evaluated as a number. If the stored expression is omitted, the expression "number(.)" is evaluated: that is, the string value of the node, converted to a number. A stored expression may be obtained as the result of calling the saxon:expression() function.

The stored expression is evaluated for each node in node-set-1 in turn, with that node as the context node, with the context position equal to the position of that node in node-set-1, and with the context size equal to the size of node-set-1. Any NaN values are ignored. If the node-set is empty, the result is an empty node-set. If several nodes have the highest value, the result node-set contains the one that is first in document order. This differs from the EXSLT highest() function, which returns all the nodes that have the maximum value.

Example: saxon:highest(sale, saxon:expression('@price * @qty')) will evaluate price times quantity for each child <sale> element, and return the node for which this has the highest value.

if-null(java-object) The first argument must be a Java object wrapper returned from an external Java function. The function returns true if the wrapped Java object is null.

leading(node-set-1, stored-expression)
This returns a node-set containing all those nodes from node-set-1 up to and excluding the first one (in document order) for which the stored-expression evaluates to false. A stored expression may be obtained as the result of calling the saxon:expression() function.

The stored expression is evaluated for each node in node-set-1 in turn, with that node as the context node, with the context position equal to the position of that node in node-set-1 (taken in document order), and with the context size equal to the size of node-set-1.

Example: saxon:leading(following-sibling::*, saxon:expression('self::para')) will return the <para> elements following the current node, stopping at the first element that is not a <para>.

Note: this function is quite different from the EXSLT leading() function, though both fulfil a similar purpose.
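
The behaviour of saxon:leading() corresponds to a "take while" operation. A minimal Python sketch, assuming the nodes are supplied in document order and using element names as stand-ins for nodes:

```python
from itertools import takewhile


def leading(nodes, predicate):
    """Return the nodes before the first one for which predicate is false."""
    return list(takewhile(predicate, nodes))


siblings = ["para", "para", "note", "para"]
leading_paras = leading(siblings, lambda name: name == "para")
```

Note that the final "para" is not returned, because it follows an element that failed the test.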

line-number() This returns the line number of the context node in the source document within the entity that contains it. There are no arguments. If line numbers are not maintained for the current document, the function returns -1. (To ensure that line numbers are maintained, use the -l option on the command line.)

lowest(node-set-1 [, stored-expression])
This returns (as a node-set) the node from node-set-1 that has the lowest value of the supplied stored expression, evaluated as a number. If the stored expression is omitted, the expression "number(.)" is evaluated: that is, the string value of the node, converted to a number. A stored expression may be obtained as the result of calling the saxon:expression() function.

The stored expression is evaluated for each node in node-set-1 in turn, with that node as the context node, with the context position equal to the position of that node in node-set-1, and with the context size equal to the size of node-set-1. Any NaN values are ignored. If the node-set is empty, the result is an empty node-set. If several nodes have the lowest value, the result node-set contains the one that is first in document order. This differs from the EXSLT lowest() function, which returns all the nodes that have the minimum value.

Example: saxon:lowest(sale, saxon:expression('@price * @qty')) will evaluate price times quantity for each child <sale> element, and return the node for which this has the lowest value.
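
The selection rule shared by saxon:highest() and saxon:lowest() (ignore NaN, return an empty result for empty input, and on a tie keep the node that comes first in document order) can be sketched in Python. The dictionaries standing in for <sale> elements and the helper names are assumptions for illustration.

```python
import math


def _pick(nodes, key, better):
    best_node, best_value = None, None
    for node in nodes:  # nodes are assumed to be in document order
        value = key(node)
        if math.isnan(value):
            continue  # NaN values are ignored
        if best_value is None or better(value, best_value):
            best_node, best_value = node, value
    return [] if best_value is None else [best_node]


def highest(nodes, key=float):
    # Strict comparison, so the first node wins when values tie.
    return _pick(nodes, key, lambda v, best: v > best)


def lowest(nodes, key=float):
    return _pick(nodes, key, lambda v, best: v < best)


sales = [
    {"price": 2.0, "qty": 3},
    {"price": 6.0, "qty": 1},
    {"price": 1.0, "qty": 4},
]
amount = lambda sale: sale["price"] * sale["qty"]
```

The first two sales both come to 6.0, so highest(sales, amount) returns only the first of them, unlike the EXSLT function, which would return both.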

max(node-set-1 , stored-expression)
This returns the maximum value of a numeric expression resulting from evaluating the supplied stored expression for each node in node-set-1 in turn, as a number. A stored expression may be obtained as the result of calling the saxon:expression() function.

The stored expression is evaluated for each node in node-set-1 in turn, with that node as the context node, with the context position equal to the position of that node in node-set-1, and with the context size equal to the size of node-set-1. Any NaN values are ignored. If the node-set is empty, the result is negative infinity.

For the single-argument version of this function, use the XPath 2.0 max() function instead, for portability.

Example: saxon:max(sale, saxon:expression('@price * @qty')) will evaluate price times quantity for each child <sale> element, and return the maximum amount.

min(node-set-1 , stored-expression)
This returns the minimum value of a numeric expression resulting from evaluating the supplied stored expression for each node in node-set-1 in turn, as a number. A stored expression may be obtained as the result of calling the saxon:expression() function.

The stored expression is evaluated for each node in node-set-1 in turn, with that node as the context node, with the context position equal to the position of that node in node-set-1, and with the context size equal to the size of node-set-1. Any NaN values are ignored. If the node-set is empty, the result is positive infinity.

For the single-argument version of this function, use the XPath 2.0 min() function instead, for portability.

Example: saxon:min(sale, saxon:expression('@price * @qty')) will evaluate price times quantity for each child <sale> element, and return the minimum amount.
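
The treatment of NaN values and of an empty node-set in saxon:max() and saxon:min() can be modelled in a few lines of Python. The function names and sample data are hypothetical; returning the infinity when every value is NaN is an assumption, since the text only states the result for an empty node-set.

```python
import math


def saxon_max(nodes, key):
    values = [v for v in map(key, nodes) if not math.isnan(v)]
    return max(values, default=-math.inf)  # empty input: negative infinity


def saxon_min(nodes, key):
    values = [v for v in map(key, nodes) if not math.isnan(v)]
    return min(values, default=math.inf)  # empty input: positive infinity


sales = [
    {"price": 2.0, "qty": 3},
    {"price": 6.0, "qty": 1},
    {"price": 1.0, "qty": 4},
]
amount = lambda sale: sale["price"] * sale["qty"]
```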

parse(string)
This function takes a single argument, a string containing the source text of a well-formed XML document. It returns the document node (root node) that results from parsing this text. It throws an error if the text is not well-formed XML. Applications should not rely on the identity of the returned document node (at present, if the function is called twice with the same arguments, it returns a new document node each time, but this may change in future).

This function is useful where one XML document is embedded inside another using CDATA, or as an alternative way of passing an XML document as a parameter to a stylesheet.

path() This takes no arguments. It returns a string whose value is an XPath expression identifying the context node in the source tree. This can be useful for diagnostics, or to create an XPointer value, or when generating another stylesheet to process the same document. The resulting string can be used as input to the evaluate() function, provided that any namespace prefixes it uses are declared.

serialize(node, format)
This function takes two arguments: the first is a node (generally a document or element node) to be serialized. The second is the name of an <xsl:output> element in the stylesheet. The function serializes the specified document, or the subtree rooted at the specified element, according to the parameters specified in the named <xsl:output> element, and returns the serialized document as a string.

This function is useful where the XSLT stylesheet wants to manipulate the serialized output, for example by embedding it as CDATA inside another XML document, or prefixing it with a DOCTYPE declaration, or inserting it into a non-XML output file.

set-user-data(string, value)
This function sets user data associated with the context node in the source document. The data may be retrieved later (during the same stylesheet execution only) using the saxon:get-user-data() function. The string serves as a name for this property, allowing multiple pieces of user data to be associated with the same node. The value may be any XPath value. This function returns an empty string as its nominal result. Note: set-user-data() is particularly useful to save data read during preview mode processing (see saxon:preview) for later use during normal processing. However, take care (a) not to store the data with a node that will be deleted after the preview, and (b) not to store a node-set containing nodes that will be deleted after the preview. It is safest to store simple values such as strings and numbers: use the string() or number() function if necessary to do the conversion.

Like saxon:assign, this function breaks the XSLT no-side-effects rule. There is always a risk that the Saxon optimizer will execute expressions more than once, or not at all, or in a different order from that expected.

string-to-unicode(string)
This returns a sequence of integers representing the characters in the supplied string. Each integer is the Unicode numeric code-point value for one of the characters in the string. Note that a Unicode surrogate pair is considered as a single XML character.

The function is useful, for example, for testing whether a character is in a particular range. By turning a string into a sequence of characters, it also allows the use of sequence operations such as insert() and remove() on the characters in a string.

Example: saxon:string-to-unicode('PQR') returns the sequence (80, 81, 82).

sum(node-set-1, stored-expression)
This returns the total resulting from evaluating the supplied stored expression for each node in node-set-1 in turn, as a number. If the result is NaN for any node, the total will be NaN. A stored expression may be obtained as the result of calling the saxon:expression() function.

The stored expression is evaluated for each node in node-set-1 in turn, with that node as the context node, with the context position equal to the position of that node in node-set-1, and with the context size equal to the size of node-set-1.

Example: saxon:sum(sale, saxon:expression('@price * @qty')) will evaluate price times quantity for each child <sale> element, and return the total amount.
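
Unlike max() and min(), the total is NaN as soon as any node yields NaN. A Python sketch of that rule, with a hypothetical function name; returning zero for an empty node-set is an assumption not stated above.

```python
import math


def saxon_sum(nodes, key):
    total = 0.0
    for node in nodes:
        total += key(node)  # NaN propagates through floating-point addition
    return total


quantities = [6.0, 6.0, 4.0]
total_amount = saxon_sum(quantities, float)
poisoned = saxon_sum([1.0, float("nan"), 2.0], float)
```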

systemId() This returns the system identifier (URI) of the entity in the source document that contains the context node. There are no arguments.

tokenize(string-1, string-2?) The first argument is converted to a string and is treated as a list of separated tokens. If the second argument is present, any character in string-2 is taken as a delimiter character, and any sequence of delimiter characters is taken as a token separator. If the second argument is omitted, any sequence of whitespace is taken as a token separator: or to put it another way, the default for string-2 is '&#x9;&#xA;&#xD;&#x20;' (tab, newline, carriage return, and space).
A new sequence is constructed containing one string for each token; if the string is empty or contains only separators then the result will be empty. For example tokenize("a cup of tea") generates a sequence of four strings: ("a", "cup", "of", "tea").
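
The tokenizing rule can be sketched in Python as follows. This models the description above rather than Saxon's code; the default delimiter set is assumed to be the four XML whitespace characters.

```python
def tokenize(text, delimiters="\t\n\r "):
    """Split text on runs of delimiter characters, dropping empty tokens."""
    tokens, current = [], []
    for ch in text:
        if ch in delimiters:
            if current:  # a run of delimiters ends the current token once
                tokens.append("".join(current))
                current = []
        else:
            current.append(ch)
    if current:
        tokens.append("".join(current))
    return tokens
```

For example, tokenize("a,b;;c", ",;") treats the two adjacent semicolons as a single separator and yields three tokens.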

unicode-to-string(integer*)
This takes a sequence of integers representing the characters of a string, and returns the resulting string. Each integer is the Unicode numeric code-point value for one of the characters in the string. Note that a Unicode surrogate pair is represented by a single integer in the sequence. Characters below 0x20, other than 0x9, 0xA, and 0xD, are not permitted.

Example: saxon:unicode-to-string((80, 81, 82)) returns 'PQR'. Note the need for double parentheses: one pair for the function call, another to delimit the sequence-valued argument.
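
The two conversions can be mirrored in Python, whose strings are sequences of code points, so a character outside the Basic Multilingual Plane is one item rather than a surrogate pair. The function names are illustrative, and raising ValueError for a disallowed control character is an assumption about how the error might be reported.

```python
def string_to_unicode(text):
    """Return the Unicode code point of each character in text."""
    return [ord(ch) for ch in text]


def unicode_to_string(codepoints):
    """Build a string from code points, rejecting disallowed controls."""
    for cp in codepoints:
        if cp < 0x20 and cp not in (0x9, 0xA, 0xD):
            raise ValueError("character not permitted: %#x" % cp)
    return "".join(chr(cp) for cp in codepoints)


# Range test on a single character, as suggested above.
is_upper_ascii = all(65 <= cp <= 90 for cp in string_to_unicode("PQR"))
```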

The source code of these methods, which in most cases is extremely simple, can be used as an example for writing other user extension functions. It is found in class net.sf.saxon.functions.Extensions.

Extension instructions

A Saxon extension instruction is invoked using a name such as <saxon:localname>.

The saxon prefix (or whatever prefix you choose to use) must be associated with the Saxon namespace URI "http://saxon.sf.net/". The prefix must also be designated as an extension element prefix by including it in the extension-element-prefixes attribute on the xsl:stylesheet element, or the xsl:extension-element-prefixes attribute on any enclosing literal result element or extension element.

However, top-level elements such as saxon:collation and saxon:preview can be used without designating the prefix as an extension element prefix.

saxon:assign

The saxon:assign element is used to change the value of a local or global variable that has previously been declared using xsl:variable (or xsl:param). The variable or parameter must be marked as assignable by including the extra attribute saxon:assignable="yes".

As with xsl:variable, the name of the variable is given in the mandatory name attribute, and the new value may be given either by an expression in the select attribute, or by expanding the content of the saxon:assign element.

If the xsl:variable element has a type attribute, then the value is converted to the required type of the variable in the usual way.

Example:

<xsl:variable name="i" select="0" saxon:assignable="yes"/>
<saxon:while test="$i &lt; 10">
  The value of i is <xsl:value-of select="$i"/>
  <saxon:assign name="i" select="$i+1"/>
</saxon:while>

Note: Using saxon:assign is cheating. XSLT is designed as a language that is free of side-effects, which is why variables are not assignable. Once assignment to variables is allowed, certain optimizations become impossible. At present this doesn't affect Saxon, which generally executes the stylesheet sequentially. However, there are some circumstances in which the order of execution may not be quite what you expect, in which case saxon:assign may show anomalous behavior. In principle the saxon:assignable attribute is designed to stop Saxon doing optimizations that cause such anomalies, but you can't always rely on this.

saxon:collation

The saxon:collation element is a top-level element used to define collating sequences that may be used in xsl:sort sort keys and in the compare() function. The collation name is a URI (though actually any string can be used), and is defined in the mandatory name attribute. The other attributes are:

class: the fully qualified name of a Java class that implements the java.util.Comparator interface. (Note that when collations are supported in functions such as contains() and starts-with(), this class will have to be a java.text.RuleBasedCollator.)

lang: follows the rules of the xml:lang attribute, for example specify "en-US" for US English. This is used to find the collation appropriate to a Java locale.

strength: sets the strength of the collator. Values are "primary", "secondary", "tertiary", and "identical". See the JDK 1.2 documentation for details.

decomposition: Determines how the collator handles Unicode composed characters. Values are "none", "standard", and "full". See the JDK 1.2 documentation for details.

rules: Sets the rules to be used by a RuleBasedCollator. See the JDK 1.2 documentation for details.

default: Value is "yes" or "no". The value "yes" indicates that this collation is to be used as the default collation. If more than one collation is specified as the default, the last one wins. If no default collation is specified, Unicode codepoint collation is used.

Sorting and comparison according to Unicode codepoints can be achieved by setting up a collator as <saxon:collation name="unicode" class="net.sf.saxon.sort.CodepointCollator"/>

saxon:doctype

The saxon:doctype instruction is used to insert a document type declaration into the current output file. It must be instantiated before the first element in the output file is written.

The saxon:doctype instruction takes no attributes. The content of the element is a template-body that is instantiated to create an XML document that represents the DTD to be generated; this XML document is then serialized using a special output method that produces DTD syntax rather than XML syntax.

If this element is present, the doctype-system and doctype-public attributes of xsl:output are ignored.

The generated XML document uses the following elements, where the namespace prefix "dtd" is used for the namespace URI "http://saxon.sf.net/dtd":

dtd:doctype Represents the document type declaration. This is always the top-level element. The element may contain dtd:element, dtd:attlist, dtd:entity, and dtd:notation elements. It may have the following attributes:
name (mandatory) The name of the document type
system The system ID
public The public ID

dtd:element Represents an element type declaration. This is always a child of dtd:doctype. The element is always empty. It may have the following attributes:
name (mandatory) The name of the element type
content (mandatory) The content model, exactly as it appears in a DTD, for example content="(#PCDATA)" or content="( a | b | c)*"

dtd:attlist Represents an attribute list declaration. This is always a child of dtd:doctype. The element will generally have one or more dtd:attribute children. It may have the following attributes:
element (mandatory) The name of the element type

dtd:attribute Represents an attribute declaration within an attribute list. This is always a child of dtd:attlist. The element will always be empty. It may have the following attributes:
name (mandatory) The name of the attribute
type (mandatory) The type of the attribute, exactly as it appears in a DTD, for example type="ID" or type="( red | green | blue)"
value (mandatory) The default value of the attribute, exactly as it appears in a DTD, for example value="#REQUIRED" or value="#FIXED 'blue'"

dtd:entity Represents an entity declaration. This is always a child of dtd:doctype. The element may be empty, or it may have content. The content is a template body, which is instantiated to define the value of an internal parsed entity. Note that this value includes the delimiting quotes. The dtd:entity element may have the following attributes:
name (mandatory) The name of the entity
system The system identifier
public The public identifier
parameter Set to "yes" for a parameter entity
notation The name of a notation, for an unparsed entity

dtd:notation Represents a notation declaration. This is always a child of dtd:doctype. The element will always be empty. It may have the following attributes:
name (mandatory) The name of the notation
system The system identifier
public The public identifier

Note that Saxon will perform only minimal validation on the DTD being generated; it will output the components requested but will not check that this generates well-formed XML, let alone that the output document instance is valid according to this DTD.

Example:

<xsl:template match="/">
  <saxon:doctype xsl:extension-element-prefixes="saxon">
    <dtd:doctype name="booklist" xmlns:dtd="http://saxon.sf.net/dtd" xsl:exclude-result-prefixes="dtd">
      <dtd:element name="booklist" content="(book)*"/>
      <dtd:element name="book" content="EMPTY"/>
      <dtd:attlist element="book">
        <dtd:attribute name="isbn" type="ID" value="#REQUIRED"/>
        <dtd:attribute name="title" type="CDATA" value="#IMPLIED"/>
      </dtd:attlist>
      <dtd:entity name="blurb">'A <i>cool</i> book with > 200 pictures!'</dtd:entity>
      <dtd:entity name="cover" system="cover.gif" notation="GIF"/>
      <dtd:notation name="GIF" system="http://gif.org/"/>
    </dtd:doctype>
  </saxon:doctype>
  <xsl:apply-templates/>
</xsl:template>

Although not shown in this example, there is nothing to stop the DTD being generated as the output of a transformation, using instructions such as xsl:value-of and xsl:call-template. It is also possible to use xsl:text with disable-output-escaping="yes" to output DTD constructs not covered by this syntax, for example conditional sections and references to parameter entities.
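As a rough sketch of generating the DTD from the source data, the stylesheet below builds the attribute list from the attributes of the first book element, using attribute value templates, xsl:for-each and xsl:value-of inside saxon:doctype. The saxon namespace URI, the version attribute, the root parameter and the shape of the input document are assumptions made for illustration; they are not taken from this page.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:saxon="http://saxon.sf.net/"
    xmlns:dtd="http://saxon.sf.net/dtd"
    extension-element-prefixes="saxon"
    exclude-result-prefixes="dtd">

  <!-- Hypothetical parameter: the name of the document type -->
  <xsl:param name="root" select="'booklist'"/>

  <!-- Assumed input: <booklist><book isbn="..." title="..."/>...</booklist> -->
  <xsl:template match="/">
    <saxon:doctype>
      <dtd:doctype name="{$root}">
        <dtd:element name="{$root}" content="(book)*"/>
        <dtd:element name="book" content="EMPTY"/>
        <dtd:attlist element="book">
          <!-- One declaration per attribute found on the first book -->
          <xsl:for-each select="/*/book[1]/@*">
            <dtd:attribute name="{name()}" type="CDATA" value="#IMPLIED"/>
          </xsl:for-each>
        </dtd:attlist>
        <!-- The entity value includes its delimiting quotes -->
        <dtd:entity name="count">'<xsl:value-of select="count(/*/book)"/>'</dtd:entity>
      </dtd:doctype>
    </saxon:doctype>
    <xsl:copy-of select="*"/>
  </xsl:template>

</xsl:stylesheet>
```

Because the dtd:* elements are literal result elements, their attributes are attribute value templates, which is what allows the names to be computed at run time.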

saxon:entity-ref

The saxon:entity-ref element is useful to generate entities such as &nbsp; in HTML output. To do this, write:

<saxon:entity-ref name="nbsp"/>

Note: the preferred way to produce a non-breaking space character in the output is simply to write &#160; or &#xa0; in the stylesheet. By default, with HTML output, this will be serialized as &nbsp;, though the way it is serialized doesn't actually matter as far as the HTML browser is concerned.
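For context, here is a minimal sketch of saxon:entity-ref used inside a template to fill otherwise empty table cells. The saxon namespace URI, the version attribute and the cell element name are assumptions made for this illustration.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:saxon="http://saxon.sf.net/"
    extension-element-prefixes="saxon">

  <xsl:output method="html"/>

  <!-- Hypothetical source element: <cell>text</cell>, possibly empty -->
  <xsl:template match="cell">
    <td>
      <xsl:choose>
        <xsl:when test="normalize-space(.)">
          <xsl:value-of select="."/>
        </xsl:when>
        <xsl:otherwise>
          <!-- Writes an entity reference named nbsp into the output -->
          <saxon:entity-ref name="nbsp"/>
        </xsl:otherwise>
      </xsl:choose>
    </td>
  </xsl:template>

</xsl:stylesheet>
```

Writing the character reference for a non-breaking space directly in the xsl:otherwise branch would be the portable alternative.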

saxon:preview

The saxon:preview element is a top-level element used to identify elements in the source document that will be processed in preview mode. The purpose of preview mode is to enable XSLT processing of very large documents that are too big to fit in memory: the idea is that subtrees of the document can be processed and then discarded as soon as they are encountered.

There are two mandatory attributes: mode identifies the mode in which the relevant templates will be applied, and elements is a space-separated list of element names that will be processed in preview mode.

While the source XML document is being read, if an element end tag is encountered for an element that is in the list of preview elements, the relevant template is found (using the normal matching rules, with mode equal to the specified preview mode). This template is then executed. After the template has completed execution, the child nodes of the preview element (but not the element itself, nor its attributes) are deleted from the tree to save memory.

During the matching of a preview element and during the execution of the preview template, only part of the source document is visible. This part includes the ancestors of the preview element, the descendants of the preview element, and all nodes that precede the preview element in document order, except for nodes that are descendants of another preview element.

Global variables are not available to a preview template. The supplied values of global parameters are available, but not the default values of unsupplied parameters.

A preview template may write to a secondary output destination using saxon:output, or it may set global variables using saxon:assign. It can save information using the extension function setUserData(), which can be accessed later using getUserData(). This is useful to save information that would otherwise disappear when the subtree rooted at the preview element is deleted from the tree. The preview template may also write directly to the principal output destination. However, the output format for the principal output destination will be the default XML serialization; it cannot be controlled using <xsl:output>. Note also that in this case each instantiation of the preview template will produce a subtree immediately below the root of the output tree. Normally this means the output document will have multiple element nodes as children of the root. This is not well-formed XML, but you can easily construct a well-formed XML document by referencing this file as an external entity.

One simple use for saxon:preview is to delete unwanted parts of the tree, reducing the amount of memory needed. In this case, just provide a preview template that does nothing.
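A sketch of this pruning technique might look as follows. The element names appendix and section, the mode name discard, the saxon namespace URI and the version attribute are all assumptions chosen for illustration.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:saxon="http://saxon.sf.net/"
    extension-element-prefixes="saxon">

  <!-- Process every appendix element in preview mode as its end tag is read -->
  <saxon:preview mode="discard" elements="appendix"/>

  <!-- An empty preview template: nothing is output, and the children of
       each appendix are dropped from the tree once the template completes -->
  <xsl:template match="appendix" mode="discard"/>

  <!-- Normal processing then runs against the pruned tree -->
  <xsl:template match="/">
    <section-count>
      <xsl:value-of select="count(//section)"/>
    </section-count>
  </xsl:template>

</xsl:stylesheet>
```

Any section elements that were inside an appendix are no longer in the tree when the root template runs, so they are not counted.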

Preview templates are called while the tree is being built. When the tree has been completely built, it will contain the preview elements themselves, but any nodes that were descendants of the preview elements will have been deleted. At this stage the stylesheet is applied to the root of the tree, in "default" mode, in the normal way. If you don't want any further processing to take place at this stage, write a root template that does nothing: <xsl:template match="/"/>.
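The following sketch shows the other common arrangement: all the work is done in the preview template, and the root template does nothing. The order and line element names, their attributes, the mode name, the saxon namespace URI and the version attribute are assumptions made for illustration.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:saxon="http://saxon.sf.net/"
    extension-element-prefixes="saxon">

  <saxon:preview mode="preview" elements="order"/>

  <!-- Called once for each order, as soon as its end tag has been read.
       Assumed input: <order id="..."><line amount="..."/>...</order> -->
  <xsl:template match="order" mode="preview">
    <order-total id="{@id}">
      <xsl:value-of select="sum(line/@amount)"/>
    </order-total>
  </xsl:template>

  <!-- Suppress the normal pass over the (now pruned) tree -->
  <xsl:template match="/"/>

</xsl:stylesheet>
```

Each order-total element is written directly below the root of the principal output, so the result has several top-level elements and needs to be wrapped, for example as an external entity, to form a well-formed document.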

<saxon:preview> is not supported when a transformation is run using the JAXP 1.1 TransformerHandler interface. It works when using the Saxon command line, or when invoking a transformation using the transform() method.

saxon:script

The saxon:script element is a top-level element. It is used to define an implementation for an extension function that will be used by Saxon. With other processors, a different implementation of the same function can be selected, using mechanisms defined by that processor (for example, xalan:script).

The attributes for saxon:script are the same as the attributes of the xsl:script element defined in the (now withdrawn) XSLT 1.1 working draft.

The language attribute is mandatory. For Saxon to use the element, its value must be "java". The values "javascript", "ecmascript", or a QName are also permitted, but in this case Saxon ignores the saxon:script element.

The implements-prefix attribute is mandatory; its value must be a namespace prefix that maps to the same namespace URI as the prefix used in the extension function call.

The src attribute is mandatory for language="java"; its value must take the form "java:fully.qualified.class.Name", for example "java:java.util.Date". It defines the class containing the implementation of extension functions that use this prefix.

The archive attribute is optional; its value is a space-separated list of URLs of folders or JAR files that will be searched to find the named class. If the attribute is omitted, the class is sought on the classpath.
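As an illustration, the sketch below binds a prefix to java.util.Date and calls it from an attribute value template. The date namespace URI, the saxon namespace URI and the version attribute are assumptions, and the calling conventions used (new() for a constructor, and an instance method taking the object as its first argument) are assumed to be those described in extensibility.html.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:saxon="http://saxon.sf.net/"
    xmlns:date="http://example.com/ns/date"
    extension-element-prefixes="saxon"
    exclude-result-prefixes="date">

  <!-- Functions called with the date prefix are implemented by java.util.Date -->
  <saxon:script language="java"
                implements-prefix="date"
                src="java:java.util.Date"/>

  <xsl:template match="/">
    <!-- Construct a Date, then call its toString() method -->
    <generated at="{date:toString(date:new())}"/>
  </xsl:template>

</xsl:stylesheet>
```

The archive attribute is omitted here because java.util.Date is always available on the classpath.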

saxon:while

The saxon:while element is used to iterate while some condition is true.

The condition is given as a boolean expression in the mandatory test attribute. Because this expression must change its value if the loop is to terminate, the condition will generally reference a variable that is updated somewhere in the loop using a saxon:assign element. Alternatively, it may test a condition that is changed by means of a call on an extension function that has side-effects.

Example:

<xsl:variable name="i" select="0" saxon:assignable="yes"/>
<saxon:while test="$i &lt; 10">
  The value of i is <xsl:value-of select="$i"/>
  <saxon:assign name="i" select="$i+1"/>
</saxon:while>

Michael H. Kay
27 August 2002


saxon:allow-avt	This attribute may be set on the xsl:call-template element. If set to the value "yes", it causes the name attribute of xsl:call-template to be interpreted as an attribute value template. This allows the selection of the called template to be decided at run-time. Typical usage is: <xsl:call-template name="{$tname}" saxon:allow-avt="yes">
saxon:assignable	This attribute may be set on the xsl:variable element. The permitted values are "yes" and "no". If the variable is the subject of a saxon:assign instruction, it must be set to the value "yes".
saxon:memo-function	This attribute may be set on the xsl:function element. The permitted values are "yes" and "no". Specifying "yes" indicates that Saxon should remember the results of calling the function in a cache, and if the function is called again with the same arguments, the result is retrieved from the cache rather than being recalculated. Don't use this if the function has side-effects (for example, if it calls `saxon:assign`, or an extension function with side-effects). Don't use it if the function accesses context information such as the context node or position() or last(). And be careful if the function constructs and returns a temporary tree: the effect will be that the function returns the same tree each time, rather than a copy of the tree (this difference will only show up if you compare the identity of nodes in the two trees).

closure(node-set, expression)	Not available at this release. This returns a node-set obtained as the transitive closure of applying the given expression to each node in the supplied node-set. For example, saxon:closure(., saxon:expression('*')) returns all the descendant elements of the context node, and saxon:closure(., saxon:expression('id(@idref)')) returns all the elements that can be reached by following the @idref attribute treating it as the ID of another element. The function does not detect cycles: if cycles are present in the data, it will recurse indefinitely until it runs out of stack space. To allow expressions such as "[@father=current()/@name]", each time the expression is evaluated the current node is set to be the same as the context node.
distinct(node-set-1, [stored-expression])	This returns a node-set obtained by eliminating nodes in node-set-1 that have duplicate values for the supplied stored expression, evaluated as a string. A stored expression may be obtained as the result of calling the saxon:expression() function. If no stored expression is supplied, the default is expression('.'), that is, the string-value of the node. If several nodes produce the same string value, the one that is first in document order will be retained. The stored expression is evaluated for each node in node-set-1 in turn, with that node as the context node, with the context position equal to the position of that node in node-set-1, and with the context size equal to the size of node-set-1. Example: <xsl:for-each select="saxon:distinct(surname, saxon:expression('substring(.,1,1)'))"> will process the first surname starting with each letter of the alphabet in turn. Note: for the single-argument version, the EXSLT distinct() function should be used in preference, for portability reasons.
eval(stored-expression)	This returns the result of evaluating the supplied stored expression. A stored expression may be obtained as the result of calling the saxon:expression() function. The stored expression is evaluated in the current context, that is, the context node is the current node, and the context position and context size are the same as the result of calling position() or last() respectively. Example: saxon:eval(saxon:expression(concat(2, $op, 2)))
evaluate(string)	The supplied string must contain an XPath expression. The result of the function is the result of evaluating the XPath expression. This is useful where an expression needs to be constructed at run-time or passed to the stylesheet as a parameter, for example where the sort key is determined dynamically. The context for the expression (e.g. which variables and namespaces are available) is exactly the same as if the expression were written explicitly at this point in the stylesheet. The function saxon:evaluate(string) is shorthand for saxon:eval(saxon:expression(string)).
expression(string)	The supplied string must contain an XPath expression. The result of the function is a stored expression, which may be supplied as an argument to other extension functions such as saxon:eval(), saxon:sum() and saxon:distinct(). The result of the expression will usually depend on the current node. The expression may contain references to variables that are in scope at the point where saxon:expression() is called: these variables will be replaced in the stored expression with the values they take at the time saxon:expression() is called, not the values of the variables at the time the stored expression is evaluated. Similarly, if the expression contains namespace prefixes, these are interpreted in terms of the namespace declarations in scope at the point where the saxon:expression() function is called, not those in scope where the stored expression is evaluated.
get-pseudo-attribute(string)	This function parses the contents of a processing instruction whose content follows the conventional attribute="value" structure (as defined for the <?xml-stylesheet?> processing instruction). The context node should be a processing instruction; the function returns the value of the pseudo-attribute named in the first argument if it is present, or an empty string otherwise.
get-user-data(string)	This returns user data associated with the context node in the source document. The user data must be set up previously using the saxon:setUserData() function.
has-same-nodes(node-set-1, node-set-2)	This returns a boolean that is true if and only if node-set-1 and node-set-2 contain the same set of nodes. Note this is quite different from the "=" operator, which tests whether there is a pair of nodes with the same string-value.
highest(node-set-1 [, stored-expression])	This returns (as a node-set) the node from node-set-1 that has the highest value of the supplied stored expression, evaluated as a number. If the stored expression is omitted, the expression "number(.)" is evaluated: that is, the string value of the node, converted to a number. A stored expression may be obtained as the result of calling the saxon:expression() function. The stored expression is evaluated for each node in node-set-1 in turn, with that node as the context node, with the context position equal to the position of that node in node-set-1, and with the context size equal to the size of node-set-1. Any NaN values are ignored. If the node-set is empty, the result is an empty node-set. If several nodes have the highest value, the result node-set contains the one that is first in document order. This differs from the EXSLT highest() function, which returns all the nodes that have the maximum value. Example: saxon:highest(sale, saxon:expression('@price * @qty')) will evaluate price times quantity for each child <sale> element, and return the node for which this has the highest value.
if-null(java-object)	The first argument must be a Java object wrapper returned from an external Java function. The function returns true if the wrapped Java object is null.
leading(node-set-1, stored-expression)	This returns a node-set containing all those nodes from node-set-1 up to and excluding the first one (in document order) for which the stored-expression evaluates to false. A stored expression may be obtained as the result of calling the saxon:expression() function. The stored expression is evaluated for each node in node-set-1 in turn, with that node as the context node, with the context position equal to the position of that node in node-set-1 (taken in document order), and with the context size equal to the size of node-set-1. Example: saxon:leading(following-sibling::*, saxon:expression('self::para')) will return the <para> elements following the current node, stopping at the first element that is not a <para>. Note: this function is quite different from the EXSLT leading() function, though both fulfil a similar purpose.
line-number()	This returns the line number of the context node in the source document within the entity that contains it. There are no arguments. If line numbers are not maintained for the current document, the function returns -1. (To ensure that line numbers are maintained, use the -l option on the command line)
lowest(node-set-1 [, stored-expression])	This returns (as a node-set) the node from node-set-1 that has the lowest value of the supplied stored expression, evaluated as a number. If the stored expression is omitted, the expression "number(.)" is evaluated: that is, the string value of the node, converted to a number. A stored expression may be obtained as the result of calling the saxon:expression() function. The stored expression is evaluated for each node in node-set-1 in turn, with that node as the context node, with the context position equal to the position of that node in node-set-1, and with the context size equal to the size of node-set-1. Any NaN values are ignored. If the node-set is empty, the result is an empty node-set. If several nodes have the lowest value, the result node-set contains the one that is first in document order. This differs from the EXSLT lowest() function, which returns all the nodes that have the minimum value. Example: saxon:lowest(sale, saxon:expression('@price * @qty')) will evaluate price times quantity for each child <sale> element, and return the node for which this has the lowest value.
max(node-set-1 , stored-expression)	This returns the maximum value of a numeric expression resulting from evaluating the supplied stored expression for each node in node-set-1 in turn, as a number. A stored expression may be obtained as the result of calling the saxon:expression() function. The stored expression is evaluated for each node in node-set-1 in turn, with that node as the context node, with the context position equal to the position of that node in node-set-1, and with the context size equal to the size of node-set-1. Any NaN values are ignored. If the node-set is empty, the result is negative infinity. For the single-argument version of this function, use the XPath 2.0 max() function instead, for portability. Example: saxon:max(sale, saxon:expression('@price * @qty')) will evaluate price times quantity for each child <sale> element, and return the maximum amount.
min(node-set-1 , stored-expression)	This returns the minimum value of a numeric expression resulting from evaluating the supplied stored expression for each node in node-set-1 in turn, as a number. A stored expression may be obtained as the result of calling the saxon:expression() function. The stored expression is evaluated for each node in node-set-1 in turn, with that node as the context node, with the context position equal to the position of that node in node-set-1, and with the context size equal to the size of node-set-1. Any NaN values are ignored. If the node-set is empty, the result is positive infinity. For the single-argument version of this function, use the XPath 2.0 min() function instead, for portability. Example: saxon:min(sale, saxon:expression('@price * @qty')) will evaluate price times quantity for each child <sale> element, and return the minimum amount.
parse(string)	This function takes a single argument, a string containing the source text of a well-formed XML document. It returns the document node (root node) that results from parsing this text. It throws an error if the text is not well-formed XML. Applications should not rely on the identity of the returned document node (at present, if the function is called twice with the same arguments, it returns a new document node each time, but this may change in future). This function is useful where one XML document is embedded inside another using CDATA, or as an alternative way of passing an XML document as a parameter to a stylesheet.
path()	This takes no arguments. It returns a string whose value is an XPath expression identifying the context node in the source tree. This can be useful for diagnostics, or to create an XPointer value, or when generating another stylesheet to process the same document. The resulting string can be used as input to the evaluate() function, provided that any namespace prefixes it uses are declared.
serialize(node, format)	This function takes two arguments: the first is a node (generally a document or element node) to be serialized. The second is the name of an <xsl:output> element in the stylesheet. The function serializes the specified document, or the subtree rooted at the specified element, according to the parameters specified in the named <xsl:output> element, and returns the serialized document as a string. This function is useful where the XSLT stylesheet wants to manipulate the serialized output, for example by embedding it as CDATA inside another XML document, or prefixing it with a DOCTYPE declaration, or inserting it into a non-XML output file.
set-user-data(string, value)	This function sets user data associated with the context node in the source document. The data may be retrieved later (during the same stylesheet execution only) using the saxon:get-user-data() function. The string serves as a name for this property, allowing multiple pieces of user data to be associated with the same node. The value may be any XPath value. This function returns an empty string as its nominal result. Note: set-user-data() is particularly useful to save data read during preview mode processing (see saxon:preview) for later use during normal processing. However, take care (a) not to store the data with a node that will be deleted after the preview, and (b) not to store a node-set containing nodes that will be deleted after the preview. It is safest to store simple values such as strings and numbers: use the string() or number() function if necessary to do the conversion. Like saxon:assign, this function breaks the XSLT no-side-effects rule. There is always a risk that the Saxon optimizer will execute expressions more than once, or not at all, or in a different order from that expected.
string-to-unicode(string)	This returns a sequence of integers representing the characters in the supplied string. Each integer is the Unicode numeric code-point value for one of the characters in the string. Note that a Unicode surrogate pair is considered as a single XML character. The function is useful, for example, for testing whether a character is in a particular range. By turning a string into a sequence of characters, it also allows the use of sequence operations such as insert() and remove() on the characters in a string. Example: saxon:string-to-unicode('PQR') returns the sequence (80, 81, 82).
sum(node-set-1, stored-expression)	This returns the total resulting from evaluating the supplied stored expression for each node in node-set-1 in turn, as a number. If the result is NaN for any node, the total will be NaN. A stored expression may be obtained as the result of calling the saxon:expression() function. The stored expression is evaluated for each node in node-set-1 in turn, with that node as the context node, with the context position equal to the position of that node in node-set-1, and with the context size equal to the size of node-set-1. Example: saxon:sum(sale, saxon:expression('@price * @qty')) will evaluate price times quantity for each child <sale> element, and return the total amount.
systemId()	This returns the system identifier (URI) of the entity in the source document that contains the context node. There are no arguments.
tokenize(string-1, string-2?)	The first argument is converted to a string and is treated as a list of separated tokens. If the second argument is present, any character in string-2 is taken as a delimiter character, and any sequence of delimiter characters is taken as a token separator. If the second argument is omitted, any sequence of whitespace is taken as a token separator: or to put it another way, the default for string-2 is ' '. A new sequence is constructed containing one string for each token; if the string is empty or contains a separator only then the result will be empty. For example tokenize("a cup of tea") generates a sequence of four strings: ( "a", "cup", "of", "tea").
unicode-to-string(integer*)	This takes a sequence of integers representing the characters of a string, and returns the resulting string. Each integer is the Unicode numeric code-point value for one of the characters in the string. Note that a Unicode surrogate pair is represented by a single integer in the sequence. Characters below 0x20, other than 0x9, 0xA, and 0xD, are not permitted. Example: saxon:unicode-to-string((80, 81, 82)) returns 'PQR'. Note the need for double parentheses: one pair for the function call, another to delimit the sequence-valued argument.
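As a sketch of how two of the functions listed above might be combined, the stylesheet below totals a stored expression with saxon:sum() and sorts on a key supplied at run time with saxon:evaluate(). The sales and sale element names, the sortkey parameter, the saxon namespace URI and the version attribute are assumptions made for illustration.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:saxon="http://saxon.sf.net/"
    exclude-result-prefixes="saxon">

  <!-- Hypothetical parameter holding an XPath expression as a string -->
  <xsl:param name="sortkey" select="'@price'"/>

  <!-- Assumed input: <sales><sale price="..." qty="..."/>...</sales> -->
  <xsl:template match="sales">
    <!-- Total of price times quantity over all child sale elements -->
    <report total="{saxon:sum(sale, saxon:expression('@price * @qty'))}">
      <xsl:for-each select="sale">
        <!-- The sort key expression is evaluated for each sale in turn -->
        <xsl:sort select="saxon:evaluate($sortkey)" data-type="number"/>
        <xsl:copy-of select="."/>
      </xsl:for-each>
    </report>
  </xsl:template>

</xsl:stylesheet>
```

Supplying a different value for sortkey, such as '@qty', changes the ordering without editing the stylesheet.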
