Standards Conformance

Contents
XSLT 1.0 and XPath 1.0 Conformance
XSLT 2.0 Conformance
XPath 2.0 Conformance
XQuery 1.0 Conformance
Conformance with Other Specifications
Conformance Documentation
Character Encodings Supported
Collation URIs
XML 1.0 Conformance
DOM Conformance
JAXP 1.2 Conformance
XSLT Error Recovery Policy

XSLT 1.0 and XPath 1.0 conformance

Saxon XSLT implements the XSLT 1.0 and XPath 1.0 Recommendations from the World Wide Web Consortium, found at http://www.w3.org/TR/xslt and http://www.w3.org/TR/xpath, which are referred to here collectively as "XSLT".

Saxon is 100% conformant to the mandatory requirements of these recommendations, except in some cases where incompatibilities have been introduced in the XSLT 2.0 and XPath 2.0 working drafts.

XSLT 2.0 conformance

This release of Saxon implements many facilities defined in the draft XSLT 2.0 specification. This implementation is not yet complete, though most features defined in the working draft of 12 November 2003 are available.

This version of Saxon acts as a Basic XSLT Processor (as distinct from a Schema-Aware XSLT Processor). This means it does not allow schemas to be imported and does not support validation of source or result documents or reference to user-defined types.

Known restrictions, relative to the conformance rules for a basic XSLT processor, include the following:

The new xsl:output option normalize-unicode is not implemented.
The xhtml output method does not follow all the rules for XHTML formatting in the draft XSLT 2.0 specification.
If key() is called in a match pattern, the argument must be a string literal.
The semantics of patterns when applied to nodes in non-document trees are not yet properly implemented.
The xsl:perform-sort element must currently have a select attribute and empty content (apart from the xsl:sort elements themselves).
When the source document is supplied as a pre-built tree (in any format), Saxon strips whitespace text nodes as requested by the stylesheet, but takes no account of any xml:space attributes present in the tree.
The ability for xsl:key and xsl:sort to contain a sequence constructor is not implemented.

XPath 2.0 conformance

This release of Saxon implements the full XPath 2.0 grammar as defined in the working draft of 12 November 2003.

The restrictions in XPath 2.0 support include the following:

Saxon supports all the built-in data types with the exception of xs:NOTATION.
The data type xs:unsignedLong is constrained to fit in 63 bits.
Casting of a string to a QName is not supported, except when an explicit cast or constructor function is used in an XSLT environment.
The state of implementation of all the standard functions is described in functions.html.
Support for the type xs:duration goes beyond what the specification allows. Ordering is implemented as a total order over all durations, based on the average length of a month (one year = 365.242199 days).
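To make the month-length rule concrete, here is a minimal Java sketch (not Saxon's code) of such a total order. It represents a duration as a month count plus a seconds count, as the xs:duration value space does, and converts months to seconds at 365.242199 / 12 days each before comparing. The class and method names are hypothetical.

```java
/** Hypothetical sketch: a total order over durations using an average month length. */
final class DurationOrder {
    // Average month length in seconds: (365.242199 days / 12) * 86,400 s per day.
    static final double SECONDS_PER_MONTH = 365.242199 / 12 * 86_400;

    /** Collapses the year-month part and the day-time part into one comparable number. */
    static double toSeconds(int months, double seconds) {
        return months * SECONDS_PER_MONTH + seconds;
    }

    /** Returns a negative, zero, or positive value, like Comparator.compare. */
    static int compare(int monthsA, double secondsA, int monthsB, double secondsB) {
        return Double.compare(toSeconds(monthsA, secondsA), toSeconds(monthsB, secondsB));
    }
}
```

Under this rule a month is about 30.44 days, so P1M sorts after P30D but before P31D, and P1Y sorts after P365D but before P366D.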

XQuery 1.0 Conformance

The restrictions noted above with respect to XPath 2.0 apply equally to Saxon's support for XQuery 1.0.

Additional restrictions in Saxon compared with the November 2003 draft of XQuery 1.0 include the following:

User-defined functions must have namespace-prefixed names.
The as clause in variable declarations (both global variables and for and let variables) follows the function calling rules rather than the stricter rules that apply to variables. That is, the supplied value may be converted to the declared type in three ways: atomization of nodes, casting of untyped atomic values, and numeric promotion.
The as clause in some and every expressions is not implemented.
Computed namespace constructors are not implemented.
The rules that determine which namespace nodes are added to a constructed element are not fully implemented. The main deviation is that, in some cases, constructed nodes inherit namespaces from their new parent, which the specification does not permit.

Conformance with other specifications

Saxon is dependent on the user-selected XML parser to ensure conformance with the XML 1.0 Recommendation and the XML Namespaces Recommendation.

Saxon implements the <?xml-stylesheet?> processing instruction as described in the W3C Recommendation Associating StyleSheets with XML Documents. The href pseudo-attribute must be a URI identifying an XML document containing a stylesheet, or a URI with a fragment identifier identifying an embedded stylesheet. The fragment must be the value of an ID attribute declared as such in the DTD.

Saxon works with any SAX2-conformant XML parser that is configured to enable namespace processing. There is one limitation: on the startElement() call from the XMLReader to the ContentHandler, the QName (that is, the third argument) must be present. According to the SAX2 specification, namespace-aware parsers are not obliged to supply this argument. However, all commonly-used parsers appear to do so.
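One way to check whether a particular parser meets both requirements is to configure it through the standard JAXP factory with namespace processing enabled and inspect the third argument of startElement(). The sketch below uses only JAXP and SAX2 calls; the class and method names are hypothetical, and it checks the parser in isolation rather than showing how Saxon itself obtains an XMLReader.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical helper: verifies that a namespace-aware parser supplies QNames. */
final class QNameCheck {
    static List<String> startTagQNames(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true); // Saxon requires namespace processing
        XMLReader reader = factory.newSAXParser().getXMLReader();
        List<String> qNames = new ArrayList<>();
        reader.setContentHandler(new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts)
                    throws SAXException {
                // SAX2 lets namespace-aware parsers omit this argument; Saxon needs it.
                if (qName == null || qName.isEmpty()) {
                    throw new SAXException("Parser did not supply a QName for " + localName);
                }
                qNames.add(qName);
            }
        });
        reader.parse(new InputSource(new StringReader(xml)));
        return qNames;
    }
}
```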

Saxon should work with any DOM-conformant XML parser; however, Saxon's DOM interface is tested only with Crimson and Xerces, and DOM implementations are known to vary widely.

When a DOM or JDOM tree is supplied as the transformation input, Saxon does not combine adjacent text nodes into a single node. Adjacent text nodes can occur as the result of user modifications to the tree, or as a result of the presence of CDATA sections or entity references, depending on the options in force when the tree was constructed.
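If the stylesheet needs each run of text to appear as a single node, one option is to merge adjacent text nodes before handing the DOM to Saxon. The sketch below uses standard JAXP and DOM calls: DocumentBuilderFactory.setCoalescing(true) at parse time, which turns CDATA sections into text and joins them to neighbouring text, and Node.normalize() after the tree has been edited. The class and method names are hypothetical, and the sketch does not cover JDOM trees.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

/** Hypothetical helpers for producing a DOM without adjacent text nodes. */
final class TextNodeMerging {
    /** Parses XML so that CDATA sections are folded into the surrounding text nodes. */
    static Document parseCoalesced(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        factory.setCoalescing(true); // CDATA becomes text, appended to any adjacent text node
        return factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
    }

    /** Merges adjacent text nodes left behind by programmatic edits. */
    static void mergeAdjacentText(Document doc) {
        doc.normalize(); // DOM Node.normalize(): no adjacent or empty Text nodes remain
    }
}
```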

When a DOM or JDOM tree is supplied as the transformation input, Saxon does not take the xml:space attribute into consideration when deciding whether or not to strip whitespace text nodes.
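Where xml:space matters, one possible workaround is to strip whitespace in the DOM yourself before the transformation and remove the corresponding xsl:strip-space declaration from the stylesheet (otherwise Saxon would still strip the preserved nodes). The sketch below assumes the intended stripping applies to every element, as with xsl:strip-space elements="*"; it removes whitespace-only text nodes except where the nearest enclosing xml:space attribute is "preserve". The class name is hypothetical, and the DOM must be built by a namespace-aware parser so that xml:space is found in the XML namespace.

```java
import java.util.ArrayList;
import java.util.List;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

/** Hypothetical pre-processing step that honours xml:space while stripping whitespace. */
final class WhitespaceStripper {
    static final String XML_NS = "http://www.w3.org/XML/1998/namespace";

    /** Call as strip(doc.getDocumentElement(), false). */
    static void strip(Element element, boolean inheritedPreserve) {
        boolean preserve = inheritedPreserve;
        if (element.hasAttributeNS(XML_NS, "space")) {
            // The nearest xml:space in scope wins; "default" switches preservation off.
            preserve = "preserve".equals(element.getAttributeNS(XML_NS, "space"));
        }
        List<Node> toRemove = new ArrayList<>();
        for (Node child = element.getFirstChild(); child != null; child = child.getNextSibling()) {
            if (child.getNodeType() == Node.TEXT_NODE) {
                // XML whitespace is space, tab, CR and LF.
                if (!preserve && child.getNodeValue().matches("[ \\t\\r\\n]*")) {
                    toRemove.add(child);
                }
            } else if (child.getNodeType() == Node.ELEMENT_NODE) {
                strip((Element) child, preserve);
            }
        }
        for (Node node : toRemove) { // remove after iterating to keep the sibling walk valid
            element.removeChild(node);
        }
    }
}
```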

Conformance Documentation

The XSLT specification says that the documentation for an implementation should specify which URI schemes are supported. Saxon supports the URI scheme implemented by the Java java.net.URL class, with the optional addition of a fragment identifier, as described below. Additionally, Saxon allows the user to nominate a URIResolver class which can be used to implement any URI scheme the user wants.
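In JAXP terms, such a class implements javax.xml.transform.URIResolver. As a minimal sketch, the resolver below serves documents for a made-up "mem:" scheme from an in-memory map and returns null for anything else, which under the JAXP contract asks the processor to fall back to its own resolution. The scheme and class name are hypothetical.

```java
import java.io.StringReader;
import java.util.Map;
import javax.xml.transform.Source;
import javax.xml.transform.TransformerException;
import javax.xml.transform.URIResolver;
import javax.xml.transform.stream.StreamSource;

/** Hypothetical resolver for a made-up "mem:" URI scheme. */
final class MemoryUriResolver implements URIResolver {
    private final Map<String, String> documents;

    MemoryUriResolver(Map<String, String> documents) {
        this.documents = documents;
    }

    @Override
    public Source resolve(String href, String base) throws TransformerException {
        if (!href.startsWith("mem:")) {
            return null; // let the processor resolve other schemes itself
        }
        String xml = documents.get(href);
        if (xml == null) {
            throw new TransformerException("No in-memory document for " + href);
        }
        StreamSource source = new StreamSource(new StringReader(xml));
        source.setSystemId(href); // keeps relative references and error messages meaningful
        return source;
    }
}
```

It would be registered with TransformerFactory.setURIResolver(), which covers xsl:import and xsl:include (and document() by default), or with Transformer.setURIResolver() for document() calls in a single transformation.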

The XSLT specification says that the documentation for an implementation should specify for which media types fragment identifiers are supported. The standard URI resolver supports access to XML documents only. A simple fragment identifier is allowed, consisting of the value of an ID attribute in the document. The effect is to return the subdocument rooted at the element with this identifier if there is one, or an empty document otherwise. For example, the URI mydoc.xml#aaa locates the XML document mydoc.xml, and if it contains an element <eeee id="aaa">, where id is an attribute of type ID, then the document retrieved is an XML document with this <eeee> element as its outermost (document) element.
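The subdocument behaviour can be approximated with plain DOM calls. The sketch below is not Saxon's resolver; the class and method names are hypothetical. It relies on Document.getElementById(), which finds only attributes known to be of type ID, for example through a DTD declaration, matching the requirement stated above.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

/** Hypothetical sketch of resolving a fragment identifier to a subdocument. */
final class FragmentExtractor {
    /**
     * Returns a new document whose outermost element is a copy of the element with the
     * given ID, or an empty document if there is no such element.
     */
    static Document subdocument(Document source, String id) throws Exception {
        Document result = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        Element target = source.getElementById(id); // needs an attribute typed as ID
        if (target != null) {
            result.appendChild(result.importNode(target, true)); // deep copy
        }
        return result;
    }
}
```

For a URI such as mydoc.xml#aaa, a caller would parse mydoc.xml and pass the fragment, which new java.net.URI("mydoc.xml#aaa").getFragment() returns as "aaa".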

The values of the vendor-specific system properties are:

xsl:version	2.0
xsl:vendor	SAXON n.n.n from Saxonica
xsl:vendor-url	http://saxon.sf.net/
xsl:product-name	SAXON
xsl:product-version	n.n.n

All these values are subject to change in future releases. Users wishing to test whether the processor is Saxon are advised to test whether the xsl:product-name system property has the value "SAXON".

Extensions

Saxon implements a number of extensions to standard XSLT, following the rules for extension functions and extension elements where appropriate. The extensions are documented in extensions.html. They are all implemented in accordance with the provisions in the standard for extensibility.

Character Encodings Supported

The encodings supported on input depend entirely on your choice of XML parser.

On output, any encoding supported by the Java VM may be used.

There are some differences between the character encodings supported by the old java.io package and the new java.nio package. If the requested encoding is not supported by the java.nio package, then all non-ASCII characters will be represented using numeric character references. If the encoding is not supported by the java.io package, then Saxon will revert to using UTF-8 as the actual output encoding. A list of the character encodings supported in the java.nio package can be obtained by using the command java net.sf.saxon.charcode.CharacterSetFactory, with no parameters.
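The numeric-character-reference fallback can be pictured with the java.nio API, where CharsetEncoder.canEncode() reports whether a character is representable in a given encoding, and Charset.isSupported() reports whether java.nio knows the encoding at all. This is a minimal sketch of the idea, not Saxon's serializer; the class name is hypothetical and markup escaping is left out.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;

/** Hypothetical sketch: replace unencodable characters with numeric character references. */
final class CharRefEscaper {
    /**
     * Throws UnsupportedCharsetException if java.nio does not know the encoding;
     * check with Charset.isSupported(name) first.
     */
    static String escape(String text, String encoding) {
        CharsetEncoder encoder = Charset.forName(encoding).newEncoder();
        StringBuilder out = new StringBuilder();
        text.codePoints().forEach(cp -> {
            // Test whole code points so that surrogate pairs are judged together.
            String ch = new String(Character.toChars(cp));
            if (encoder.canEncode(ch)) {
                out.append(ch);
            } else {
                out.append("&#").append(cp).append(';');
            }
        });
        return out.toString();
    }
}
```

For example, escaping "café" for US-ASCII yields "caf&#233;", while for UTF-8 the text is unchanged.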

Collation URIs

Collations used for comparing strings can be specified by means of a URI. A collation URI may be used as an argument to many of the standard functions, and also as an attribute of xsl:sort in XSLT, and in the order by clause of a FLWOR expression in XQuery.

The W3C specifications leave the details of collation URIs entirely implementation-defined. This section explains the collation URIs that can be used with Saxon.

In Saxon XSLT stylesheets, collations may be described using a saxon:collation element as a top-level declaration in the stylesheet. In this case the value of the name attribute of the saxon:collation may be used as a collation URI. There is no constraint on the form this URI takes, indeed there is no requirement that it be a legal URI. See saxon:collation for more details.

A collation URI may also be constructed directly. This enables collation URIs to be used in XPath and XQuery applications as well as in XSLT stylesheets. Such a collation URI takes the form http://saxon.sf.net/collation?keyword=value;keyword=value;.... The query parameters in the URI can be separated either by ampersands or semicolons, but semicolons are usually more convenient, because an ampersand must be escaped as &amp; when the URI is written in an XML document such as a stylesheet. The keywords available are as follows:

keyword	values	effect
class	fully-qualified Java class name of a class that implements `java.util.Comparator`.	This parameter should not be combined with any other parameter. An instance of the requested class is created, and is used to perform the comparisons. Note that if the collation is to be used in functions such as contains() and starts-with(), this class must also be a `java.text.RuleBasedCollator`. This approach allows a user-defined collation to be implemented in Java.
lang	any value allowed for xml:lang, for example `en-US` for US English	This is used to find the collation appropriate to a Java locale. The collation may be further tailored using the parameters `strength` and `decomposition`.
strength	primary, secondary, tertiary, or identical	Indicates the differences that are considered significant when comparing two strings. A/B is a primary difference; a/á is a secondary difference; A/a is a tertiary difference (though this varies by language). So with strength=primary, both A=a and a=á are true; with strength=secondary, A=a is true but a=á is false; with strength=tertiary, both are false.
decomposition	none, standard, full	Indicates how the collator handles Unicode composed characters. See the JDK documentation for details.
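The lang, strength, and decomposition keywords correspond closely to java.text.Collator settings. The sketch below builds a Collator from such a URI to show the mapping; it is an illustration, not Saxon's parser. The class and method names are hypothetical, the class keyword is not handled, parsing lang with Locale.forLanguageTag() is an assumption, and mapping standard to Collator.CANONICAL_DECOMPOSITION is also an assumption.

```java
import java.text.Collator;
import java.util.Locale;

/** Hypothetical sketch: build a java.text.Collator from a Saxon-style collation URI. */
final class CollationUris {
    static Collator fromUri(String uri) {
        int q = uri.indexOf('?');
        String query = q < 0 ? "" : uri.substring(q + 1);
        Locale locale = Locale.getDefault();
        Integer strength = null;
        Integer decomposition = null;
        for (String param : query.split("[;&]")) { // either separator is accepted
            int eq = param.indexOf('=');
            if (eq < 0) {
                continue; // skips empty segments such as a trailing ';'
            }
            String key = param.substring(0, eq);
            String value = param.substring(eq + 1);
            switch (key) {
                case "lang": locale = Locale.forLanguageTag(value); break; // assumption
                case "strength": strength = strengthOf(value); break;
                case "decomposition": decomposition = decompositionOf(value); break;
                default: throw new IllegalArgumentException("Unsupported keyword: " + key);
            }
        }
        Collator collator = Collator.getInstance(locale);
        if (strength != null) {
            collator.setStrength(strength);
        }
        if (decomposition != null) {
            collator.setDecomposition(decomposition);
        }
        return collator;
    }

    private static int strengthOf(String value) {
        switch (value) {
            case "primary": return Collator.PRIMARY;
            case "secondary": return Collator.SECONDARY;
            case "tertiary": return Collator.TERTIARY;
            case "identical": return Collator.IDENTICAL;
            default: throw new IllegalArgumentException("Unknown strength: " + value);
        }
    }

    private static int decompositionOf(String value) {
        switch (value) {
            case "none": return Collator.NO_DECOMPOSITION;
            case "standard": return Collator.CANONICAL_DECOMPOSITION; // assumed mapping
            case "full": return Collator.FULL_DECOMPOSITION;
            default: throw new IllegalArgumentException("Unknown decomposition: " + value);
        }
    }
}
```

With http://saxon.sf.net/collation?lang=en;strength=primary, the resulting collator treats "a", "A", and "á" as equal.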

It is also possible to specify the Unicode Codepoint Collation defined in the W3C specifications, currently http://www.w3.org/2003/05/xpath-functions/collation/codepoint.
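The codepoint collation compares strings one Unicode code point at a time. In Java this is not the same as String.compareTo(), which compares UTF-16 code units, so under compareTo() supplementary characters sort before characters in the range U+E000 to U+FFFF. A minimal sketch of codepoint ordering (class name hypothetical):

```java
/** Hypothetical sketch: compare strings by Unicode code point rather than UTF-16 unit. */
final class CodepointCollation {
    static int compare(String a, String b) {
        int i = 0;
        int j = 0;
        while (i < a.length() && j < b.length()) {
            int cpA = a.codePointAt(i);
            int cpB = b.codePointAt(j);
            if (cpA != cpB) {
                return Integer.compare(cpA, cpB);
            }
            i += Character.charCount(cpA);
            j += Character.charCount(cpB);
        }
        // Equal so far: the string with code points left over sorts later.
        return Boolean.compare(i < a.length(), j < b.length());
    }
}
```

For example, U+FFFD sorts before U+1F600 under this comparison, whereas "\uFFFD".compareTo("\uD83D\uDE00") is positive.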

In addition, the APIs provided for executing XPath and XQuery expressions allow named collations to be registered by the calling application, as part of the static context.

JAXP 1.2 Conformance

Saxon implements the JAXP 1.2 API (originally known as TrAX), which is now documented as a standard part of JDK 1.4. Saxon implements the interfaces in the javax.xml.transform package in full, including support for SAX, DOM, and Stream input, and SAX, DOM, and Stream output.
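For reference, a minimal Stream-in, Stream-out transformation through the JAXP API is sketched below. Nothing in it is Saxon-specific: TransformerFactory.newInstance() returns whichever implementation is configured (for example, through the javax.xml.transform.TransformerFactory system property), so Saxon is used only if it is selected that way. The class name is hypothetical.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

/** Hypothetical helper: run a stylesheet over an input document, both held in strings. */
final class JaxpExample {
    static String transform(String stylesheet, String input) throws Exception {
        TransformerFactory factory = TransformerFactory.newInstance();
        Transformer transformer = factory.newTransformer(new StreamSource(new StringReader(stylesheet)));
        StringWriter out = new StringWriter();
        transformer.transform(new StreamSource(new StringReader(input)), new StreamResult(out));
        return out.toString();
    }
}
```

SAX or DOM input and output follow the same pattern, with SAXSource/DOMSource and SAXResult/DOMResult in place of the stream classes.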

Note: The transformation interfaces in JAXP 1.2 are identical to JAXP 1.1: the new version only affects the XML parser interface, adding options to control schema validation.

Saxon also implements part of the javax.xml.parsers API. Saxon no longer provides its own SAX parser; however, it does provide a DocumentBuilder. The DOM interfaces are limited by the capabilities of the Saxon DOM, specifically the fact that it is read-only. Nevertheless, the DocumentBuilder may be used to construct a Saxon tree, or to obtain an empty Document node which can be supplied in a DOMResult to hold the result of a transformation.
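The DOMResult pattern looks like this in outline. The sketch uses the default JAXP DocumentBuilderFactory and TransformerFactory rather than Saxon's own DocumentBuilder, so it shows the shape of the calls without depending on Saxon; the class name is hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMResult;
import javax.xml.transform.stream.StreamSource;
import org.w3c.dom.Document;

/** Hypothetical helper: collect a transformation result in an empty Document node. */
final class DomResultExample {
    static Document transformToDom(String input) throws Exception {
        // newDocument() returns an empty Document node to receive the result.
        Document target = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        // newTransformer() with no stylesheet gives an identity transformation.
        TransformerFactory.newInstance().newTransformer()
            .transform(new StreamSource(new StringReader(input)), new DOMResult(target));
        return target;
    }
}
```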

XSLT Error Recovery Policy

Where the XSLT specification requires that an error be signaled, Saxon produces an error message and terminates stylesheet execution. In the case of errors detected at compile time, it attempts to report as many errors as possible before terminating; in the case of run-time errors, it terminates after the first error.

Where the XSLT specification states that the processor may recover from an error, Saxon takes one of three actions as described in the table below. Either it signals the error and terminates execution, or it recovers silently from the error in the manner permitted by the specification, or it places the action under user control. In the latter case there are three options: report the error and terminate, recover silently, or (the default) recover after writing a warning to the system error output stream. These actions can be modified by supplying a user-defined ErrorListener.
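The three user options map naturally onto the JAXP ErrorListener interface, whose error() method receives recoverable errors and may either return (so processing continues) or throw (so it stops). A minimal sketch, assuming Saxon reports the errors marked "User option" through error(); the class name and the Policy enum are hypothetical. It records messages in a list rather than writing them to System.err, so that the behaviour is easy to inspect.

```java
import java.util.ArrayList;
import java.util.List;
import javax.xml.transform.ErrorListener;
import javax.xml.transform.TransformerException;

/** Hypothetical listener applying one of three policies to recoverable errors. */
final class PolicyErrorListener implements ErrorListener {
    enum Policy { TERMINATE, RECOVER_SILENTLY, RECOVER_WITH_WARNING }

    private final Policy policy;
    final List<String> messages = new ArrayList<>();

    PolicyErrorListener(Policy policy) {
        this.policy = policy;
    }

    @Override
    public void warning(TransformerException e) {
        messages.add("Warning: " + e.getMessage());
    }

    @Override
    public void error(TransformerException e) throws TransformerException {
        // Recoverable error: returning lets the processor recover; throwing stops it.
        switch (policy) {
            case TERMINATE:
                throw e;
            case RECOVER_WITH_WARNING:
                messages.add("Recoverable error: " + e.getMessage());
                break;
            case RECOVER_SILENTLY:
                break;
        }
    }

    @Override
    public void fatalError(TransformerException e) throws TransformerException {
        throw e; // non-recoverable errors always stop processing
    }
}
```

It would be installed with Transformer.setErrorListener(), or with TransformerFactory.setErrorListener() for errors reported while compiling the stylesheet.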

Handling of individual recoverable errors is described in the table below.

This list is incomplete and needs to be reviewed.

Error	Action
There is more than one template rule that matches a node, with the same import precedence and priority	User option
There is more than one xsl:namespace-alias statement for a given prefix, with the same import precedence	Recover silently
An element name defined using xsl:element is invalid	User option
An attribute name defined using xsl:attribute is invalid	User option
There are several attribute sets with the same import precedence that define the same named attribute	Recover silently
A processing-instruction name defined using xsl:processing-instruction is invalid	User option
A node other than a text node is written to the result tree while instantiating xsl:attribute, xsl:comment, or xsl:processing-instruction	User option
Invalid characters are written to the content of a comment or processing instruction	User option
An attribute node or namespace node is written directly to the root of a result tree fragment, or to any other node that is not an element node.	User option
A value supplied to the value attribute of xsl:number is negative or non-numeric	User option
The document() function identifies a resource that cannot be retrieved	User option
There are several xsl:output elements specifying the same attribute with the same import precedence	Recover silently
disable-output-escaping is used for a text node while instantiating xsl:attribute, xsl:comment, or xsl:processing-instruction	Recover silently
disable-output-escaping is used for a text node within a result tree fragment that is subsequently converted to a string or number	Recover silently
disable-output-escaping is used for a text node containing a character that cannot be output using the target encoding	Recover silently

Michael H. Kay
7 March 2004