Saxon XSLT implements the XSLT 1.0 and XPath 1.0 Recommendations from the World Wide Web Consortium, found at http://www.w3.org/TR/xslt and http://www.w3.org/TR/xpath, which are referred to here collectively as "XSLT".
Saxon is 100% conformant to the mandatory requirements of these recommendations, except in some cases where incompatibilities have been introduced in the XSLT 2.0 and XPath 2.0 working drafts.
This release of Saxon implements many facilities defined in the draft XSLT 2.0 specification. This specification is not yet complete, and the facilities I have chosen to implement are those that appear to be the most stable.
The XSLT 2.0 features supported include the following:
Saxon will automatically treat a result tree fragment as a node-set when required.
Saxon supports the xsl:result-document element defined in the draft XSLT 2.0 specification, together with named xsl:output declarations.
Saxon supports the use of xml:base to define the Base URI of a node, as defined in the XSLT 1.1 specifications, except in the case of the base URI of a processing instruction contained in an external entity. (This feature is supported regardless of the setting of [xsl:]version.)
The separator attribute on xsl:value-of and xsl:copy-of is implemented.
The xsl:for-each-group instruction is implemented, together with the current-group() function.
Named sort keys (using the xsl:sort-key declaration) are implemented, together with the sort() function that uses them.
The collation attribute in xsl:sort is implemented, supported by a new saxon:collation declaration to manage collation names.
The xsl:function declaration is implemented.
Instructions such as xsl:for-each and xsl:copy-of can handle arbitrary sequences.
The method="xhtml" attribute on the xsl:output declaration is available (though not fully conformant with the current Working Draft in the output it produces).
The xsl:analyze-string instruction is available for regular expression processing.
Variables and parameters may take an optional as attribute, defining the data type of the variable.
A template rule can apply to several modes.
The value of an enclosed expression in an attribute value template may be a sequence; the items in the sequence are output using space as a separator (this is not backwards compatible with XSLT 1.0).
The disable-output-escaping attribute of xsl:attribute is implemented.
The unparsed-text() function is implemented.
Known restrictions include the following:
The xsl:schema-import declaration is not implemented; it is therefore not possible to refer to user-defined types.
There is no support for source documents, result documents, or temporary trees to be validated and annotated using a schema processor.
The override attribute of xsl:function is not implemented.
It is not possible for a local variable to shadow another local variable with the same name.
The new XSLT 2.0 facilities for initiating a transformation are not provided (specifying a named template as the entry point, supplying an input collection, etc.).
Backwards-compatible processing mode is not available.
The required attribute on xsl:param is not implemented.
The new xsl:output option normalize-unicode is not implemented.
The as attribute is not implemented on elements xsl:sort or xsl:for-each-group.
The as attribute on xsl:variable and related elements causes the supplied value to be cast to the required type. The latest version of the XSLT 2.0 specification is stricter than this: it only allows atomization.
The xhtml output method does not follow all the rules for XHTML formatting in the draft XSLT 2.0 specification.
The top-level expression construct for XPath expressions in XSLT stylesheets is Expr, not ExprSequence. This means you can't write <foo bar="{1,2,3}"/>.
The format-number() function is implemented according to the XSLT 1.0 specification.
The xsl:analyze-string element does not allow xsl:fallback as a child element.
The context is not nullified when evaluating a stylesheet function.
If key() is called in a match pattern, the argument must be a string literal.
Saxon 7.3 implements the full XPath 2.0 grammar with the exceptions of the validate expression and schema-related aspects of the SequenceType production. The SequenceType production currently allows either a node-kind (e.g. element, attribute, document), a built-in simple schema type (e.g. xs:integer, xs:ID), or the constructs "element of type T" and "attribute of type T" where T is a built-in simple schema type. Note that the latter require the type annotation of the node to be T; this can be set using the type-annotation attribute of xsl:element and xsl:attribute.
The restrictions in XPath 2.0 support include the following:
Saxon supports the following data types: string, boolean, decimal, duration, integer, float, double, date, dateTime, time, anyURI, QName, with their literal forms and constructors. All built-in subtypes of integer are implemented (but note, unsignedLong does not support values greater than the maximum value of a Long). All built-in types derived by restriction from xs:string are implemented (but not the list types IDREFS, NMTOKENS, and ENTITIES).
Subtraction of two dates to yield a duration, and addition of a date to a duration, are not yet implemented.
In addition to functions defined in XPath 1.0 and XSLT 1.0, the following functions are available: avg(), base-uri(), codepoints-to-string(), compare(), context-item(), current-date(), current-dateTime(), current-group(), current-time(), date(), dayTimeDuration-from-seconds(), deep-equal(), distinct-nodes(), duration(), empty(), exists(), expanded-QName(), find(), index-of(), insert(), lower-case(), matches(), max(), min(), node-kind(), remove(), replace(), root(), sequence-deep-equal(), sequence-node-equal(), string-join(), string-pad(), string-to-codepoints(), subsequence(), time(), tokenize(), unparsed-text() [both arguments must be supplied], upper-case(), yearMonthDuration-from-months().
The component extraction functions, get-X-from-Y, for types date, dateTime, time, duration, and QName are also available. In the case of durations, they are named get-x-from-duration(), not get-x-from-yearMonthDuration() or get-x-from-dayTimeDuration().
Standard functions are implemented in the null namespace; it is not possible to use the namespace URI defined in the Functions and Operators specification.
Note that in the dateTime and time data types, the timezone is retained as part of the value. Equality and ordering is done by normalizing the time to UTC, but conversion to a string, and extraction of components, reflects the timezone as originally specified.
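The difference between comparing values and reading their components can be illustrated with the JDK's java.time.OffsetDateTime, which happens to behave in an analogous way: comparison works on the instant, while component access reflects the offset as written. This is only an analogy in plain Java, not Saxon's implementation; the class and method names are hypothetical.

```java
import java.time.OffsetDateTime;

class TimezoneRetention {
    /** True when both values denote the same instant once normalized to UTC. */
    static boolean sameInstant(String a, String b) {
        return OffsetDateTime.parse(a).isEqual(OffsetDateTime.parse(b));
    }

    /** Hour component as written, that is, in the value's own timezone. */
    static int hourAsWritten(String value) {
        return OffsetDateTime.parse(value).getHour();
    }

    public static void main(String[] args) {
        String plusOne = "2002-11-12T10:00:00+01:00";
        String utc = "2002-11-12T09:00:00Z";
        // Equal after normalization to UTC...
        System.out.println(sameInstant(plusOne, utc));
        // ...but each value still reports the hour it was written with.
        System.out.println(hourAsWritten(plusOne) + " vs " + hourAsWritten(utc));
    }
}
```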
Constructor functions are available in both the schema-defined namespaces "http://www.w3.org/2001/XMLSchema" (conventional prefix xs) and "http://www.w3.org/2001/XMLSchema-datatypes" (conventional prefix xsd). They are available for all the built-in data types that have been implemented. The semantics are the same as a cast, that is, xs:dateTime($x) means exactly the same as cast as xs:dateTime($x).
Ordering is implemented as a total order over all durations, based on the average length of a month (one year = 365.242199 days).
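As a worked illustration of what a total order based on the average month length implies, the following sketch reduces a duration to seconds using the figure quoted above (one month taken as one twelfth of 365.242199 days). The representation of a duration as "months plus seconds" and all names here are assumptions made for the example; Saxon's internal representation is not described in this document.

```java
class DurationOrder {
    // Average year length quoted in the text; a month is one twelfth of it.
    static final double DAYS_PER_YEAR = 365.242199;
    static final double SECONDS_PER_DAY = 86400.0;

    /** Approximate length in seconds of a duration given as months plus seconds. */
    static double toSeconds(int months, double seconds) {
        return months * (DAYS_PER_YEAR / 12.0) * SECONDS_PER_DAY + seconds;
    }

    /** Negative, zero, or positive as the first duration is shorter, equal, or longer. */
    static int compare(int months1, double seconds1, int months2, double seconds2) {
        return Double.compare(toSeconds(months1, seconds1), toSeconds(months2, seconds2));
    }

    public static void main(String[] args) {
        double thirtyDays = 30 * SECONDS_PER_DAY;
        double thirtyOneDays = 31 * SECONDS_PER_DAY;
        // An average month is about 30.44 days, so P1M falls between P30D and P31D.
        System.out.println(compare(1, 0, 0, thirtyDays) > 0);
        System.out.println(compare(1, 0, 0, thirtyOneDays) < 0);
    }
}
```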
In replace(), if the pattern matches a zero-length string, each character in the source text is replaced by two instances of the replacement text. This is the Java behavior.
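The underlying Java behavior can be observed directly with String.replaceAll, which uses java.util.regex. The sketch below shows only the JDK's behavior; that Saxon's replace() delegates to it in exactly this way is an assumption based on the remark above, and the class name is hypothetical.

```java
class ZeroLengthReplace {
    /** Delegates to java.util.regex through String.replaceAll. */
    static String replace(String input, String pattern, String replacement) {
        return input.replaceAll(pattern, replacement);
    }

    public static void main(String[] args) {
        // "b*" matches the "b" and also the empty string after it, so the
        // matched character ends up replaced by two copies of the replacement.
        System.out.println(replace("abc", "b*", "-"));   // prints -a--c-
        // ".*" matches the whole input and then the empty string at the end.
        System.out.println(replace("abc", ".*", "-"));   // prints --
    }
}
```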
The rules for conversion of arguments in a function call do not follow the latest XPath 2.0 draft, whether in backwards compatible mode or otherwise. Saxon currently casts the supplied value to the required type.
Operators and functions in general do not handle the empty sequence, when supplied as an operand or argument, in the way that is defined in the XSLT 2.0 draft specification. For example, numeric operations on the empty sequence produce NaN rather than ().
There are many areas where the semantics implemented by Saxon still reflect the XPath 1.0 rules, for example the rules for conversion of numbers to strings. In some cases the XPath 2.0 rules in these areas are not yet stable (that is, there are open issues documented in the current working draft).
The precedence of unary minus relative to the union operator is wrong.
Saxon is dependent on the user-selected XML parser to ensure conformance with the XML 1.0 Recommendation and the XML Namespaces Recommendation.
Saxon implements the <?xml-stylesheet?> processing instruction as described in the W3C Recommendation Associating StyleSheets with XML Documents. The href pseudo-attribute must be a URI identifying an XML document containing a stylesheet, or a URI with a fragment identifier identifying an embedded stylesheet. The fragment must be the value of an ID attribute declared as such in the DTD.
Saxon works with any SAX2-conformant XML parser that is configured to enable namespace processing. There is one limitation: on the startElement() call from the XMLReader to the ContentHandler, the QName (that is, the third argument) must be present. According to the SAX2 specification, namespace-aware parsers are not obliged to supply this argument. However, all commonly-used parsers appear to do so.
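One way to check whether a given parser supplies the QName is to run it over a small document and record the third argument of each startElement() call. The following is a minimal sketch using the parser obtained from the JDK's SAXParserFactory; the class name QNameProbe is hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

class QNameProbe {
    /** Parses the XML text and returns the qName passed to each startElement() call. */
    static List<String> reportedQNames(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true); // namespace processing must be enabled
        XMLReader reader = factory.newSAXParser().getXMLReader();
        List<String> qNames = new ArrayList<>();
        reader.setContentHandler(new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes atts) {
                qNames.add(qName); // an empty string here would indicate a parser that omits it
            }
        });
        reader.parse(new InputSource(new StringReader(xml)));
        return qNames;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(reportedQNames("<p:a xmlns:p='urn:x'><b/></p:a>"));
    }
}
```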
The XSLT specification says that the documentation for an implementation should specify which URI schemes are supported. Saxon supports the URI scheme implemented by the Java java.net.URL class, with the optional addition of a fragment identifier, as described below. Additionally, Saxon allows the user to nominate a URIResolver class which can be used to implement any URI scheme the user wants.
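A user-supplied resolver is written against the standard javax.xml.transform.URIResolver interface. The sketch below serves documents from memory under a made-up "mem:" scheme; the scheme, the class name, and the register() helper are all hypothetical and chosen only for illustration. Returning null asks the processor to fall back to its default resolution.

```java
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;
import javax.xml.transform.Source;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.URIResolver;
import javax.xml.transform.stream.StreamSource;

class InMemoryResolver implements URIResolver {
    private final Map<String, String> documents = new HashMap<>();

    /** Makes the XML text available as mem:<name> (hypothetical scheme). */
    void register(String name, String xml) {
        documents.put(name, xml);
    }

    @Override
    public Source resolve(String href, String base) throws TransformerException {
        if (href == null || !href.startsWith("mem:")) {
            return null; // not ours: let the processor resolve the URI itself
        }
        String xml = documents.get(href.substring("mem:".length()));
        if (xml == null) {
            throw new TransformerException("No in-memory document for " + href);
        }
        return new StreamSource(new StringReader(xml), href);
    }

    public static void main(String[] args) throws TransformerException {
        InMemoryResolver resolver = new InMemoryResolver();
        resolver.register("greeting", "<greeting>hello</greeting>");
        // The resolver is nominated through the standard JAXP setter.
        TransformerFactory factory = TransformerFactory.newInstance();
        factory.setURIResolver(resolver);
        System.out.println(resolver.resolve("mem:greeting", null).getSystemId());
    }
}
```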
The XSLT specification says that the documentation for an implementation should specify for which media types fragment identifiers are supported. The standard URI resolver supports access to XML documents only. A simple fragment identifier is allowed, consisting of the value of an ID attribute in the document. The effect is to return the subdocument rooted at the element with this identifier if there is one, or an empty document otherwise. For example, the URI mydoc.xml#aaa locates the XML document mydoc.xml, and if it contains an element <eeee id="aaa">, where id is an attribute of type ID, then the document retrieved is an XML document with this <eeee> element as its outermost (document) element.
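The lookup described for mydoc.xml#aaa can be approximated with standard JDK classes: split off the fragment with java.net.URI, then find the element whose DTD-declared ID attribute has that value. This is a sketch of the idea only, not Saxon's resolver; it returns the element name (or null when no element matches) instead of building the subdocument, and the class name is hypothetical.

```java
import java.io.StringReader;
import java.net.URI;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

class FragmentLookup {
    /**
     * Returns the name of the element selected by the URI's fragment identifier,
     * the document element when there is no fragment, or null when no element
     * carries that ID (the case where the text above describes an empty document).
     */
    static String elementNameFor(String uri, String xml) throws Exception {
        String fragment = URI.create(uri).getFragment();
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        // getElementById only sees attributes declared with type ID in the DTD.
        Element target = (fragment == null)
                ? doc.getDocumentElement()
                : doc.getElementById(fragment);
        return (target == null) ? null : target.getNodeName();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<!DOCTYPE doc [<!ATTLIST eeee id ID #IMPLIED>]>"
                   + "<doc><eeee id='aaa'>text</eeee></doc>";
        System.out.println(elementNameFor("mydoc.xml#aaa", xml));
    }
}
```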
The values of the vendor-specific system properties are:
xsl:version | 1.8 |
xsl:vendor | SAXON n.n.n from Michael Kay |
xsl:vendor-url | http://saxon.sf.net/ |
xsl:product | SAXON |
xsl:product-version | n.n.n |
All these values are subject to change in future releases. Users wishing to test whether the processor is Saxon are advised to test whether the xsl:product system property has the value "SAXON".
The reason for returning 1.8 from xsl:version is that the product is not yet fully conformant with XSLT 2.0.
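These properties can be read from Java by running a one-line stylesheet through the JAXP interface. The sketch below queries whichever processor TransformerFactory.newInstance() returns, so it needs Saxon to be the configured factory to see the values listed above. The class name is hypothetical. Note that xsl:product and xsl:product-version are not defined by XSLT 1.0, and a processor other than Saxon may reject them, which is why the demonstration queries only xsl:vendor.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class SystemPropertyProbe {
    /** Evaluates system-property(propertyName) in the default JAXP processor. */
    static String query(String propertyName) throws TransformerException {
        String stylesheet =
            "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
          + "<xsl:output method='text'/>"
          + "<xsl:template match='/'>"
          + "<xsl:value-of select=\"system-property('" + propertyName + "')\"/>"
          + "</xsl:template></xsl:stylesheet>";
        Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(stylesheet)));
        StringWriter out = new StringWriter();
        // The source document is irrelevant; any well-formed input will do.
        transformer.transform(new StreamSource(new StringReader("<dummy/>")),
                              new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws TransformerException {
        System.out.println(query("xsl:vendor"));
        // With Saxon configured, "SAXON".equals(query("xsl:product")) is the
        // test recommended in the text above.
    }
}
```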
Saxon implements a number of extensions to standard XSLT, following the rules for extension functions and extension elements where appropriate. The extensions are documented in extensions.html. They are all implemented in accordance with the provisions in the standard for extensibility.
The following is the list of encodings recognized by the built-in AElfred parser (case-insensitive):
ISO-8859-1, 8859_1, ISO8859_1
US-ASCII, ASCII
UTF-8, UTF8
ISO-10646-UCS-2, UTF-16, UTF-16BE, UTF-16LE
The encodings available on output are the intersection of:
ascii, us-ascii, utf-8, utf8, utf-16, utf16, iso-8859-1, iso-8859-2, koi8-r, cp852, cp1250, windows-1250, cp1251, windows-1251 (again case-insensitive)
with whatever your Java VM supports.
If you select an encoding that the Java VM recognizes, but which is not in the above list, then the output will be written in the requested encoding, but all non-ASCII characters will be written as character references.
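Writing characters as numeric character references is the usual way a serializer copes with characters it cannot, or will not, emit directly. The sketch below shows the effect with the JDK's default JAXP processor and the US-ASCII encoding, where a non-ASCII character has to be written as a reference. It illustrates the general mechanism rather than Saxon's own serializer, and the class name is hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class AsciiSerialization {
    /** Copies the document to a US-ASCII byte stream and returns it as text. */
    static String serializeAsAscii(String xml) throws TransformerException {
        // newTransformer() with no stylesheet gives an identity transformation.
        Transformer identity = TransformerFactory.newInstance().newTransformer();
        identity.setOutputProperty(OutputKeys.ENCODING, "US-ASCII");
        identity.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        identity.transform(new StreamSource(new StringReader(xml)),
                           new StreamResult(bytes));
        return new String(bytes.toByteArray(), StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) throws TransformerException {
        // U+00E9 (e with acute accent) has no ASCII representation.
        System.out.println(serializeAsAscii("<p>caf\u00e9</p>"));
    }
}
```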
Saxon can be used with any SAX-conformant XML parser. The extent of XML conformance depends entirely on the chosen parser.
The default parser is a version of Ælfred. There is one known non-conformance in the version of the Ælfred parser provided with the Saxon product: it does not enforce the constraint that the contents of a general entity must be well-formed. Note, however, that this parser does not perform XML validation.
Saxon accepts input (both source document and stylesheet) from any standards-compliant DOM implementation.
Saxon allows the result tree to be attached to any Document or Element node of an existing DOM. Any DOM implementation can be used, provided it is mutable.
Saxon's internal tree structure (which is visible through the Java API, including the case where Java extension functions are called from XPath expressions) conforms with the minimal requirements of the DOM level 2 core Java language binding. This DOM interface is read-only, so all attempts to call updating methods throw an appropriate DOM exception. No optional features are implemented. The DOM interfaces to Saxon's tree structure do not reveal namespace nodes as attributes. This means it is not possible to get information about namespace declarations except by calls such as getPrefix() and getNamespaceURI() on Element and Attr nodes.
If an extension function returns a DOM Node or NodeList, this must consist only of Nodes in a tree constructed using Saxon. Since Saxon's trees cannot be updated using DOM methods, this means that the nodes returned must either be nodes from the original source tree, or nodes from a tree constructed using Saxon's proprietary API. It is not possible to construct the tree using DOM methods such as createElement() and createAttribute().
Saxon implements the JAXP 1.2 API (including TrAX), as defined in JSR-63. Saxon implements the interfaces in the javax.xml.transform package in full, including support for SAX, DOM, and Stream input, and SAX, DOM, and Stream output.
Note: The transformation interfaces in JAXP 1.2 are identical to JAXP 1.1: the new version only affects the XML parser interface, adding options to control schema validation.
There are restrictions in using transform() on a DOMSource when the node to be transformed is a node other than the root (i.e. the DOM Document node). These apply only if the supplied DOM is a third-party DOM, not if it is a Saxon-constructed tree. Specifically, if the start node is not the root then it must be an element; and it must not have an ancestor or preceding-sibling node, or an ancestor with a preceding-sibling node, that is an entity reference node or CDATA section node. In addition, the element must be part of a tree that is rooted at a Document node.
Saxon also implements part of the javax.xml.parsers API. Saxon no longer provides its own SAX parser; however, it does provide a DocumentBuilder. The DOM interfaces are limited by the capabilities of the Saxon DOM, specifically the fact that it is read-only. Nevertheless, the DocumentBuilder may be used to construct a Saxon tree, or to obtain an empty Document node which can be supplied in a DOMResult to hold the result of a transformation.
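The pattern of obtaining an empty Document node and supplying it in a DOMResult looks like this in standard JAXP. The sketch uses whatever DocumentBuilderFactory and TransformerFactory are configured by default, so it runs on a plain JDK; selecting Saxon's DocumentBuilder is a configuration matter not shown here, and the class name is hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMResult;
import javax.xml.transform.stream.StreamSource;
import org.w3c.dom.Document;

class DomResultDemo {
    /** Runs the stylesheet over the source and returns the result as a DOM Document. */
    static Document transformToDom(String stylesheet, String source) throws Exception {
        // An empty Document node acts as the container for the result tree.
        Document holder = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().newDocument();
        Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(stylesheet)));
        transformer.transform(new StreamSource(new StringReader(source)),
                              new DOMResult(holder));
        return holder;
    }

    public static void main(String[] args) throws Exception {
        String stylesheet =
            "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
          + "<xsl:template match='/'><out><xsl:value-of select='/doc/@name'/></out></xsl:template>"
          + "</xsl:stylesheet>";
        Document result = transformToDom(stylesheet, "<doc name='world'/>");
        System.out.println(result.getDocumentElement().getNodeName());
    }
}
```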
Where the XSLT specification requires that an error be signaled, Saxon produces an error message and terminates stylesheet execution. In the case of errors detected at compile time, it attempts to report as many errors as possible before terminating; in the case of run-time errors, it terminates after the first error.
Where the XSLT specification states that the processor may recover from an error, Saxon takes one of three actions as described in the table below. Either it signals the error and terminates execution, or it recovers silently from the error in the manner permitted by the specification, or it places the action under user control. In the last case there are three options: report the error and terminate, recover silently, or (the default) recover after writing a warning to the system error output stream. These actions can be modified by supplying a user-defined ErrorListener.
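A user-defined ErrorListener implements the three callbacks of javax.xml.transform.ErrorListener and is registered with Transformer.setErrorListener(). The sketch below either collects warnings and lets processing continue, or turns every warning into a failure. Which callback Saxon invokes for each recoverable error is not stated here, so treating warning() as the channel for recoverable conditions is an assumption, as are the class name and the failOnWarning switch.

```java
import java.util.ArrayList;
import java.util.List;
import javax.xml.transform.ErrorListener;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;

class CollectingErrorListener implements ErrorListener {
    private final List<String> warnings = new ArrayList<>();
    private final boolean failOnWarning;

    CollectingErrorListener(boolean failOnWarning) {
        this.failOnWarning = failOnWarning;
    }

    @Override
    public void warning(TransformerException e) throws TransformerException {
        if (failOnWarning) {
            throw e;                    // report and terminate
        }
        warnings.add(e.getMessage());   // record and let the processor recover
    }

    @Override
    public void error(TransformerException e) throws TransformerException {
        throw e;
    }

    @Override
    public void fatalError(TransformerException e) throws TransformerException {
        throw e;
    }

    /** Messages of the warnings seen so far. */
    List<String> warnings() {
        return warnings;
    }

    public static void main(String[] args) throws TransformerException {
        CollectingErrorListener listener = new CollectingErrorListener(false);
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setErrorListener(listener); // standard JAXP registration
        listener.warning(new TransformerException("simulated recoverable error"));
        System.out.println(listener.warnings());
    }
}
```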
Handling of individual recoverable errors is described in the table below.
Error | Action |
There is more than one template rule that matches a node, with the same import precedence and priority | User option |
There is more than one xsl:namespace-alias statement for a given prefix, with the same import precedence | Recover silently |
An element name defined using xsl:element is invalid | User option |
An attribute name defined using xsl:attribute is invalid | User option |
There are several attribute sets with the same import precedence that define the same named attribute | Recover silently |
A processing-instruction name defined using xsl:processing-instruction is invalid | User option |
A node other than a text node is written to the result tree while instantiating xsl:attribute, xsl:comment, or xsl:processing-instruction | User option |
Invalid characters are written to the content of a comment or processing instruction | User option |
An attribute node or namespace node is written directly to the root of a result tree fragment, or to any other node that is not an element node. | User option |
The document() function identifies a resource that cannot be retrieved | User option |
There are several xsl:output elements specifying the same attribute with the same import precedence | Recover silently |
disable-output-escaping is used for a text node while instantiating xsl:attribute, xsl:comment, or xsl:processing-instruction | Recover silently |
disable-output-escaping is used for a text node within a result tree fragment that is subsequently converted to a string or number | Recover silently |
disable-output-escaping is used for a text node containing a character that cannot be output using the target encoding | Recover silently |
Michael H. Kay
12 November 2002