This file describes changes for versions 6.0 through to 6.5.5. Earlier history is logged here:
At the time of writing the most recent release of Saxon is version 8.4. For information about current Saxon versions, see http://saxon.sf.net/.
Version 6.5.5 fixes several bugs:
Under JDK 1.5, if the output filename contains spaces (or other special characters), these are translated by Saxon 6.5.4 (and earlier releases) to %HH escape sequences. For example, specifying -o "my file.out" on the command line results in the output being written to my%20file.out. The problem arises because the JDK 1.5 implementation of the JAXP StreamResult class escapes special characters in file names, whereas previous releases did not. The new behavior is correct, but Saxon contained a workaround for the previous incorrect behavior which is no longer appropriate.
Saxon 6.5.4 and earlier did not allow attribute and namespace nodes to be numbered using <xsl:number level="single">. This should be permitted according to the specification, and if the count and from patterns are omitted the result should always be 1 (one).
Saxon 6.5.5 now performs an accurate validation of QNames used in XPath expressions and in various XSLT contexts, for example template names and variable names. Previously the check was an approximation, because some characters were classified differently in Java and in XML.
When running from the command line, if a user-supplied URIResolver is used, and returns null when resolving the source file or stylesheet named in the command line arguments, then the standard URI resolver is now invoked to resolve the names.
Whitespace-stripping is now performed (as specified by xsl:strip-space) when the transformation is invoked as a JAXP TransformerHandler.
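The file-name escaping described in the first fix above can be reproduced with the JDK alone. The following is a minimal sketch, assuming the escaping applied by StreamResult is equivalent to that of File.toURI(); the class and method names are illustrative and are not part of Saxon or JAXP.

```java
import java.io.File;
import java.net.URI;

class OutputFileNameDemo {
    // Converts a file name to its URI form; spaces become %20 escapes.
    static String toSystemId(File f) {
        return f.toURI().toString();
    }

    // Maps a file: URI back to a File, decoding any %HH escapes.
    static File fromSystemId(String systemId) {
        return new File(URI.create(systemId));
    }

    public static void main(String[] args) {
        File out = new File("my file.out");
        String systemId = toSystemId(out);
        System.out.println(systemId);
        System.out.println(fromSystemId(systemId).getName());
    }
}
```

Code that treats the escaped system identifier as a literal file name ends up writing to my%20file.out; decoding it through the URI first recovers the name the user asked for.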
This version also adds the saxon:require-well-formed output property from Saxon 8.x. Normally, if transformation output is sent to a SAX ContentHandler (for example, because the JAXP result is a SAXResult), then it must represent a well-formed document (that is, there must be a single element node and no text nodes as children of the root). By specifying <xsl:output saxon:require-well-formed="no">, you can indicate that your ContentHandler is prepared to accept output that is not well-formed.
The TransformerFactoryImpl (Saxon's implementation of the JAXP TransformerFactory class) now supports the method setFeature introduced in JAXP 1.3. However, the "secure processing" feature which the specification says all implementations must provide is not supported.
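Because secure processing may or may not be available, portable code can probe for it rather than assume it. This is a minimal sketch using only the standard JAXP API; the class and method names are illustrative, and it assumes (as the JAXP specification describes) that a factory rejecting the feature signals this with a TransformerConfigurationException. How Saxon 6.5.5 itself reacts is not stated here.

```java
import javax.xml.XMLConstants;
import javax.xml.transform.TransformerConfigurationException;
import javax.xml.transform.TransformerFactory;

class SecureProcessingProbe {
    // Tries to switch on the JAXP 1.3 secure-processing feature.
    // Returns false if the factory rejects the request.
    static boolean enableSecureProcessing(TransformerFactory factory) {
        try {
            factory.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
            return true;
        } catch (TransformerConfigurationException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        TransformerFactory factory = TransformerFactory.newInstance();
        System.out.println(factory.getClass().getName()
                + " accepted secure processing: " + enableSecureProcessing(factory));
    }
}
```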
The Controller (Saxon's implementation of the JAXP Transformer class) now implements the reset() method introduced in JAXP 1.3. Note that the reset() method does not clear the document pool (the collection of documents loaded using the document() function). This is because the purpose of resetting a Transformer rather than creating a new one is to reuse resources.
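To show the intended use of reset(), here is a small sketch against the standard JAXP Transformer interface. It runs with whatever TransformerFactory is on the classpath; the parameter name run-id and the class name are made up for the example, and the document pool mentioned above is Saxon-specific and not visible through this API.

```java
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerConfigurationException;
import javax.xml.transform.TransformerFactory;

class TransformerResetDemo {
    // Sets a parameter, resets the transformer, and reports whether
    // the parameter is still visible afterwards.
    static boolean parameterSurvivesReset(Transformer t) {
        t.setParameter("run-id", "first");
        t.reset();
        return t.getParameter("run-id") != null;
    }

    // Creates an identity transformer from the default factory.
    static Transformer newIdentityTransformer() {
        try {
            return TransformerFactory.newInstance().newTransformer();
        } catch (TransformerConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Transformer t = newIdentityTransformer();
        System.out.println("parameter survives reset: " + parameterSurvivesReset(t));
    }
}
```

After reset() the same Transformer object can be given fresh parameters and output properties for the next run, which avoids the cost of creating a new instance.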
The Saxon tree models implement DOM interfaces. This support has been upgraded so that
all DOM level 3 methods are present, as required in order to compile the code under JDK 1.5.
In many cases, the new methods are trivial implementations, that is, they typically return null or throw an UnsupportedOperationException if called.
In the NodeInfo interface, the method isSameNode() has been renamed isSameNodeInfo() to avoid conflict with the DOM Level 3 interface of the same name.
To allow compilation under JDK 1.5, variables named "enum" have been renamed.
The algorithm for generating IDs for attribute and namespace nodes has been changed in both the tiny tree and the standard tree. The previous algorithm did not guarantee that the IDs consisted of ASCII alphanumeric characters, as required by the XSLT 1.0 specification.
In version 6.5.3 the following bug was present, and has now been fixed: When comparing two nodes for identity (e.g. when evaluating the union operator |), an element, text, comment or PI node may be considered identical to an attribute or namespace node if they happen to be at the same offset in their respective data structures. This problem applied to the TinyTree only.
A performance bug in the implementation of result tree fragments has been fixed. The code in FragmentValue.java used the construct new Vector(20, 20) to allocate space for nodes on the tree; the effect of this is that a fixed allocation unit of 20 items is used, meaning that the cost of constructing the tree increases as the square of the number of nodes.
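The quadratic cost can be seen by counting how many elements get copied each time the backing array is reallocated. The sketch below is a simulation only, not the Saxon code: it models a buffer that grows by a fixed increment (as a Vector with a non-zero capacityIncrement does) and compares it with one that doubles. The class and method names are invented for the illustration.

```java
class GrowthCostDemo {
    // Counts the element copies needed to append n items to a buffer that
    // starts at initialCapacity and grows by a fixed increment when full.
    // An increment of 0 means "double the capacity" instead.
    static long copiesNeeded(int n, int initialCapacity, int increment) {
        long copies = 0;
        int capacity = initialCapacity;
        for (int size = 0; size < n; size++) {
            if (size == capacity) {
                copies += size; // existing items are copied into the new array
                capacity = (increment > 0) ? capacity + increment : capacity * 2;
            }
        }
        return copies;
    }

    public static void main(String[] args) {
        for (int n : new int[] {10_000, 20_000, 40_000}) {
            System.out.println(n + " items: fixed increment of 20 copies "
                    + copiesNeeded(n, 20, 20) + ", doubling copies "
                    + copiesNeeded(n, 20, 0));
        }
    }
}
```

With a fixed increment, doubling the number of items roughly quadruples the copying work, whereas with doubling growth the work stays proportional to the number of items.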
Several bugs in xsl:number have been fixed:
The implementation now correctly handles a format pattern that contains no formatting tokens, for example format="*" (the resulting output takes the form *12*). {test numb26}
The first non-alphanumeric token in the format picture is no longer treated as a separator token, so if multiple numbers are output using a format picture of (1), the output is now (1.2.3) rather than (1(2(3). {test numb35}
Saxon now returns the correct results for <xsl:number level="any"> when the context node is an attribute that does not match the count pattern, and when its parent is an element that does match the count pattern. {test numb32}
An error is now reported if there are two templates in different stylesheet modules with the same name and the same import precedence, provided (a) that there is no template with that name and higher import precedence, and (b) that the template is actually referenced in an xsl:call-template instruction. {test error052}
An error is now reported if a namespace prefix used in the exclude-result-prefixes attribute of the xsl:stylesheet element of an imported or included stylesheet module has not been declared, unless the module is in forwards-compatible mode (for example, because it specifies version="2.0"). {test error235}
Within xsl:for-each, any xsl:sort elements must now precede any other instructions. {test error172}
Some previously unreported errors have been found as a result of running Saxon 6.5.4 against the test suite for Saxon 8.5. The following bugs have been fixed:
Saxon 6.5.3 did not fail cleanly when there are too many nested xsl:apply-templates calls. {test error051}
Saxon 6.5.3 threw a NullPointerException if an xsl:stylesheet or xsl:transform element appeared as a child of another element in the stylesheet. {test error236}
Saxon 6.5.3 failed when processing a literal result element appearing within an xsl:fallback instruction. {test ver23}
Saxon 6.5.3 produced an ArrayIndexOutOfBoundsException when serializing an element that (a) is included in the cdata-section-elements attribute of xsl:output, and (b) contains a Unicode character above 65535 (a "supplementary character"). {test output176}
Saxon 6.5.3 threw an ArrayIndexOutOfBounds exception when calling an extension function if the target class contained both a matching static method with N arguments and a matching non-static method with N-1 arguments.
Saxon 6.5.3, when operating in forwards compatibility mode, did not ignore all 1.0 errors (for example, XSLT 2.0 features) appearing within the children of an unrecognized top-level element such as the XSLT 2.0 xsl:function element.
Saxon 6.5.3, when operating in forwards compatibility mode, did not always treat an optional attribute with an unrecognized value as if the attribute were not specified. This is a difficult problem to fix across the board, but it has been fixed for some particular cases that arise when Saxon 6.5.4 is presented with an XSLT 2.0 stylesheet: for example, if the mode attribute of xsl:template or xsl:apply-templates is not a valid QName then the attribute is ignored {test cnfr18}. A match pattern that XSLT 1.0 does not allow is now treated (when in forwards compatibility mode) as a pattern that no nodes will match {test ver24}. (This is not an explicit rule in the XSLT 1.0 specification, but seems to be the best thing to do given the intent behind forwards compatibility mode.)
In Saxon 6.5.3, the JDOM adapter (when used with the JDOM 1.0 release) did not correctly handle the namespace axis, nor the DocType node in a JDOM tree. It also did not allow access to attributes in the XML namespace (such as xml:space).
Support for FOP has been dropped.
Saxon 6.5.4 is no longer supported under JDK 1.1. In consequence, Instant Saxon, which relied on the Microsoft JVM, is no longer available.
The applet support module XSLTProcessorApplet, and the sample HTML pages illustrating use of Saxon as an applet, have been dropped.
Added the EXSLT functions in package math: abs, acos, asin, atan, atan2, constant, cos, exp, log, power, random, sin, sqrt, tan. Thanks to Simon St. Laurent for these.
The mechanism for keeping stylesheet signatures in the namepool has been removed. It caused a creeping "memory leak" in continuously running services, and is not really needed. It was invented to allow namepools to be copied, but this facility has never been properly documented or tested. Instead, there is now a simple check that the source document and stylesheet are using the same namepool.
Added extension functions saxon:pause-tracing() and saxon:resume-tracing().
Suppress lazy evaluation of assignable variables. (This was designed to prevent a stack overflow; it didn't succeed, but it seems a good idea anyway.)
This is primarily a maintenance release for error clearance; however, it introduces an important incompatible change for XSLT conformance reasons.
To use XSLT version 1.1 features, the stylesheet should now specify version="1.1". The W3C XSL Working Group has announced that XSLT 1.1 will not be progressed beyond the working draft stage. Therefore, in the interests of standards conformance and stylesheet portability, Saxon 6.5 allows version 1.1 features to be used only if the stylesheet specifically indicates that it intends to use them. Specifically, this affects three features:
The xsl:document element (but not saxon:output, which is a synonym)
The xsl:script element (but not saxon:script, which is a synonym)
The ability to use a result tree fragment as a node-set
To use these features, the stylesheet must invoke forwards-compatible mode. This is done by specifying any value other than "1.0" as the value of the version attribute of the xsl:stylesheet element, or as the value of the xsl:version attribute on a containing literal result element. For clarity, I recommend setting version="1.1".
Note that for the third feature, the ability to use a result tree fragment as a node-set, it is the xsl:variable element that creates the result tree fragment that must be executed in forwards-compatible mode, not the expression where it is used. When the stylesheet specifies version="1.0", the saxon:node-set() or exslt:node-set() function can be used to convert a result tree fragment to a node-set.
A new switch has been added to disable use of extension functions, other than the system-supplied Saxon and EXSLT extension functions. The switch can be set from the command line (-noext option) or by calling:
setAttribute(FeatureKeys.ALLOW_EXTERNAL_FUNCTIONS, new Boolean(false))
on the TransformerFactory object. This switch is useful when running Saxon as part of a servlet environment, if the stylesheets to be executed are untrusted. It prevents the security risk of an untrusted stylesheet invoking arbitrary Java methods on the server machine that use, modify, or delete privileged system resources.
The reference implementation of JAXP 1.1 available from Sun has not changed for some time, and has one or two annoying bugs. A much more recent version, in which these bugs are cleared, is available from Apache as part of the Xalan product. I have therefore updated the JAXP library in Saxon using the implementation included with Xalan-j_2_2_D11. An additional benefit is that the Apache distribution includes source code, which is not available from Sun. I have made no changes to this source code (not even to change the default parser and XSLT processor).
The saxon.jar file is now built with a manifest that identifies its main class, so you can invoke Saxon with the command line:
java -jar saxon.jar source.xml style.xsl
This is primarily a maintenance release for error clearance.
Defects fixed (details at SourceForge):
Preview mode (the <saxon:preview> extension) was never designed to work when using a TransformerHandler in the JAXP interface, only when using the transform() method. This is now documented as a restriction.
FOP integration has been upgraded to work with FOP 0.20.1. It no longer works with previous FOP releases. However, if you need to use Saxon 6.4.4 with FOP 0.19.0, you should be able to do so by using the saxon-fop.jar issued with 6.4.3, along with the 6.4.4 version of saxon.jar.
JDOM integration has been upgraded to work with JDOM 0.7. It no longer works with previous JDOM releases. However, if you need to use Saxon 6.4.4 with JDOM 0.6, you should be able to do so by using the saxon-jdom.jar issued with 6.4.3, along with the 6.4.4 version of saxon.jar. Saxon's JDOM integration now relies on the JDOM tree being built with entities expanded (which is the default). Saxon does not yet support the new JDOMSource and JDOMResult classes included in JDOM 0.7.
Writing an attribute node when there is no open element start tag is now a recoverable error: the action depends on the recovery policy (by default, a warning is output and processing continues).
Writing a text node with disable-output-escaping when it includes a character outside the character set supported by the output encoding was previously treated as an unrecoverable error; this has changed so the processor recovers silently (by not disabling output escaping).
This means that all errors for which the XSLT specification describes a recovery action are now either (a) handled according to the recovery policy selected by the user, or (b) recovered silently, using the recovery action described in the XSLT Recommendation. None of these errors are signalled unconditionally.
The DTDGenerator, previously issued as a sample Saxon application, has been rewritten as a pure SAX application. Since it no longer makes use of Saxon, but works with any JAXP 1.1 compliant XML parser, it is now issued as a free-standing package under the Saxon project at SourceForge, with its own version number (initially 7.0).
The handling of xsl:message terminate="yes" has changed. Previously the transform() method output a message to System.err, and then returned normally. Now the transform() method outputs no message (other than the message output by xsl:message itself) but throws a com.icl.saxon.style.TerminationException, which is a subclass of javax.xml.transform.TransformerException. The command-line driver com.icl.saxon.Stylesheet now handles this exception, so the only change in the behavior of the command line is that a result code (1) is now returned. When using the Java API, the application can now detect the condition by catching the TerminationException.
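From the Java API this looks roughly as follows. The sketch uses only standard JAXP classes and catches the superclass TransformerException, so it runs with any XSLT 1.0 processor on the classpath; with Saxon 6.x you could catch com.icl.saxon.style.TerminationException specifically. The stylesheet text and the class name are invented for the example.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerConfigurationException;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class TerminateDemo {
    // A stylesheet that stops as soon as it sees the document root.
    static final String STYLESHEET =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:template match='/'>"
      + "<xsl:message terminate='yes'>giving up</xsl:message>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    // Returns true if the transformation was cut short by an exception.
    static boolean terminates(String stylesheet, String sourceXml) {
        Transformer t;
        try {
            t = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(stylesheet)));
        } catch (TransformerConfigurationException e) {
            // The stylesheet itself failed to compile: not the case being shown.
            throw new IllegalStateException(e);
        }
        try {
            t.transform(new StreamSource(new StringReader(sourceXml)),
                        new StreamResult(new StringWriter()));
            return false;
        } catch (TransformerException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("terminated: " + terminates(STYLESHEET, "<doc/>"));
    }
}
```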
There are a couple of new optimisations, prompted by an example submitted by Evan Lenz:
A new extension function has been added for parsing the content of processing instructions: getPseudoAttribute().
This is a maintenance release for error clearance.
The following defects are cleared in this release. Full details of all Saxon bugs are now placed on the SourceForge register as soon as the problem is acknowledged to be a bug.
The exslt:function element now implicitly declares the namespace
http://exslt.org/functions
to be an extension namespace, so that
http://icl.com/saxon
to be an extension namespace.
For details of defects cleared in this release, see the project page at SourceForge (http://www.sourceforge.net/projects/saxon).
A failure to retrieve the identified URI, or a failure to parse the XML, is now treated as a recoverable error. By default, the error is reported to System.err, and an empty node-set is returned. As with other recoverable errors, you can change this by setting the recovery policy (e.g. -w2 on the command line makes the error fatal), or by supplying your own ErrorListener.
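As an illustration of the last option, here is a sketch of an ErrorListener that refuses to recover: it records each report and rethrows it. It is written against the standard JAXP interface only. Which callback Saxon invokes under each recovery policy is not stated here, so the sketch treats warning() and error() alike; the class name is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import javax.xml.transform.ErrorListener;
import javax.xml.transform.TransformerException;

class StrictErrorListener implements ErrorListener {
    // Messages of every condition reported so far, in order of arrival.
    final List<String> seen = new ArrayList<>();

    @Override
    public void warning(TransformerException e) throws TransformerException {
        seen.add(e.getMessage());
        throw e; // treat a recoverable condition as fatal
    }

    @Override
    public void error(TransformerException e) throws TransformerException {
        seen.add(e.getMessage());
        throw e;
    }

    @Override
    public void fatalError(TransformerException e) throws TransformerException {
        seen.add(e.getMessage());
        throw e;
    }
}
```

The listener is installed with transformer.setErrorListener(new StrictErrorListener()) before calling transform().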
It is now possible for a user-defined URI Resolver to return a DocumentInfo object directly. This is illustrated by the sample URI Resolver included in the TraxExamples sample application, which is used to resolve URIs referring to simple text files. (This applies only to the document() function; when used by xsl:include or xsl:import, the URIResolver should return a SAXSource, StreamSource, or DOMSource).
XSLT 1.0 says that it is an error to output a node other than a text node while evaluating xsl:comment, xsl:attribute, or xsl:processing-instruction; the processor is allowed to recover by ignoring the offending nodes and their descendants. Saxon previously recovered silently from this error. At this release, this error is handled in the same way as other recoverable errors: by default, it results in a warning message.
XML parsing failures detected in the source document or stylesheet are now directed to the JAXP ErrorListener, unless a SAX errorHandler has been nominated.
A PrintStream can now be supplied to the standard error listener to define the destination for error messages (the default is System.err).
There has been some tidying-up of error messages, to prevent the same message being output several times.
The core extension functions defined in the EXSLT dates-and-times module are now available: specifically date-time(), date(), time(), year(), leap-year(), month-in-year(), month-name(), month-abbreviation(), week-number(), week-in-month(), day-in-year(), day-in-month(), day-of-week-in-month(), day-in-week(), day-abbreviation(), hour-in-day(), minute-in-hour(), second-in-minute(). For details see http://www.exslt.org/
This was an emergency patch release of Instant Saxon only. It fixed a problem whereby Instant Saxon would not run unless certain other JAXP software (for example, the Crimson parser) was present on the classpath.
I have given the API Guide a much needed overhaul, including improved descriptions of the APIs for invoking XPath expressions from Java code.
The following errors were found in version 6.3, and have been cleared.
6.3/001 | If the match pattern in an xsl:key definition matches both element and attribute nodes, only the attributes will actually be indexed. (Present in all previous releases.) |
6.3/002 | If a Writer is supplied to receive the output of a transformation (when using the JAXP 1.1 API), Saxon has no control over the output encoding. It is therefore possible that the value of the encoding attribute written to the XML declaration will bear no relation to the actual encoding of the output file. As a partial fix to this, Saxon now determines the encoding used by the Writer if it can (namely, if it is an OutputStreamWriter) and writes this encoding name to the XML declaration, ignoring any encoding that was requested in xsl:output or via the setOutputProperty() method. On my Windows configuration, this will generally result in the XML declaration saying encoding="Cp1252". The recommended circumvention to this problem is to supply an OutputStream for the output, rather than a Writer. (Present in all previous releases.) |
6.3/003 | It is not possible to call an external function that expects an argument of class com.icl.saxon.expr.FragmentValue (or any other subclass of NodeSetValue), even if the supplied argument is the correct class. [Fixed but not tested]. (Present in 6.3 only) |
6.3/004 | The following axis, starting at an attribute or namespace node, should include the descendants of the element that is the parent of the attribute or namespace node. It currently returns only the nodes that are on the following axis from the parent node. (Present in all previous releases.) |
6.3/005 | A NullPointerException occurs if a StreamSource is supplied without calling setSystemId(). (Present in 6.3 only) |
6.3/006 | A bug in the AElfred XML parser: if the DTD declares an element type as having element content, but an element of that type wrongly contains non-whitespace text, then AElfred simply ignores the offending text; it reports no error, and it doesn't report the text to the application. As a non-validating parser, AElfred should report the text content exactly as if the DTD declared the element as having mixed content. (Present in all previous releases.) |
6.3/007 | AElfred fails to detect and report a well-formedness error: specifically, when the source text contains the disallowed sequence ']]>' immediately after an entity reference such as '&lt;'. (Present in all previous releases.) |
6.3/008 | A keyword used as an operator (div, mod, and, or) cannot be used as a variable name within an XPath expression. (Present in all previous releases.) |
6.3/009 | When a Saxon tree is supplied as input to a transformation (as a DOMSource), and needs to be rebuilt in order to strip whitespace nodes, and when the target format is a standard tree rather than a tinytree, then a NullPointerException may occur when reading the children of the root node (after processing the children that exist). (Present in Saxon 6.3 only.) |
6.3/010 | The Ælfred parser, after reading an external entity, does not close the input file. It has been reported that on the Microsoft platform this can result in the operating system keeping the file locked indefinitely, preventing other processes updating it. The fix for this problem closes the input stream or reader even if this was supplied by a user-supplied entity resolver. (Present in all previous releases.) |
6.3/011 | The TemplatesHandler (which allows a stylesheet to be built using SAX events) does not work. (Present since JAXP support was introduced.) |
6.3/012 | The Ælfred XML parser, when invoked with http://xml.org/sax/features/namespace-prefixes set to true, does not report namespace declarations to the application as attributes on the startElement() call. This doesn't affect Saxon, because Saxon always sets this feature to false, but it may affect other applications using Ælfred. (Present since Saxon 6.3: a side effect of the fix for bug 6.2.2/011.) |
6.3/013 | Namespace aliasing (xsl:namespace-alias) on attribute names does not work. The new attribute name that is generated will have the local part of the attribute name overwritten with the local part of the containing element name. (Present in all previous releases.) |
6.3/014 | When calling an extension function that expects an argument declared as being of type java.lang.Object, a supplied string, number, or boolean is passed as an instance of an internal Saxon class, rather than being converted to a String, Double, or Boolean. |
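The partial fix for 6.3/002 depends on asking the Writer for its encoding. The sketch below shows the kind of check involved, using only java.io; it is an illustration, not Saxon's code, and the class and method names are invented. Note that OutputStreamWriter.getEncoding() may return Java's historical encoding name (which is why a name such as Cp1252 can show up in the XML declaration), and that an arbitrary Writer offers no way to find out its encoding.

```java
import java.io.ByteArrayOutputStream;
import java.io.OutputStreamWriter;
import java.io.StringWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

class WriterEncodingProbe {
    // Returns the encoding name reported by the Writer,
    // or null when the Writer gives no way of finding out.
    static String detectEncoding(Writer w) {
        if (w instanceof OutputStreamWriter) {
            return ((OutputStreamWriter) w).getEncoding();
        }
        return null;
    }

    public static void main(String[] args) {
        Writer known = new OutputStreamWriter(new ByteArrayOutputStream(), StandardCharsets.UTF_8);
        Writer unknown = new StringWriter();
        System.out.println("OutputStreamWriter: " + detectEncoding(known));
        System.out.println("StringWriter: " + detectEncoding(unknown));
    }
}
```

Supplying an OutputStream instead of a Writer, as recommended in the entry for 6.3/002, sidesteps the question because the processor then chooses the encoding itself.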
Saxon no longer sets itself as the default DocumentBuilderFactory for use when building a DOM. This is because the Saxon DOM implementation, being read-only, is suitable only for specialized use.
Saxon still sets itself as both the default XSLT transformer and the default SAX2 ParserFactory.
Saxon's FOP integration has been updated to use FOP 0.19.0.
Two new attributes are available on the xsl:output and xsl:document elements, for use when method="saxon:fop":
fop:renderer="org.apache.fop.render.pdf.PDFRenderer"
fop:configuration="c:\config\fop.xml"
Here fop: is the prefix of a namespace whose URI must be http://icl.com/saxon/fop
These two attributes have not been fully tested.
These changes are made partly to improve maintainability of the code, partly to reduce its size, and partly to enable the future support of a wider variety of data structures that the XPath implementation can access (for example, non-SAXON DOM structures, databases, etc). Some of the changes will affect Java applications, especially those that make intimate use of internal Saxon implementation classes.
The main change is a major simplification of the NodeInfo interface, greatly reducing the number of methods and subclasses that need to be implemented to support a new kind of tree structure, but hopefully without reducing the usability of the interface or the performance of its implementations.
The interface classes that are subclasses of NodeInfo have been eliminated (for example, the old favorite ElementInfo). The only exception is DocumentInfo (representing the root node or the document as a whole). This reflects the fact that in the XPath data model, all methods are available on any kind of node. Tests that were previously written if (node instanceof TextInfo) should now be written if (node.getNodeType()==NodeInfo.TEXT). In other cases, simply replace the specific interface (for example ElementInfo) by the general class NodeInfo.
The NodeInfo interface, which is the main interface to Saxon's tree model, no longer extends the DOM Node interface. This means that methods such as getNextSibling() are no longer available on this interface. Navigation from a node should be done instead by creating an enumeration using one of the XPath axes, using the getEnumeration() method.
However, the two implementations of the NodeInfo interface, that is the standard tree and the tiny tree, continue to implement the DOM Node interface. To achieve this, the two implementation types (NodeImpl and TinyNodeImpl) both inherit from a new abstract class called AbstractNode. This class implements both the Saxon NodeInfo interface and the DOM Node interface; it also includes methods needed only for element, text, comment, or root nodes. (This is done to make these methods shared between the two tree implementations: it is not possible in Java for a class such as TextImpl to inherit both from NodeImpl containing the Saxon methods and from a generic AbstractTextImpl containing the DOM methods.)
The NodeInfo class now implements the JAXP Source interface, which means that any NodeInfo can be used directly to define the source of a transformation, with no need to wrap it in a DOMSource object. Note that if you supply the source tree in this way, it is your own responsibility to strip any unwanted whitespace nodes before XSLT processing begins. The xsl:strip-space and xsl:preserve-space instructions in the stylesheet will be ignored.
Saxon still uses DOM methods such as getNextSibling() to navigate the stylesheet tree, which is always implemented using the standard tree model. However, Saxon no longer relies on the source document providing DOM interfaces.
As well as the DOM methods, a number of other methods on the NodeInfo interface have been removed. Many of these were "shortcut" methods that weren't really needed, and which were the same in all implementations. In all cases there are alternatives available.
The getValue() method in the NodeInfo class has been renamed getStringValue(), to better reflect its meaning, and to avoid clashing with the getValue() method of the org.w3c.dom.Attr class.
The AxisEnumeration classes are now logically part of the tree implementation, so they are implemented differently for each tree structure. This allows the implementation to use the navigation mechanisms that are most efficient in each data structure.
The subclasses of Axis, which existed essentially to provide information about each axis, have been removed. Instead the Axis class itself provides this information in the form of a number of arrays, indexed by axis number. The Axis class has been moved to the com.icl.saxon.om package.
The unused utility methods in class com.icl.saxon.om.Navigator, for example isFirstInGroup() and getAncestor(), have been deleted. If you need these methods in your application, I suggest reconstructing them within your application code, based on the Saxon 6.3 source code.
The interface com.icl.saxon.om.ExtendedAttributes has been removed from the object model, as the preferred way of accessing all the attributes of an element is now to enumerate the attribute axis.
In previous releases, certain information held within a document was required to be unique across all documents used within a single transformation: in particular, the document number, and the node sequence numbers. This potentially causes problems when the same source document is used in multiple transformations, perhaps running in parallel. The problems were previously avoided by rebuilding the document for each transformation, which is inefficient.
In Saxon 6.4, a document no longer contains a unique document number. The methods generateId() and getSequenceNumber() now generate numbers which are required to be unique only within a single document; making them globally unique is done by the calling code, with the aid of the document pool maintained by the Controller.
A tree implementation is no longer required to provide sequence numbers for the nodes. Instead, it is required to implement a compareOrder() method that determines the relative ordering of two nodes within the same tree. Comparison of nodes in separate trees is now done at a different level of the software.
The only extra data that a source document now contains to support Saxon transformations is:
NodeHandler was previously an abstract class in package com.icl.saxon.handlers; it is now an interface in package com.icl.saxon. This may affect user-written node handlers, used either in a Java (non-XSLT) application, or via the saxon:handler extension element.
There is an extra method requiresStackFrame() whose value is a boolean. You can generally return false. Return true only if the node handler maintains variables or parameters that can be accessed from XPath expressions - something that is not especially easy to do.
This change also means that any user-written TraceListener will need to be recompiled.
I have made internal changes to the sorting routines to reduce the memory used, especially when the sort involves only a single sort key. The changes are unlikely to affect many users. However, some of the methods in internal classes such as SortedSelection have changed.
To illustrate the way that the new NodeInfo interface can be used to create adapters for other document formats, I have built an adapter for JDOM (see http://www.jdom.org/). Although the code for this is included in the main source tree, it is issued in a separate Jar file, saxon-jdom.jar. The code is still at beta quality. A sample application showing how to use Saxon with JDOM is provided. The JDOM interface requires JDK 1.2.
This facility allows a JDOM tree to be used as the input to an XSLT transformation, or as the target for XPath expressions issued from your Java code. You can direct the output to a JDOM tree by using JDOM's SAX driver as the SAXResult destination object for the transformation.
Saxon currently makes no attempt to merge adjacent text nodes in the JDOM tree: these can arise if the two text nodes are separated by an entity boundary or by a CDATA section boundary.
Using SAXON with JDOM is not likely to be especially efficient; it requires extra memory for the wrapper data structures, and some XPath navigation routes are quite inefficient because they are not supported directly in JDOM (for example, JDOM provides no direct way of getting from a node to its siblings). It is provided partly as an illustration of how to interface other data sources, and partly for users who already have data in JDOM format. It is particularly useful to enable XPath access to JDOM from Java applications.
I have reviewed the changes made by David Brownell in his version of the Ælfred XML parser (available as project xmlconf in www.sourceforge.net), and have incorporated those that are relevant into the Saxon version. This is basically all changes except those required to report validation errors. Most of the changes are very minor, but there are some enhancements in the handling of character encoding: if an input file is in an encoding that Ælfred itself does not understand, it now attempts to get the Java VM to decode it. The set of character encodings available in the Java VM is platform-dependent.
Following a suggestion from René Jansen, I have changed the sql:insert code so it now prepares the SQL statement only the first time it is executed, and reuses the prepared statement thereafter. Also, it can now handle columns that are not strings.
The three instructions xsl:attribute, xsl:comment, and xsl:processing-instruction have been speeded up. Where the content of the instruction is a single text node, or an xsl:value-of instruction, Saxon now avoids the overhead of setting up a new output destination; instead of processing the content as a general template body, it evaluates it directly as an expression. Where this is not the case, a streamlined output method is used that avoids many of the overheads previously incurred.
Global variables and parameters are no longer evaluated if there is another variable or parameter with the same name and higher import precedence.
An xsl:variable element containing a single text node is now treated specially, bringing the performance close to that of a String variable.
Saxon now implements the javax.xml.parsers package in JAXP 1.1 as well as the javax.xml.transform package.
If you have the system property javax.xml.parsers.SAXParserFactory set to the value com.icl.saxon.aelfred.SAXParserFactoryImpl, then any call on JAXP 1.1 interfaces to get an XMLReader will select AElfred. Moreover, Saxon itself uses the JAXP 1.1 interfaces to get an XMLReader if none has been explicitly requested, so you can now determine the parser to be used by setting this system property. The default for this property, defined by a services file in saxon.jar, selects the AElfred parser.
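As a rough sketch of the parser selection described above, the following Java fragment obtains an XMLReader through the JAXP factory mechanism. The class name ParserSelection and its helper methods are hypothetical. The AElfred factory class name is the one quoted in the text, and the sketch only sets the system property when that class can actually be loaded (that is, when saxon.jar is on the classpath); otherwise it leaves the JAXP default in place.

```java
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.XMLReader;

class ParserSelection {
    // Factory class name as quoted in the release notes; only usable with saxon.jar present.
    static final String AELFRED_FACTORY = "com.icl.saxon.aelfred.SAXParserFactoryImpl";

    // Returns true if the named class can be loaded in this environment.
    static boolean isLoadable(String className) {
        try {
            Class.forName(className);
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    // Asks JAXP for a namespace-aware XMLReader, using whichever factory is configured.
    static XMLReader newNamespaceAwareReader() {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newSAXParser().getXMLReader();
        } catch (Exception e) {
            throw new IllegalStateException("No usable SAX parser found", e);
        }
    }

    public static void main(String[] args) {
        if (isLoadable(AELFRED_FACTORY)) {
            System.setProperty("javax.xml.parsers.SAXParserFactory", AELFRED_FACTORY);
        }
        XMLReader reader = newNamespaceAwareReader();
        System.out.println("Parser in use: " + reader.getClass().getName());
    }
}
```

The same property can be set on the command line with -D instead of from application code.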
Similarly, if you have the system property javax.xml.parsers.DocumentBuilderFactory set to the value com.icl.saxon.om.DocumentBuilderFactoryImpl, then any call on JAXP 1.1 interfaces to get a DOM Document builder will select the Saxon tinytree implementation. However, Saxon does not call JAXP interfaces to get a Document builder: it will always choose its own. Note that Saxon's DOM implementation is an immutable DOM: you can construct the DOM by parsing a source document, but you cannot build it or modify it through the DOM API methods.
Saxon's Builder and Stripper classes have been moved to the package com.icl.saxon.om.
When a Saxon document is supplied as input to the transform() method (using a DOMSource object), in previous releases the tree was rebuilt. At this release the tree is used as is, provided that either (a) the stylesheet does not require whitespace nodes to be stripped, or (b) whitespace stripping has been disabled by calling the new Controller.disableWhitespaceStripping() method. In the cases where the tree does need to be rebuilt, a "fast path" routine has been introduced to do this: previously the same code was used as for a third-party DOM, which incurred unnecessary costs because there are so many different ways namespaces can be represented in a DOM.
When performing multiple transformations on a single source document, it is best to do the whitespace stripping once as a separate operation: this is made possible by a new method PreparedStyleSheet.stripWhitespace(), which uses the xsl:strip-space directives in a stylesheet to remove whitespace from a document (in fact, it returns a new document that is a copy of the original, with relevant whitespace nodes removed; if no whitespace stripping needs to be done, it returns the original document unchanged).
It is now possible to supply a Saxon document as the output of a transformation. (This didn't work at previous releases, though the restriction was undocumented.) The document must be empty, and the node supplied in the DOMResult object must be the document (i.e. root) node.
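The calling pattern for sending output to an empty document can be sketched with standard JAXP calls only. This is an illustration under assumptions, not Saxon-specific code: the class DomResultSketch is hypothetical, the empty document comes from whatever DocumentBuilderFactory JAXP selects (which may or may not be a Saxon one), and an identity transformation stands in for a real stylesheet.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMResult;
import javax.xml.transform.stream.StreamSource;
import org.w3c.dom.Document;

class DomResultSketch {
    // Runs an identity transformation whose result is written into an empty document node.
    static Document copyIntoEmptyDocument(String xml) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);
            // The target must be empty, and the node passed to DOMResult is the document node.
            Document target = dbf.newDocumentBuilder().newDocument();
            Transformer identity = TransformerFactory.newInstance().newTransformer();
            identity.transform(new StreamSource(new StringReader(xml)), new DOMResult(target));
            return target;
        } catch (Exception e) {
            throw new IllegalStateException("Transformation failed", e);
        }
    }

    public static void main(String[] args) {
        Document result = copyIntoEmptyDocument("<doc><item/></doc>");
        System.out.println("Root element: " + result.getDocumentElement().getNodeName());
    }
}
```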
It is now possible to start a transformation at a node other than the root node, if the input is supplied in the form of a DOM (in a DOMSource object). Global variables are still evaluated with the root node as context node, and the entire tree is available to the transformation, but the first template rule applied is not, as is usual, the match="/" rule, but the rule that matches the supplied node. The DOM supplied as input must not contain CDATA or entity reference nodes that are parents or preceding siblings of the start node.
Saxon's support for Java extension functions has been brought into line with the working draft XSLT 1.1 specification.
Polymorphic methods are now fully supported. If the relevant class has several methods (or constructors) with the same name, the one that is chosen is the one that gives a "best match" to the types of the supplied arguments, following the rules in the XSLT 1.1 draft. If there is no unique method that provides a best match according to these rules, an error is reported.
Methods that return void, null, char, or byte are now handled as described in the XSLT 1.1 working draft.
There is still a restriction that extension functions cannot construct a new DOM tree and return nodes from this tree by using DOM methods. They can only return existing nodes that were constructed by Saxon itself.
Methods that expect a node-set as input can now declare the argument type as com.icl.saxon.expr.NodeEnumeration, as an alternative to com.icl.saxon.expr.NodeSetValue. This is likely to be a bit more efficient. The enumeration will always be positioned at the start when the function is called, and its position on exit can be anywhere. It is also possible to return a NodeEnumeration as the result of a function. Again, the enumeration must be positioned at the start. Returning a NodeEnumeration is especially efficient if the result is then converted to a String or a Boolean.
The rules for spelling of external function names have been brought into line with XSLT 1.1. This may require stylesheet changes. For example, the function has-same-nodes() must now be spelt as "has-same-nodes()" or "hasSameNodes()", it can no longer be spelt as "hassamenodes()" or as "HAS-SAME-NODES()". For backwards compatibility, the node-set() function may be spelt with or without the hyphen (or as "nodeSet()").
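The spelling rule can be pictured as a small name-matching routine. The sketch below is not Saxon's code: the class FunctionNames and both methods are hypothetical, and it assumes the rule is simply "exact match, or match after removing each hyphen and upper-casing the character that follows it". The special case for node-set() mentioned above is not modelled.

```java
class FunctionNames {
    // Converts a hyphenated name such as "has-same-nodes" to camel case ("hasSameNodes").
    static String toCamelCase(String name) {
        StringBuilder sb = new StringBuilder();
        boolean upperNext = false;
        for (char c : name.toCharArray()) {
            if (c == '-') {
                upperNext = true;            // drop the hyphen, capitalise what follows
            } else if (upperNext) {
                sb.append(Character.toUpperCase(c));
                upperNext = false;
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    // True if the name written in the stylesheet is an accepted spelling of the Java method name.
    static boolean matches(String nameInStylesheet, String javaMethodName) {
        return nameInStylesheet.equals(javaMethodName)
                || toCamelCase(nameInStylesheet).equals(javaMethodName);
    }

    public static void main(String[] args) {
        String method = "hasSameNodes";
        for (String spelling : new String[] {"has-same-nodes", "hasSameNodes", "hassamenodes", "HAS-SAME-NODES"}) {
            System.out.println(spelling + " accepted: " + matches(spelling, method));
        }
    }
}
```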
EXSLT is an initiative to define a standardized set of extension functions and extension elements that can be used across different XSLT processors.
Saxon now supports the EXSLT modules Common, Math, Sets, and Functions. The full list of extension functions is:
plus the following new elements:
These have considerable overlap with functions that have previously been provided in the Saxon namespace. The Saxon versions of the functions remain available, for the time being, but the EXSLT versions are preferred.
The saxon:function and saxon:return elements have been changed slightly to conform to the EXSLT rules. Specifically: saxon:return can now appear inside xsl:for-each, provided the xsl:for-each iterates at most once. There is now a check that saxon:return is not used inside the definition of a variable or inside another saxon:return. It is an error to instantiate more than one saxon:return within a function.
Following a suggestion from Christian Nentwich, I have implemented a new extension function saxon:closure(), which forms a node-set by taking the transitive closure of a node-set expression. The function does NOT detect cycles.
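For readers unfamiliar with the term, a transitive closure repeatedly applies a step to each item found so far until no new items appear. The generic Java sketch below illustrates the idea only; it is not how saxon:closure() is implemented. The class ClosureSketch is hypothetical, and two choices in it are assumptions: the starting items are not included in the result unless the step reaches them, and a set of visited items is kept, which lets this sketch terminate on cyclic data. Since the text states that saxon:closure() does not detect cycles, that last property should not be expected of the extension function itself.

```java
import java.util.ArrayDeque;
import java.util.Collection;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.function.Function;

class ClosureSketch {
    // Collects every item reachable from 'start' by applying 'step' one or more times.
    static <T> Set<T> closure(Collection<T> start, Function<T, Collection<T>> step) {
        Set<T> result = new LinkedHashSet<>();      // preserves discovery order, no duplicates
        Deque<T> pending = new ArrayDeque<>(start); // items whose successors are still to be explored
        while (!pending.isEmpty()) {
            T item = pending.removeFirst();
            for (T next : step.apply(item)) {
                if (result.add(next)) {             // only explore items not seen before
                    pending.addLast(next);
                }
            }
        }
        return result;
    }
}
```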
The following errors were found in version 6.2.2, and have been cleared. Many of these relate to incorrect handling of error cases, and reflect the fact that I have greatly increased the test coverage of error handling.
6.2.2/001 | If the first argument of the key() function is not the name of a key defined in the stylesheet, a diagnostic dump is produced in place of a meaningful error message. |
6.2.2/002 | If the name attribute of the xsl:call-template instruction is not the name of a template defined in the stylesheet, a diagnostic dump is produced in place of a meaningful error message. |
6.2.2/003 | No error is reported when the use-attribute-sets attribute of xsl:attribute contains a circular reference. (Instead, the stack overflows). Note: the fix for this only detects the error at run-time if the attribute-set is actually used. Technically, the error should be detected at compile time, and reported even if the attribute set is never used. |
6.2.2/004 | No error is reported when the xsl:include or xsl:import element is non-empty. |
6.2.2/005 | A null pointer exception occurs if the href attribute of xsl:import or xsl:include is omitted. |
6.2.2/006 | No error is reported if a template name, variable name, or mode name does not conform to the lexical rules for a QName. |
6.2.2/007 | No error is reported if the xsl:key element is non-empty. |
6.2.2/008 | No error is reported if the xsl:attribute-set element has content other than xsl:attribute elements. |
6.2.2/009 | An ArrayIndexOutOfBounds exception occurs when attempting to get the children of the last node in the document, if the number of nodes in the document is 4000 times a power of two. Applies to the TinyTree model only. The problem occurred when using preview mode. |
6.2.2/010 | When the AElfred parser attempts to read a file using the http protocol, the encoding specified in the HTTP header should take precedence over the encoding specified in the XML document declaration. However, the parsing of the HTTP header is incorrect, so the encoding is typically identified as "=UTF-8" rather than "UTF-8". This results in an UnsupportedEncodingException. |
6.2.2/011 | There is an error in namespace handling in the AElfred parser. When a "real" attribute precedes a namespace declaration in an element start tag, and the QName of the element or of an attribute is the same as the QName of the parent element or one of its attributes, then the namespace URI assigned to the name may be based on the namespace declarations in force for the parent element rather than those for the child element. |
6.2.2/012 | There is a bug in the current version of JAXP 1.1: when a StreamSource is constructed from a File object, and the filename is of the form "/usr/file.xml", the resulting URL is "file:////usr/file.xml" rather than "file:///usr/file.xml". I have added code to Saxon's TransformerFactoryImpl to circumvent this problem by detecting the incorrect URL and patching it. |
6.2.2/013 | User-written message emitters don't work. |
6.2.2/014 | The integer value returned by getNodeType() on a root node is not consistent with the DOM specifications. Applications that call this method should be recompiled. |
6.2.2/015 | With xsl:output method="html" indent="yes", indentation should be suppressed for output elements that are nested within a <pre> element. It isn't. |
6.2.2/016 | When an invalid property is passed to the Transformer methods setOutputProperty() or setOutputProperties(), an IllegalArgumentException should be thrown. Instead, the value is silently ignored. |
6.2.2/017 | When getOutputProperties() is called on the Transformer interface, subsequent changes to the returned properties should have no effect. This isn't currently the case, as the method returns a reference to the internal property set, rather than making a copy. |
6.2.2/018 | When processing a document containing attributes with undeclared namespace prefixes, Saxon may crash with a NullPointerException after reporting the error. |
6.2.2/019 | On return from a call of xsl:apply-imports, the current template is not reset. This means that a second call on xsl:apply-imports will invoke the wrong template. |
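Bug 6.2.2/010 in the table above concerns extracting the encoding name from an HTTP header. The following Java sketch shows one way to pull the charset parameter out of a Content-Type value so that the result is "UTF-8" rather than "=UTF-8". It is an illustration, not the AElfred code; the class ContentTypeCharset and its method are hypothetical.

```java
class ContentTypeCharset {
    // Returns the value of the charset parameter in a Content-Type header, or null if absent.
    static String charsetOf(String contentType) {
        if (contentType == null) {
            return null;
        }
        for (String part : contentType.split(";")) {
            String p = part.trim();
            int eq = p.indexOf('=');
            if (eq > 0 && p.substring(0, eq).trim().equalsIgnoreCase("charset")) {
                // Start after the '=' sign, so it never ends up in the encoding name.
                String value = p.substring(eq + 1).trim();
                if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
                    value = value.substring(1, value.length() - 1); // strip optional quotes
                }
                return value.isEmpty() ? null : value;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(charsetOf("text/xml; charset=UTF-8"));
    }
}
```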
Upgraded to the latest JAXP ("version 1.1 final release") dated 6 Feb 2001. Saxon now uses the JAXP binaries exactly as issued by Sun. Unfortunately the TransformerFactory issued by Sun invokes Xalan as the "platform default" XSLT processor. The saxon.jar file includes a META-INF file to override this, so there should be no problems unless you have other things on the classpath that conflict. If you want to be absolutely sure of loading Saxon rather than any other XSLT processor, set the system property javax.xml.transform.TransformerFactory to the value "com.icl.saxon.TransformerFactoryImpl", either from your application (by calling System.setProperty()), or from the command line (java -Djavax.xml.transform.TransformerFactory=com.icl.saxon.TransformerFactoryImpl classname).
Make sure you remove any older versions of jaxp.jar from your classpath to prevent any incompatibilities.
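Setting the property from application code might look like the sketch below. The class FactorySelection and its method are hypothetical; the property name and the Saxon factory class name are the ones given above. As a precaution that the text does not require, the sketch only sets the property when the Saxon class can be loaded, so that it still runs (with the platform default processor) when saxon.jar is absent.

```java
import javax.xml.transform.TransformerFactory;

class FactorySelection {
    // Factory class name as given in the release notes.
    static final String SAXON_FACTORY = "com.icl.saxon.TransformerFactoryImpl";

    // Selects Saxon explicitly when it is available, then asks JAXP for a factory.
    static TransformerFactory newFactoryPreferringSaxon() {
        try {
            Class.forName(SAXON_FACTORY);
            System.setProperty("javax.xml.transform.TransformerFactory", SAXON_FACTORY);
        } catch (ClassNotFoundException | LinkageError e) {
            // saxon.jar is not on the classpath: leave the JAXP default untouched.
        }
        return TransformerFactory.newInstance();
    }

    public static void main(String[] args) {
        TransformerFactory factory = newFactoryPreferringSaxon();
        System.out.println("XSLT processor: " + factory.getClass().getName());
    }
}
```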
I have changed the packaging of the FOP integration, to reduce the problems this causes for people who want to rebuild Saxon or load it into a development environment such as IBM's Visual Age for Java. The FOP integration module, FOPEmitter, is now part of a separate package, com.icl.saxon.fop, and is not included in saxon.jar, but is in a separate JAR file, saxon-fop.jar. This must be on the class path if you want to use Saxon with FOP, but you can ignore it otherwise. There are no longer any compile-time references to FOPEmitter from the rest of the Saxon code, so you can recompile the product without first installing FOP, provided that you remove FOPEmitter from the source library first.
I have reinstated the ability to call Java extension functions using the namespace xmlns:ext="full.class.Name" as an alternative to xmlns:ext="java:full.class.Name". However, the "java:" form is preferred.
I added extension functions saxon:before() and saxon:after(), based on the BEFORE and AFTER operators defined in XQuery. These take two node-sets as arguments and return all the nodes in the first node-set that are before/after at least one node in the second node-set, in document order. This provides an alternative to saxon:leading(), e.g. saxon:before(*, s[1]) gets all the child elements that precede the first child <s> element.
A further refinement to class loading: if the loader returned by getContextClassLoader() fails to load a class, we now try to load the class using Class.forName(). This is all something of a black art: different things appear to work in different environments.
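The two-step lookup described here can be sketched as follows. This is an approximation of the strategy, not the Saxon source; the class ClassLoading and the method loadOrNull are hypothetical names, and returning null on failure is a simplification made for the example.

```java
class ClassLoading {
    // Tries the thread's context class loader first, then falls back to Class.forName().
    // Returns null if neither route can load the class.
    static Class<?> loadOrNull(String className) {
        ClassLoader context = Thread.currentThread().getContextClassLoader();
        if (context != null) {
            try {
                return context.loadClass(className);
            } catch (ClassNotFoundException | LinkageError e) {
                // Fall through to the second attempt.
            }
        }
        try {
            return Class.forName(className);
        } catch (ClassNotFoundException | LinkageError e) {
            return null;
        }
    }

    public static void main(String[] args) {
        System.out.println(loadOrNull("java.util.ArrayList"));
    }
}
```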
I have re-instated saxon:output as a synonym of xsl:document. The reason for this is that some XSLT processors object to finding an xsl:document element in the stylesheet, even when running in forwards compatible mode. Using saxon:output is therefore more portable. Note, however, that the new saxon:output is not completely compatible with the old: attribute names have changed, especially "file" to "href".
The following errors were reported for version 6.2.1, and have been cleared:
6.2.1/001 | An error is reported if, in an XPath expression, one of the symbols "*", "div", "mod", "and", or "or" is used immediately after a comma (that is, as an argument in a function call after the first). The symbol is wrongly interpreted as a binary operator rather than a location path. (Present in all previous releases). |
6.2.1/002 | The expression select="@prefix:*", which should return all attributes in the given namespace, actually returns all attributes regardless of namespace. |
6.2.1/003 | When a user-specified trace listener is specified using the -TL option on the command line, line numbering should automatically be switched on; but the attempt to do so fails. |
6.2.1/004 | If the stylesheet contains more than one xsl:script element, Saxon may attempt to load the wrong Java class. This will usually result in no appropriate method being found. |
6.2.1/005 | In the message reporting an ambiguous template rule match, a pattern that is a union pattern with three or more components is displayed incorrectly as "null". |
6.2.1/006 | A null pointer exception occurs when the name of a system function is misspelt. |
6.2.1/007 | In setting up the SAX2 parser, Saxon fails to state that it requires both the "features/namespaces" and "features/namespace-prefixes" features to be on. A SAX2 XMLReader may therefore fail to supply Saxon with information about namespaces, causing the transformation to produce incorrect results. |
6.2.1/008 | A null pointer exception occurs when the -a option is used and the source document contains no suitable <?xml-stylesheet?> processing instruction. |
6.2.1/009 | The get/set OutputProperties() methods on the Templates and Transformer objects do not work as described in the TrAX interface. On the Templates object, getOutputProperties() returns only those values explicitly set in the stylesheet, not the XSLT-defined defaults. On the Transformer object, getOutputProperties() only returns properties that have been explicitly set using setOutputProperties(). |
6.2.1/010 | In an XPath expression, Saxon reports no error when whitespace is used between a "$" sign and the following variable name. No space is allowed in this position. |
6.2.1/011 | In an XPath expression, Saxon reports no error when a colon is used between a function name or node-type name and the following left parenthesis, if it is separated from the function name by whitespace. For example, no error is reported for "true :()" or "node :()". |
Support for Running Saxon in an Applet: I have shamelessly copied the XSLTProcessorApplet module from Xalan, which was written to run any TrAX processor from a Java applet, and have adapted it to work with Saxon. The only changes were to remove a call on a Xalan error-handling routine, and to change the package name. I have also copied and adapted the Xalan sample application which shows how to incorporate this applet into an HTML page. To run a transformation using Saxon requires saxon.jar to be downloaded to the client. At 550Kb this is fairly substantial.
There are some sample applications using Saxon as an applet in the samples/applet folder.
It is now possible to specify the CharacterSet class to be used for a named output encoding by setting the system property, e.g. -D"encoding.EUC-JP"="EUC_JP"; the value of the property should be the name of a class that implements the PluggableCharacterSet interface.
Saxon has been modified to work with FOP 0_17_0; it no longer works with earlier versions of FOP. This has required some extensions to the Emitter interface, to cater for the fact that FOP now requires an OutputStream rather than a Writer as its output destination. Note also that FOP attempts to load Xerces as its default XML parser; if you want to use Saxon's AElfred parser instead, set the system property -Dorg.xml.sax.parser=com.icl.saxon.aelfred.SAXDriver. To run FOP, include the supplied JAR files fop.jar and w3c.jar on your classpath (FOP uses the DOM SVG package which is not included in saxon.jar).
The following errors were reported for version 6.2, and have been cleared:
6.2/001 | When no implementation of an extension element is available, a compile-time error is reported, whether or not the element is actually instantiated. (Circumvention: add the attribute xsl:version="99" to the extension element). |
6.2/002 | When no implementation of an extension function is available, a compile-time error is reported, whether or not the function is actually instantiated. (Circumvention: add the attribute xsl:version="99" to a literal result element enclosing the call on the offending function). |
6.2/003 | The sample extension for SQL provides no way of closing the database connection. With some configurations, this leads to updates being lost. I have therefore added another extension element, sql:close. |
6.2/004 | If a user-supplied URIResolver is registered with the TransformerFactory, it is not used when resolving the URI contained in the href pseudo-attribute of the xml-stylesheet processing instruction. |
6.2/005 | When the namespace attribute of xsl:element or xsl:attribute evaluates to an empty string, the specification states that the namespace of the resulting element or attribute should be null. Saxon wrongly generates a namespace declaration of the form xmlns:prefix="". |
6.2/006 | Under some circumstances using a local variable in an expression constructed using saxon:evaluate() or saxon:expression() fails, saying the variable has not been declared. The failure only occurs when the xsl:variable element declaring the variable is a sibling of the element containing the attribute containing the call on saxon:evaluate or saxon:expression. You can therefore circumvent the problem by wrapping the relevant element inside <xsl:if test="true()">. |
6.2/007 | If the context node is an attribute node or namespace node, the preceding and following axes (like preceding-sibling and following-sibling) are empty. |
6.2/008 | The output properties set using xsl:output in the stylesheet are not accessible using the getOutputProperty() method of the Transformer. (Circumvention: they are available from the getOutputProperty() of the Templates object). |
6.2/009 | Calling an external function that declares an argument of type org.w3c.dom.NodeList may fail with an exception, if the node-set supplied in the function call has not been fully evaluated (specifically, if it is a NodeSetIntent). |
6.2/010 | Saxon does not report an error when the stylesheet contains two conflicting definitions of the default decimal format; it simply uses the one that comes last. |
6.2/011 | Saxon reports inadequate diagnostics when an XML parsing failure occurs while looking for an xml-stylesheet processing instruction: specifically, if the failure is a "file not found" error that arises while resolving references to external entities or to the document's external DTD. The only output is the message "TrAX Transform Exception". |
The xsl:script element is now available. It is ignored unless the language is "java".
This element can be used to identify the Java class implementing an extension function as defined in the XSLT 1.1 specification. The archive attribute can be used to specify a list of URLs to be searched, but only with a JVM that supports JDK 1.2 interfaces (i.e. not with the Microsoft JVM, and therefore not with Instant Saxon).
NOTE: the rules for selecting a method within this class are unchanged. In particular, where there are several methods with the same name and number of arguments, it is not predictable which will be chosen.
The native Saxon techniques for identifying a Java class will continue to be used if there is no xsl:script element for the relevant prefix, with one exception: the form xmlns:prefix="fully.qualified.ClassName" is no longer supported; use xmlns:prefix="java:fully.qualified.ClassName" instead.
The element name saxon:script can be used as a synonym of xsl:script. The advantage of using saxon:script is that other processors will ignore it. This allows you to define the way Saxon will implement an extension function which may be different from the way other processors implement it. This is especially useful if your stylesheet uses functions such as xx:intersection() which are now offered by several different XSLT processors. Note that the built-in Saxon extension functions are all implemented in the same way as user extension functions, in class com.icl.saxon.functions.Extensions; so you can use src="java:com.icl.saxon.functions.Extensions" to locate the Saxon implementation of these functions.
The Saxon class com.icl.saxon.Context now implements the org.w3c.xsl.XSLTContext interface, as defined in the XSLT 1.1 working draft. This can now be used as the first argument of a method that implements an extension function (but you can continue to use com.icl.saxon.Context if you prefer). A consequence of this change is that getContextNode() and getCurrentNode() now return a org.w3c.dom.Node rather than a com.icl.saxon.om.NodeInfo; if you want to use Saxon methods on the returned node, you will have to cast it to a NodeInfo. Note that although the getOwnerDocument() method of XSLTContext is implemented, the resulting document will not be updateable.
The xsl:apply-imports element may now take parameters, that is, it may have child xsl:with-param elements.
The xml:base attribute is implemented. This can be used to change the base URI of an element (in either the source document or the stylesheet) for the purposes of the document() function. A new extension function is provided (largely for diagnostic purposes): saxon:base-uri() returns the base URI of the context node. Note that the terms "base URI" and "system ID" have in the past been used synonymously. This has been tidied up. The System ID refers to the entity (i.e. file) in which an element was found, and is useful for diagnostics in conjunction with the line number. The Base URI defaults to the System ID, but may be changed using xml:base, and is used for resolving relative URIs appearing in calls to document() or to xsl:include and xsl:import.
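The relationship between System ID, xml:base and the resulting Base URI can be illustrated with java.net.URI. The sketch below is a simplification that handles a single xml:base attribute on one element; the class BaseUriSketch, its methods, and the example.org URIs are all hypothetical.

```java
import java.net.URI;

class BaseUriSketch {
    // The base URI defaults to the system ID, but an xml:base value (possibly relative) overrides it.
    static URI effectiveBase(URI systemId, String xmlBase) {
        return xmlBase == null ? systemId : systemId.resolve(xmlBase);
    }

    // Resolves a relative reference, such as the argument of document(), against the base URI.
    static URI resolveReference(URI base, String relative) {
        return base.resolve(relative);
    }

    public static void main(String[] args) {
        URI systemId = URI.create("http://example.org/docs/book.xml");
        URI base = effectiveBase(systemId, "chapters/");
        System.out.println(resolveReference(base, "ch1.xml"));
    }
}
```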
If you supply your own URIResolver, you can use the base URI any way you like. For example, if the relative URI is the key of a record in a database, you could use the base URI to hold information identifying the database, e.g. the JDBC connection details.
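A user-written resolver along these lines might be sketched as follows. Everything specific in it is an assumption made for illustration: the class RecordResolver, the "records:" prefix used to mark a base URI as identifying a database, and the in-memory maps that stand in for a real JDBC lookup. Only the javax.xml.transform.URIResolver interface itself is standard.

```java
import java.io.StringReader;
import java.util.Map;
import javax.xml.transform.Source;
import javax.xml.transform.URIResolver;
import javax.xml.transform.stream.StreamSource;

class RecordResolver implements URIResolver {
    // Hypothetical convention: a base URI of the form "records:<database-name>".
    static final String PREFIX = "records:";

    // Stand-in for real databases: database name -> (record key -> XML text).
    private final Map<String, Map<String, String>> databases;

    RecordResolver(Map<String, Map<String, String>> databases) {
        this.databases = databases;
    }

    @Override
    public Source resolve(String href, String base) {
        if (base == null || !base.startsWith(PREFIX)) {
            return null; // not ours: let the processor resolve the URI itself
        }
        Map<String, String> records = databases.get(base.substring(PREFIX.length()));
        String xml = (records == null) ? null : records.get(href);
        if (xml == null) {
            return null;
        }
        // The relative URI is treated as a record key; the base URI says which database to use.
        return new StreamSource(new StringReader(xml), base + "/" + href);
    }
}
```

Such a resolver would typically be registered with setURIResolver() on the TransformerFactory or Transformer.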
I have changed the algorithm used for generate-id(). The existing algorithm was very inefficient, which was proving a problem with Muenchian grouping algorithms that rely on this function. It performed particularly badly when using the tinytree data structure with a large source document. The new algorithm is much faster, especially with the tinytree structure. It produces different results from the old algorithm, and is different for the two tree implementations.
The following errors were reported for version 6.1, and have been cleared:
6.1/001 | Tail recursion is invoked when it should not be, for example if an xsl:call-template instruction is issued from within a literal result element. Present since Saxon 5.3. |
6.1/002 | A null pointer exception occurs after reporting the absence of the select attribute on the xsl:value-of instruction. The same error occurs in a number of other cases where absent attributes are reported. Present since Saxon 6.1. |
6.1/003 | An ArrayIndexOutOfBounds exception occurs in method outputNamespaceNodes when processing a large source document using the tinytree model. Present since Saxon 6.0. |
6.1/004 | When running in forwards compatibility mode (i.e. when the version attribute on xsl:stylesheet is not 1.0 or 1.1), unknown XSL elements appearing as top-level elements should be ignored. Instead, an error is reported. |
6.1/005 | When the outermost element of the stylesheet does not declare the XSLT namespace (for example, because it declares the Microsoft WD-xsl namespace instead), no specific diagnostics are output, just the message "Transformation failed". Present since Saxon 6.1. |
6.1/006 | The first namespace node for an element (typically the XML namespace) has the same internal identifier as its parent element, which means that when a node-set containing a mixture of element and namespace nodes is constructed, one of these will be wrongly eliminated as a duplicate. The problem applies only to the tinytree model. Present since Saxon 6.0. |
6.1/007 | When loading secondary input documents using the StandardURIResolver, the AElfred parser may be used rather than the one nominated to the TransformerFactory. Present since Saxon 6.1. |
6.1/008 | The logic for using the current directory as the fallback for resolving relative URIs when no other base URI is available fails on UNIX systems where the current directory is returned with a trailing "/". Present since Saxon 6.1. |
6.1/009 | With the TrAX API, when the result of a transformation is a DOMResult, if no user-created DOM was specified using setNode(), the processor is supposed to create the DOM document itself. No attempt is made to do so, instead Saxon fails with a null pointer exception. Present since Saxon 6.1. |
6.1/010 | The XPath expression //abc:xyz returns no nodes. This happens with the tinytree model only, when there is a non-null namespace URI. Present since Saxon 6.0. |
6.1/011 | The call TransformerFactory#getTransformerHandler() (with no arguments), which should return an identity transformer packaged as a SAX ContentHandler, returns an object that is not useable. Present since Saxon 6.1. |
6.1/012 | Errors occur when several Transformers derived from the same Templates object are run concurrently in multiple threads. (The problem is that they share the same Stripper, and this is used to hold information specific to the transformation). |
6.1/013 | The SAX2 driver for the AElfred parser always reports the first two arguments of the endElement() call (the namespace URI and prefix) as empty strings. When the parser is used within Saxon this has surprisingly few ill-effects; the only ones I am aware of are (a) when an element with a non-null namespace is named in saxon:preview, and (b) when doing an identity transformation using the JAXP 1.1 interface. Present since Saxon 5.3. |
6.1/014 | No error is reported when xsl:copy-of is used as a top-level element. (At 6.1 the instruction is executed "successfully", placing its output at the start of the output file. At previous releases a NullPointerException occurs). |
6.1/015 | When xsl:copy-of is used to copy a result tree fragment, and a top-level element in the result tree fragment uses the default namespace (xmlns=""), but the result tree at that point uses the default namespace with a non-null URI (xmlns="xxx"), then no namespace undeclaration (xmlns="") is written to the result tree, causing the top-level element to be in the wrong namespace. |
6.1/016 | When xsl:document attempts to create not only the output file but the directory it is in, using a Java VM earlier than JDK 1.2 (but not the Microsoft Java VM), it crashes with the message "java.lang.NoSuchMethodError: java.io.File: method createNewFile()Z not found". |
6.1/017 | If a call is made within an XPath predicate to an extension function that uses context information, in particular the saxon:evaluate() extension function, the call may fail with a null pointer exception. |
6.1/018 | Errors in the [xsl:]exclude-result-prefixes and [xsl:]extension-element-prefixes attributes (for example, use of an undeclared namespace prefix) are poorly reported. In some cases the error triggers a null pointer exception, in others it is reported with an unhelpful message, and in some cases it is not reported at all. |
6.1/019 | With output method HTML, if elements are output as children of a script or style element, output escaping is switched on for that part of the script or style text that follows such an element. It should remain off for all the contents of the script or style element. |
6.1/020 | A null pointer exception occurs when reporting an ambiguous template rule match, when one of the matching patterns is a simple node test such as "node()". |
6.1/021 | If an unsupported encoding is requested, Saxon correctly reverts to UTF-8, but the encoding specified in the XML declaration (or the HTML META element) of the output file is the one that was requested, not UTF-8 as actually used. |
The saxon:output element is renamed xsl:document, and its file attribute is renamed href. (At this stage, though, it still takes a filename rather than a URI). The next-in-chain attribute is renamed saxon:next-in-chain and is now available on both xsl:output and xsl:document. The href attribute is mandatory: if saxon:next-in-chain is also present, it determines the destination of the output of the chained stylesheet. The indent attribute must now be either "yes" or "no"; the previous option to specify the level of indentation is now replaced by saxon:indent-spaces="integer", on both xsl:output and xsl:document. The omit-meta-tag and character-representation attributes, similarly, are prefixed "saxon:" and are available on both elements.
The xsl:output element (like xsl:document) now allows all its attributes to be specified as attribute value templates.
A side-effect of this change is that xsl:output properties are now ignored when running in preview mode, because the properties cannot be evaluated until the source document is available.
The saxon:user-data attribute of saxon:output is removed. Instead, any number of user-defined attributes may be defined on both xsl:output and xsl:document. These attributes must have names in a non-null namespace, which must not be either the XSLT or the Saxon namespace. These attributes are interpreted as attribute value templates. The value of the attribute is inserted into the Properties object made available to the Emitter handling the output; they will be ignored by the standard output methods, but can supply arbitrary information to a user-defined output method. The name of the property will be the expanded name of the attribute in JAXP format, for example "{http://my-namespace/uri}local-name", and the value will be the value as given, after evaluation as an attribute value template.
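A user-defined output method would look up such a property by its expanded name. The sketch below shows how that key can be built in "{uri}local-name" form using javax.xml.namespace.QName from the standard library. The class UserOutputProperties and its methods are hypothetical, and the namespace URI is the placeholder used in the text.

```java
import java.util.Properties;
import javax.xml.namespace.QName;

class UserOutputProperties {
    // Builds the property key in JAXP expanded-name format: {namespace-uri}local-name.
    static String propertyName(String namespaceUri, String localName) {
        return new QName(namespaceUri, localName).toString();
    }

    // Looks up a user-defined output property; returns null if it was not supplied.
    static String lookup(Properties outputProperties, String namespaceUri, String localName) {
        return outputProperties.getProperty(propertyName(namespaceUri, localName));
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("{http://my-namespace/uri}local-name", "some value");
        System.out.println(lookup(props, "http://my-namespace/uri", "local-name"));
    }
}
```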
The special provisions in XSLT 1.1 for defining what happens when you use xsl:document while the current output destination is a temporary tree are not yet implemented.
The standard URI resolver now accepts URIs containing a fragment identifier. The fragment identifier must be the value of an ID attribute within the referenced XML document. The effect is to return a tree containing the subtree rooted at the element with that id. This facility works for URIs contained in the document() function and in xsl:include and xsl:import. If there is no element with the required ID, an empty tree is returned (i.e. a root node with no children).
As a result, embedded stylesheets are now working again. In fact, there is no special code to handle embedded stylesheets: anywhere a stylesheet module can be referenced by URI (including the command line, the xml-stylesheet processing instruction, and the href attribute of xsl:include and xsl:import), a URI containing a fragment identifier can be used, and this will select the relevant subtree in the same way as for any other XML document.
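The effect of a fragment identifier can be illustrated with the standard DOM API rather than Saxon's own resolver. The sketch below is an assumption-laden stand-in: the class FragmentLookup is hypothetical, it uses Document.getElementById (which relies on the attribute being declared as type ID in the DTD), and it returns null where Saxon is described as returning an empty tree.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

/** Sketch: select the subtree named by a URI fragment identifier. */
final class FragmentLookup {

    /** Returns the part after '#', or null when the URI has no fragment. */
    static String fragmentOf(String uri) {
        int hash = uri.indexOf('#');
        return hash < 0 ? null : uri.substring(hash + 1);
    }

    /**
     * Parses the XML text and returns the element whose ID attribute matches
     * the fragment of the URI. Returns the document element if there is no
     * fragment, and null if no element has the requested ID.
     */
    static Element select(String xml, String uri) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            String id = fragmentOf(uri);
            return id == null ? doc.getDocumentElement() : doc.getElementById(id);
        } catch (Exception e) {
            throw new IllegalStateException("could not parse document", e);
        }
    }
}
```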
In response to complaints about Saxon incompatibility with Xalan, and in order to get the JAXP 1.1 example programs working, I have changed the behaviour of both the AElfred SAX2 driver, and the SAXON standard URI resolver, so that if no systemId is specified for a document, then relative URIs are interpreted relative to the user's current directory. Equally, if the base systemId specified for the document is a relative URI, this is expanded using the current directory as the base. Arguably this behaviour is non-compliant with the SAX2 specification, which states that the systemId must be an absolute URI, but it seems to be a useful convenience.
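The rule of falling back to the current directory can be sketched with java.net.URI. This is a minimal illustration, not the code used in AElfred or in the Saxon URI resolver; the class and method names are hypothetical.

```java
import java.io.File;
import java.net.URI;

/** Sketch: expand a missing or relative systemId against the current directory. */
final class BaseUriDefaults {

    /** The user's current directory as an absolute URI ending in '/'. */
    static URI currentDirectory() {
        URI dir = new File(System.getProperty("user.dir")).toURI();
        String s = dir.toString();
        // File.toURI() appends the slash only for existing directories; be defensive.
        return s.endsWith("/") ? dir : URI.create(s + "/");
    }

    /** Absolute systemIds are returned unchanged; relative ones are expanded. */
    static URI expand(String systemId) {
        URI base = currentDirectory();
        if (systemId == null || systemId.isEmpty()) {
            return base;               // no systemId: the directory itself is the base
        }
        return base.resolve(systemId); // throws IllegalArgumentException on malformed input
    }
}
```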
This means that every document, and every node, now has a base URI: it can never be null. A minor side-effect is that I have withdrawn the ability for saxon:node-set() to take a string (or number, or boolean) as an argument: it must now be a result tree fragment or an existing node-set. The reason is that there is no obvious way of constructing a base URI.
Saxon 6.1 implements the new TrAX interface, now defined as part of JAXP 1.1: see JSR-63. Saxon implements the javax.xml.transform interfaces. Saxon does not implement (or use) the javax.xml.parsers interfaces.
This has involved fairly extensive changes to the Java API for invoking Saxon. Some of the main implications are:
Error handling is now via an ErrorListener object which may be user-supplied. The standard error listener will now report any number of compile-time errors, but will stop at the first run-time error (XSLT recoverable errors are reported as warnings).
The package names for classes such as Templates and Transformer have changed.
The exceptions that are thrown have changed: in general, internal routines now throw javax.xml.transform.TransformerException or one of its subclasses, where previously they threw a SAXException. However, I have removed the exceptions from many internal methods entirely, where they were not needed. In some cases this is achieved by throwing an Error instead, which simplifies exception handling in calling routines.
The Transformer (and hence Saxon's Controller class) is no longer a SAX XMLFilter; instead it is possible to get an XMLFilter that performs the transformation using the SAXTransformerFactory class.
The com.icl.saxon.StyleSheet class no longer acts as the TrAX factory; its only responsibility now is to implement the command line interface on top of the TrAX Java API. It now does this almost entirely using TrAX-defined interfaces rather than Saxon internal interfaces, so the class provides a good demonstration of how the TrAX API can be used.
The classes OutputDetails and OutputFormat have disappeared; output properties are now represented throughout the system using a java.util.Properties object as defined in the TrAX API. Unlike the old OutputDetails class, the Properties object only describes the format of the required output, not its location. The location is represented by a Result object.
The class OutputManager has disappeared: its functions have been merged into the Outputter class. Code that switches output to a new destination should now call Controller.changeOutputDestination() to get a new Outputter, and should remember the previous Outputter so that it can be reinstated when calling resetOutputDestination().
The Emitter class is now an abstract class rather than an interface; this enables null implementations of many of its methods to be provided. It also enables Emitter to implement the TrAX Result interface, which means that wherever the TrAX API allows a Result object to be supplied defining the destination of a transformation, Saxon allows you to supply an Emitter as the destination. (A side-effect of this is that FragmentValue, which implements result tree fragments, is no longer an Emitter, as it cannot be both an Emitter and a SingularNodeSet; instead, it is possible to call FragmentValue.getEmitter() which provides an emitter front-end for building the tree).
If a user-specified ContentHandler is used as the output destination, it will now be notified of requests to disable or re-enable output escaping using special processing instructions inserted into the event stream (the names of these are defined as constants in the class javax.xml.transform.Result). Saxon also now uses a processing instruction to notify the ContentHandler if output has been suppressed because the result tree is not well-formed; if the ContentHandler is prepared to accept ill-formed output, it can reject this notification by throwing a SAXException with the message text "continue".
Setting of configuration parameters (such as supplying a TraceListener or MessageEmitter) is now generally done through the setAttribute() method of the TransformerFactory. The names of the relevant attributes are defined in com.icl.saxon.FeatureKeys. Methods such as setXMLReader() on the Controller and PreparedStyleSheet classes have therefore disappeared.
For the same reasons, the ParserManager.properties file has disappeared.
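Several of the points above (a user-supplied ErrorListener, output properties set by name, and a Result object describing the destination) can be seen together in a short sketch. It uses the javax.xml.transform API as finally published in JAXP, so it runs with whichever processor is on the classpath; it does not show Saxon 6.1 specifics, and the class names TraxSketch and CollectingListener are made up for the example.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.List;
import javax.xml.transform.ErrorListener;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

/** Sketch: compile and run a stylesheet using only TrAX-defined interfaces. */
final class TraxSketch {

    /** Records warnings and stops on the first error or fatal error. */
    static final class CollectingListener implements ErrorListener {
        final List<String> warnings = new ArrayList<String>();
        public void warning(TransformerException e) { warnings.add(e.getMessageAndLocation()); }
        public void error(TransformerException e) throws TransformerException { throw e; }
        public void fatalError(TransformerException e) throws TransformerException { throw e; }
    }

    static String transform(String stylesheet, String source) {
        try {
            CollectingListener listener = new CollectingListener();
            TransformerFactory factory = TransformerFactory.newInstance();
            factory.setErrorListener(listener);          // compile-time errors
            Transformer transformer =
                factory.newTransformer(new StreamSource(new StringReader(stylesheet)));
            transformer.setErrorListener(listener);      // run-time errors
            // Output properties describe the format only, not the location.
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            // The location is given by the Result object.
            transformer.transform(new StreamSource(new StringReader(source)),
                                  new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }
}
```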
I have tried to minimize the impact of the TrAX changes on Java-only applications, but inevitably some incompatible changes have crept in. The main ones are:
In future I want to align the Java-only processing model more closely with TrAX, so that the set of processing rules defining the transformation becomes another kind of Templates object.
The following errors were reported for version 6.0.2, and have been cleared:
6.0.2/001 | For the TinyTree tree model, the method getDocumentElement() always returns null. |
6.0.2/002 | A recurrence of 6.0.1/014: the same code was present in three different places, and only one of them was corrected. |
6.0.2/003 | When Saxon input is supplied as a DOM, CDATA section nodes and entity reference nodes are ignored: their contents are simply omitted from the input. |
6.0.2/004 | A null pointer exception is reported if the stylesheet contains a template rule whose match pattern is of the form id('abc'), and the source document contains no node with identifier "abc". |
6.0.2/005 | The method com.icl.saxon.tree.AttributeCollection#getLocalName returns the QName of the attribute, not the local part of the name. This causes the local-name() function when applied to a namespace-qualified attribute node to return the wrong result. |
6.0.2/006 | An attempt to access the last comment in the source document using xsl:value-of, xsl:copy, etc, will fail if the data part of the processing instruction is zero length. The failure occurs with the Microsoft JVM but not with JDK 1.3. Fails with the tinytree model only. (Present since 6.0; see also 6.0.1/008) |
The following errors were reported for version 6.0.1, and have been cleared except where otherwise noted:
6.0.1/001 | When a template is called recursively to obtain a default value for one of its own parameters (i.e. within <xsl:param>), the wrong result may be returned. This is because tail recursion is invoked when it should not be. (Bug also present in 5.5 and earlier releases). |
6.0.1/002 | An array bound exception will occur when processing a document with a stylesheet that uses more than 100 namespace URIs or namespace prefixes. Present since 6.0 |
6.0.1/003 | When a key is defined with match="@*", nothing will be retrieved. The problem also applies to some other patterns that can match attributes, for example match=" name | @name ". (Possibly present in 5.5 and earlier releases - unconfirmed) |
6.0.1/004 | The extension functions saxon:set-user-data() and get-user-data() do not work correctly with the TinyTree model. They may also fail with the standard tree model if the context node is an attribute or namespace. This is because the code relies on a one-to-one mapping of XPath nodes to Java objects. (Present since 6.0) |
6.0.1/005 | Not a bug. |
6.0.1/006 | When attribute value templates are used in the attributes of xsl:sort, for example ascending="{$asc}", then the values used are those that apply the first time the sort occurs; if subsequent sorts have different values for the parameters, these are ignored. This is true even if the subsequent sort takes place in a later transformation using the same PreparedStyleSheet. (Also applies to 5.5 and earlier releases). |
6.0.1/007 | saxon:output and other Saxon extension elements do not allow the xsl:extension-element-prefixes attribute to appear on the extension element itself. (Present since 6.0) |
6.0.1/008 | An attempt to access the last processing instruction in the source document using xsl:value-of, xsl:copy, etc, will fail if the data part of the processing instruction is zero length. The failure occurs with the Microsoft JVM but not with JDK 1.3. (Present since 6.0) |
6.0.1/009 | Running a transformation using the Transformer.getInputContentHandler() method fails saying that the same NamePool must be used for the StyleSheet and the source document. (Present since 6.0) |
6.0.1/010 | The code that searches for an xml-stylesheet processing instruction displays unintended trace information on System.err. |
6.0.1/011 | When xsl:apply-imports is called and there is no explicit imported template rule to invoke, Saxon does a no-op; the correct action is to invoke the built-in template rule for the current node. (Bug present in all previous releases). |
6.0.1/012 | If the value attribute to xsl:number is not an integer, Saxon truncates it towards zero rather than rounding it as specified. (Bug present in all previous releases). |
6.0.1/013 | With the TinyTree model, selecting a namespace node using //e/namespace::n doesn't work. Selecting all namespace nodes using namespace::* is OK. (Present since 6.0) |
6.0.1/014 | An array bound check failure may occur in routine com.icl.saxon.tinytree.TinyElementImpl.makeAttributeNodeFS() when searching for the last attribute node in the document. (Present since 6.0) |
Integration with FOP has been restored. Saxon now works with FOP version 0_15_0.
NamePools: I have changed the approach, so that instead of making a copy of the stylesheet name pool for each transformation, the name pool is now shared (which means its updating methods are now synchronized, to ensure thread-safety). This shouldn't affect most users, unless you are manipulating NamePools explicitly. It is still possible to have multiple name pools, but you now need to organise any copying yourself if this is what you want to do. For 99% of users, it should be possible to ignore NamePools entirely and just leave the system to use the single default name pool all the time.
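The shared-pool approach can be pictured as a single table whose updating methods are synchronized, so that concurrent transformations can allocate names safely. The class below is a simplified, hypothetical illustration and does not reproduce the real NamePool API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Sketch: one pool instance shared by every transformation. */
final class SharedNamePool {
    private final Map<String, Integer> codes = new HashMap<String, Integer>();
    private final List<String> names = new ArrayList<String>();

    /** Returns the existing code for the name, or allocates the next free one. */
    synchronized int allocate(String uri, String localName) {
        String key = "{" + uri + "}" + localName;
        Integer code = codes.get(key);
        if (code == null) {
            code = names.size();   // codes are simply positions in the list
            names.add(key);
            codes.put(key, code);
        }
        return code;
    }

    /** Maps a code back to its expanded name. */
    synchronized String lookup(int code) {
        return names.get(code);
    }
}
```

Because both methods lock the same object, two threads asking for the same name always receive the same code, at the cost of some contention on the pool.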
The following changes are for conformance with the (imminent) XSLT 1.0 errata:
The following errors were reported for version 6.0, and have been cleared except where otherwise noted:
6.0/001 | When xsl:copy-of is used to copy attributes with no namespace prefix, and the owning element has a default namespace declaration (xmlns="xyz"), then an invalid prefix is generated for the attributes. |
6.0/002 | The PreparedStyleSheet object is not serially reusable. A new NamePool needs to be allocated each time it is used. |
6.0/003 | A performance bug: in the match pattern row[id=1234] the predicate is not recognized as a boolean predicate, therefore the pattern matching code determines the position of the row relative to its siblings on the assumption that it needs this information. If there are a large number of <row> siblings this gives a severe performance hit. |
6.0/004 | The function-available() function returns false for a method that exists but that requires one or more arguments. |
6.0/005 | The element-available() function crashes (with a diagnostic print of the name pool contents) if the supplied name is one that is not used in the stylesheet and is not a known XSL or Saxon instruction. |
6.0/006 | With the TinyTree tree model, finding the descendants of a node that has neither descendants nor following-siblings produces incorrect results. |
6.0/007 | DTDGenerator won't compile: no name pool is supplied to RuleManager |
6.0/008 | In the SQL sample application, the last row is not written to database. (This reported bug has not yet been investigated) |
Warning messages (issued typically when a node matches more than one template rule) are now limited in number: only the first 25 are displayed.
In Saxon 5.5, I introduced a change that allows a result-tree-fragment to be implicitly converted to a node-set. I did this in anticipation of changes in XSLT 1.1, and to allow interoperability with MSXML3. However, Microsoft have now withdrawn this facility and conform fully to the XSLT 1.0 rules, so in order to protect Saxon's reputation for 100% conformance, I have decided to withdraw the facility too. It can still be used, however, if the stylesheet specifies version="1.1". For more details, see Conformance.
The following errors are cleared in version 6.0:
5.5.1/001 | When xsl:copy-of is used to make a copy of an element node that has no attributes or namespace declarations of its own, the namespace nodes inherited from its ancestor elements are not copied to the result tree. (Present since 5.5) |
5.5.1/002 | In some Java environments (ServletExec) the current method for dynamic loading of classes fails. The fix to this detects this failure and reverts to the simple pre-JDK 1.2 method. |
5.5.1/003 | When <xsl:namespace-alias> is used, Saxon uses the new (result-prefix) prefix and the new URI in the output. A careful reading of the spec suggests that it should use the old (stylesheet-prefix) prefix with the new URI. (The term "result-prefix" is thus a misnomer). |
5.5.1/004 | An ArrayIndexOutOfBounds exception occurs if the match pattern "@comment()" (or "@text()" or "@processing-instruction()") is used in an xsl:template rule. Such a pattern is meaningless (it will never match any nodes) but entirely legal. |
5.5.1/005 | Saxon does not report an error if two sibling <xsl:with-param> elements specify the same parameter name. |
5.5.1/006 | Where conflicting <xsl:strip-space> and <xsl:preserve-space> elements occur in the stylesheet, Saxon gives greater weight to the priority of the pattern than to its import precedence. So <xsl:strip-space elements="ns:item"> in an imported stylesheet will incorrectly override <xsl:preserve-space elements="ns:*"> in the importing stylesheet. |
5.5.1/007 | A null pointer exception can occur in the AElfred parser when attempting to access an XML file using a URL, if the resource accessed by the URL is found but its encoding is unknown. |
5.5.1/008 | A null pointer exception can occur when evaluating a variable reference within the arguments to an extension function that is called within the predicate of a filter expression. |
5.5.1/009 | When running in forwards-compatible mode, Saxon incorrectly rejects XSL elements that contain an attribute other than those defined in XSLT 1.0. |
5.5.1/010 | When xsl:copy is applied to an attribute, text node, comment, or processing instruction, the content of the xsl:copy element should be ignored. It isn't. |
5.5.1/011 | When output to a DOM Node is requested in the TrAX API, this is ignored if an output method is specified in an xsl:output element of the stylesheet. The output is sent to the standard output stream instead. The xsl:output element should be ignored. |
5.5.1/012 | When a top-level element such as xsl:output is used within a template, it is reported as an error. This happens even when processing in forwards-compatible mode (e.g. when version="1.1"). In this case fallback processing (xsl:fallback) should be invoked. |
5.5.1/013 (not yet fixed) | When the first argument to the document() function is a result tree fragment, Saxon takes the Base URI (for resolving the URI if it is relative) as if the argument were a string. The intention of the specification, though not clearly stated, is that the Base URI should be calculated as if the argument were a node-set. That is, if the argument is $tree and $tree is defined by <xsl:variable name="tree">doc.xml</xsl:variable>, then the Base URI should be that of the xsl:variable element, not that of the element containing the call on the document() function. |
Added support for two new output encodings on xsl:output: iso-8859-2 and cp1250.
Added two attributes to xsl:output (not yet available in saxon:output):
Added a new extension function saxon:showNodeSet(). It takes a single argument that is a node-set, produces a diagnostic print of the node-set on System.err, and returns an empty string.
Added an extension function saxon:getContext() to get the context object. Only really intended for diagnostic use.
Added an option to choose the tree implementation (see below): -ds for the standard tree, as used in previous releases, -dt for the "tinytree" which is new to this release. The tinytree is the default: it takes up less memory, is faster to build, and generally appears to perform better in most circumstances.
The -a option on the stylesheet, which causes the source document to be processed using the stylesheet identified from its xml-stylesheet processing instruction, now uses the same logic as the getAssociatedStylesheets() method in the TrAX interface. This means multiple (cascading) stylesheets are now supported. However, embedded stylesheets (identified by href="#id" in the xml-stylesheet processing instruction) are not supported at this release.
There have been a great many internal changes, but relatively few that impact directly on the high-level transformation API. In particular, if you only use TrAX interfaces, there are no changes. Otherwise, the main points to note are:
This release adds support for pluggable character sets: if you specify xsl:output encoding="class-name", class-name should be a class that implements com.icl.saxon.output.PluggableCharacterSet. The class must provide two methods, one that determines whether a given character is present in the character set, and one that gives the name of the encoding to be used by the Java VM for translating Unicode characters into a file with this encoding.
To use free-standing XPath expressions and patterns from a Java application, you now need to supply a StaticContext object when parsing the expression. This object handles the resolution of variable names, namespace prefixes, and function names occurring within the expression. For convenience the StandaloneContext object is provided for this purpose. This class allows namespace prefixes to be declared so they can be used in an expression. It also allows external functions to be called (but not functions defined in your XSLT stylesheet). It does not allow the expression or pattern to contain references to variables.
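The pluggable character set contract described above (one method to test whether a character is in the set, one to name the encoding the Java VM should use) might look like the following. The interface CharacterSetSketch and its method names are stand-ins invented for this example; they are not copied from com.icl.saxon.output.PluggableCharacterSet, whose exact signatures should be checked in the Saxon API documentation.

```java
/** Stand-in for the two-method contract described in the text (hypothetical names). */
interface CharacterSetSketch {
    /** True if the character can be written directly in this encoding. */
    boolean inCharset(int ch);
    /** Name of the encoding the Java VM should use when writing the file. */
    String getEncodingName();
}

/** A character set that holds only 7-bit ASCII. */
final class AsciiOnlyCharacterSet implements CharacterSetSketch {
    public boolean inCharset(int ch) { return ch >= 0 && ch <= 0x7F; }
    public String getEncodingName() { return "US-ASCII"; }

    /** Shows how a serializer might use the test: escape what the set cannot hold. */
    static String escape(CharacterSetSketch cs, String text) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            if (cs.inCharset(cp)) {
                sb.appendCodePoint(cp);
            } else {
                sb.append("&#").append(cp).append(';');  // decimal character reference
            }
            i += Character.charCount(cp);
        }
        return sb.toString();
    }
}
```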
These details should only affect you if you access intimate internal interfaces or use the Saxon source code.
There are two big changes to the internals of Saxon at this release: a new implementation of the tree structure, and a new system for handling names.
I have introduced an alternative tree implementation (called "tinytree"). This is designed to reduce the number of Java objects created: the tree is sliced vertically rather than horizontally, so instead of having one Java object per node, there is one Java array for each property of the nodes, with an entry in the array for each node. The effect is to greatly reduce the Java memory management overheads. The existing tree structure remains available, and is always used for the stylesheet tree. It is also currently always used for the intermediate result tree created when saxon:output next-in-chain is used.
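The idea of slicing the tree vertically can be shown with a toy structure that keeps one array per node property, indexed by node number in document order. This is a deliberately reduced sketch with hypothetical names; the real tinytree stores more properties and uses different navigation code.

```java
import java.util.Arrays;

/** Sketch: a tree stored as parallel arrays instead of one object per node. */
final class TinyTreeSketch {
    private int[] depth = new int[4];     // depth of each node, root = 0
    private int[] nameCode = new int[4];  // integer name code of each node
    private int size = 0;

    /** Appends a node in document order and returns its node number. */
    int addNode(int nodeDepth, int code) {
        if (size == depth.length) {       // grow all arrays together
            depth = Arrays.copyOf(depth, size * 2);
            nameCode = Arrays.copyOf(nameCode, size * 2);
        }
        depth[size] = nodeDepth;
        nameCode[size] = code;
        return size++;
    }

    /** The parent is the nearest preceding node with a smaller depth; -1 for the root. */
    int parentOf(int node) {
        for (int i = node - 1; i >= 0; i--) {
            if (depth[i] < depth[node]) return i;
        }
        return -1;
    }

    /** Descendants are the following nodes up to the first one that is not deeper. */
    int countDescendants(int node) {
        int n = 0;
        for (int i = node + 1; i < size && depth[i] > depth[node]; i++) n++;
        return n;
    }

    int nameCodeOf(int node) { return nameCode[node]; }
}
```

Building a document of N nodes allocates a handful of arrays rather than N objects, which is where the saving in memory management overhead comes from; the price is that navigation such as parentOf() involves scanning.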
To select the standard tree structure, use -ds on the command line. To select the "tinytree" structure, use -dt. The default is -dt. You can also select the tree structure using a method on the Controller class.
The tinytree is smaller than the standard tree, as the name suggests, and it is also faster to build. However, it may be slower to navigate. So if you have a small document that is built once in memory and used repeatedly, the standard tree implementation is probably better. In other cases, however, the tinytree usually wins.
I have made radical changes to the way names are managed. Previously, the NamePool object contained a pool of names, but its only real purpose was to avoid the memory overhead of storing each name many times. Now, Saxon takes advantage of the NamePool to avoid storing references to Name objects on the tree at all: instead it stores a "namecode": an integer which can be used to identify the name within the NamePool.
A namecode has 4 bits unused, 8 bits representing the prefix, and 20 bits acting as a pointer to an entry in the namepool containing the local name and namespace URI. Two names are therefore equal if the namecodes are the same in the bottom 20 bits. The value in these 20 bits is also referred to as the fingerprint of the name.
All searching for objects by name is now done by comparing fingerprints; no string comparisons are involved. Fingerprints are used not only for matching names used in XPath expressions to refer to the source document, they are also used for all matching of names within a stylesheet, for example variable names, template names, mode names, key names, and decimal format names.
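The namecode layout lends itself to a short bit-manipulation sketch. The field widths (20-bit fingerprint, 8-bit prefix) come from the description above; the assumption that the prefix bits sit directly above the fingerprint, and all the names used here, are illustrative only.

```java
/** Sketch: packing and comparing namecodes. */
final class NameCodeSketch {
    static final int FINGERPRINT_MASK = 0xFFFFF; // bottom 20 bits
    static final int PREFIX_SHIFT = 20;          // assumed position of the prefix field
    static final int PREFIX_MASK = 0xFF;         // 8 bits

    /** Combines a prefix index and a fingerprint into one integer. */
    static int pack(int prefixIndex, int fingerprint) {
        if (prefixIndex < 0 || prefixIndex > PREFIX_MASK) {
            throw new IllegalArgumentException("prefix index out of range");
        }
        if (fingerprint < 0 || fingerprint > FINGERPRINT_MASK) {
            throw new IllegalArgumentException("fingerprint out of range");
        }
        return (prefixIndex << PREFIX_SHIFT) | fingerprint;
    }

    /** The fingerprint identifies the local name and namespace URI. */
    static int fingerprint(int nameCode) {
        return nameCode & FINGERPRINT_MASK;
    }

    static int prefixIndex(int nameCode) {
        return (nameCode >> PREFIX_SHIFT) & PREFIX_MASK;
    }

    /** Two names are equal if they agree in the bottom 20 bits, whatever the prefix. */
    static boolean sameName(int a, int b) {
        return fingerprint(a) == fingerprint(b);
    }
}
```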
The name pool is also used for storing namespace declarations: each prefix/URI pair is allocated a namespace code, and all manipulation of namespace nodes in the tree is done using these integer codes.
A consequence of this is that all documents used in a transform must use the same NamePool. This has some implications on the Java API. With simple use of the API, you needn't worry about name pools, they will be taken care of automatically. However, if you are operating a continuously running service in which both source documents and stylesheets are cached in memory, you may need to exercise some care to specify the right NamePool when each document is built.
The model is further complicated by multi-threading. Rather than have synchronization problems with multiple threads updating the same NamePool, the NamePool used to build the stylesheet is copied (imported) into the NamePool used to build the source document, before parsing of the source document starts. When you use the transform() method to parse and transform an InputSource, this happens automatically. However, if you want to build the document yourself, and transform it using transformDocument() (which allows you to run more than one transformation on the same document), then you must manage the NamePool merging yourself. The system does include checks that the NamePools for the stylesheet and source document are compatible, though these are not completely foolproof.
The use of namecodes rather than String names has affected many internal interfaces, and some of these are interfaces that are also exposed externally. For example, the ParameterSet object which is used to pass parameters from a calling template to a called template can also be used to supply global parameters to the Transformer. The parameters in a parameter set are now identified by an integer fingerprint rather than a string name. You can get the integer fingerprint from the NamePool using the getFingerprint() method; alternatively use the TrAX method addParameter(), which still takes the name as a String.
The Emitter interface has also changed to use name codes; if you have written your own Emitter, the code will have to be modified.
The classes and interfaces used in Saxon for manipulating collections of attributes now implement the SAX2 Attributes interface.
The standard XPath functions have been extensively revised. The main change, apart from tidying up the code, is that the functions are now responsible for evaluating their own arguments, which enables some optimisation, especially when the arguments are node-sets: they can now be evaluated using knowledge of the data type required. For example, the not() function now stops as soon as the first node in the argument node-set is found.
Some of the little-used methods on the NodeInfo interface have been moved as static methods to a separate helper class, com.icl.saxon.om.Navigator. This enables the code of these methods to be independent of the particular tree implementation.
The delayed evaluation of path expressions now works as follows: on the first two occasions that a path expression is evaluated, it navigates the source tree. On the third occasion, it saves the resulting node-set in memory. On subsequent uses, the result is retrieved from memory. This approach is designed to balance time against memory usage.
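The evaluation policy just described (navigate on the first two uses, save on the third, reuse afterwards) amounts to a small counting cache. The generic class below is a hypothetical sketch of that policy, not Saxon's implementation, and it is not thread-safe.

```java
import java.util.List;
import java.util.function.Supplier;

/** Sketch: cache a path expression's result only once it has proved to be reused. */
final class LazyPathResult<T> {
    private final Supplier<List<T>> navigate; // walks the source tree
    private int uses = 0;
    private List<T> saved;                    // null until the third evaluation

    LazyPathResult(Supplier<List<T>> navigate) {
        this.navigate = navigate;
    }

    List<T> evaluate() {
        if (saved != null) {
            return saved;                     // fourth use onward: served from memory
        }
        uses++;
        List<T> result = navigate.get();      // uses one to three: navigate the tree
        if (uses >= 3) {
            saved = result;                   // third use: keep the node-set
        }
        return result;
    }
}
```

An expression evaluated only once or twice never pays the memory cost of a saved node-set, while a frequently used one pays the navigation cost at most three times.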
The optimisation of "//name" as "/descendant::name" (which is possible when there are no predicates) wasn't working in 5.5 (or for a while before that), causing an unnecessary sort. This has been corrected. In addition, the first time "//name" is used for a particular document, the results are now saved, and all subsequent uses of "//name" for the same document retrieve the results from memory. This means that the traditional assumption that "//name" is inefficient may no longer always be true.
A Sequencer class has been introduced for allocating globally-unique sequence numbers. There are two such sequences, one for document numbers, and one for node numbers. By default, two sequencers are created when Saxon is loaded, and remain in use until it is unloaded. However, it is now possible to reset the sequence numbering if required, either to prevent running out of numbers in a long-running server, or to ensure repeatability of the value of generate-id(). The result of generate-id() depends on the document number, and you can restart the sequence of document numbers by calling controller.setDocumentSequencer(new com.icl.saxon.om.Sequencer()). It is the caller's responsibility to ensure that this does not cause two documents that are in use at the same time to have the same number. The node sequence number is used when sorting nodes into document order, and when eliminating duplicates in a union operation. You can similarly allocate a new sequence using controller.setNodeSequencer().
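The relationship between the document sequencer and generate-id() can be sketched as follows. Everything here is hypothetical: the classes SequencerSketch and IdSketch, the next() method, and the "d0n5" id format are invented to illustrate why restarting the document sequence makes generated ids repeatable.

```java
/** Sketch: a resettable source of sequence numbers. */
final class SequencerSketch {
    private int next = 0;

    /** Hands out 0, 1, 2, ... in a thread-safe way. */
    synchronized int next() {
        return next++;
    }
}

/** Sketch: ids derived from a document number and a node number. */
final class IdSketch {
    private SequencerSketch documents = new SequencerSketch();

    /** Replacing the sequencer restarts document numbering from zero. */
    void setDocumentSequencer(SequencerSketch fresh) {
        documents = fresh;
    }

    int newDocument() {
        return documents.next();
    }

    /** Invented id format: the id changes whenever the document number changes. */
    static String generateId(int documentNumber, int nodeNumber) {
        return "d" + documentNumber + "n" + nodeNumber;
    }
}
```

After a reset, the first document built gets number 0 again, so the same node yields the same id as in an earlier run; as the text warns, the caller must make sure no older document with that number is still in use.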
Added an optimization for recursive processing of a node-set: the predicate "[position() > 1]" is now recognized and handled specially, allowing pipelined execution and reducing memory requirements.
Removed getAttributeValue(Name), replaced it with getAttributeValue(String uri, String localName). This is more efficient: in many cases it removes the need to construct the Name object and then take it apart. Attributes can also be found using the integer fingerprint of the name.
The Name class is no longer used for holding expanded names, it now serves merely as a container for a couple of static methods for name validation.
NameTest and its subclasses have been reorganised. There is a new class NodeTest which is a subclass of Pattern; it performs the test on node-type and node-name supporting a node-test in XPath. This test is context-free. As well as replacing the NameTest class, it also replaces NodeTypePattern and NamedNodePattern. The NodeTest is now used on a Step, and on an Axis, replacing the previous combination of a NameTest and a node type. These tests are also used in testing which nodes are candidates for whitespace stripping.
The interface between the Step and Axis classes and the expression parser has been much simplified.
Michael H. Kay
Saxonica Limited
24 November 2005