Changes in this Release

This file describes changes for versions 7.0 and later. For changes prior to version 7.0, see http://saxon.sf.net/saxon6.5.2/changes.html.

Changes in version 7.4 (2003-02-15)

The theme for this release is strong typing. Saxon does not yet support user-defined types, and has very limited facilities for supporting type annotations on nodes in the data model, but it does support most of the built-in atomic types in XML Schema, such as string, integer, decimal, double, date, dateTime, duration, QName, and anyURI. This release introduces the stronger rules for type checking of operations applied to values of these types. If you want strong type checking, set version="2.0" on the stylesheet; if you are more concerned with backwards compatibility, set version="1.0".

Defects cleared

655725 - Java ClassCastException occurs when executing a positional filter expression of the form $x[position() > 1].

655948 - Orphaned namespace declarations can appear when serializing a result tree using method="text".

655950 - The -a option on the command line does not work.

656857 - The xsl:analyze-string instruction does not work correctly when no match is found.

659064 - A ClassCastException can occur when an extension function returns a value of class java.util.List. (The clearance for this introduces a new facility: an extension function can now return a Java List, which will be converted to an XPath sequence, each item in the list being converted from a Java object to an XPath value independently).

680755 - With the XML output method, indentation should be suppressed when the result tree contains xml:space="preserve".

682979 - An UnsupportedOperationException occurs when a stylesheet function is called from the return clause of a for expression.

684487 - Double values less than 0.001 are incorrectly converted to strings: a trailing zero is added (for example, "0.00030").

The following defect has not been fixed, and is carried over until a future release:

655730 - HTML elements in a non-null namespace are serialized as if they were in the null namespace; they should be serialized as XML.

XSLT changes

The context item, position, and size are now unset on entry to a stylesheet function. This means that if the function depends on the context item (or the document containing the context item), then an explicit parameter must be defined. For example, if the function defines <xsl:result select="count(//*)"/>, this must be changed to <xsl:result select="count($x//*)"/>, with $x being supplied as a parameter in the function call. If the function attempts to reference the context item, position, or size, a dynamic error is reported.

This change gives optimization benefits in all cases where functions are NOT dependent on the context.

The value attribute of <xsl:number> is now handled as described in the XSLT 2.0 specification: the value may be a sequence, and the items in the sequence are converted to integers using the casting rules. A recoverable error occurs if casting to an integer is not possible, or if the value is not positive. {numb07, numb20}

A side-effect of this change is that a user-written Numberer must be changed to handle a long argument rather than an int as previously.

The code for assignment of variables and parameters has been changed to use the argument conversion rules defined in XPath (without backwards compatibility mode). This is a weaker conversion than previously: essentially, the select expression must deliver a value of the correct type; it will not be converted to the required type except in the case where the supplied value is an untyped node. This means, for example, that if the required type is xs:integer, you can supply an attribute node, but you cannot supply a string or a double. If a cast to the required type is wanted, it must now be written explicitly in the select expression. The error will be reported statically if the expression could never yield a value of the right type, and will be reported dynamically otherwise.

In a pattern starting key('k', $x)/..., the variable $x can now be multi-valued; the pattern matches if it matches any of the values. {idky30}

The XSLT 1.0 rule that xsl:text elements may not contain child elements has been reinstated. This change anticipates changes to the XSLT 2.0 specification.

In limited circumstances, stylesheet functions (xsl:function) now optimise tail-recursion. The circumstances are that the select expression of the xsl:result instruction must contain a call on the same function in the then or else part of a conditional expression (which may be nested in further conditional expressions). It may require a little care to write functions to exploit this.
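As an illustration of the shape of function that qualifies, here is a hedged sketch written in Java rather than XSLT. The class and method names are invented for the example, and Java itself does not perform this optimization; the second method simply shows what replacing the self-call by a loop amounts to.

```java
final class TailRecursionSketch {

    // Tail-recursive form: the self-call is the whole "else" branch of the
    // conditional, so nothing remains to be done after it returns.
    static long sumTo(long n, long acc) {
        return (n == 0) ? acc : sumTo(n - 1, acc + n);
    }

    // Loop form: what optimizing the tail call above effectively produces.
    static long sumToLoop(long n, long acc) {
        while (n != 0) {
            acc = acc + n;
            n = n - 1;
        }
        return acc;
    }

    public static void main(String[] args) {
        System.out.println(sumTo(100, 0));           // prints 5050
        System.out.println(sumToLoop(1_000_000, 0)); // prints 500000500000
    }
}
```

A body written as n + sumTo(n - 1) would not qualify, because the addition still has to be performed after the recursive call returns; carrying the running total in an extra parameter is the usual way to restructure such a function.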

The attribute override=yes|no on xsl:function is implemented. This determines whether a user-written stylesheet function should override a vendor-supplied or user-written Java function. The default is "yes". {func19, func21}

This involved taking the function binding logic out of the ExpressionParser and putting it all in the static context. A simpler version of the routine is provided for StandaloneContext. The code has been structured so it can also test for arity (or even static type of arguments) as part of the binding algorithm, but this is not yet exploited.

A side-effect of this change is that it is now possible to call Saxon and EXSLT extension functions when using free-standing XPath expressions, that is, expressions executed using the XPath API from Java, or using saxon:evaluate(). It is also possible to call user-supplied Java extension functions provided the URI maps implicitly to a Java class name (i.e., not relying on saxon:script). {saxon36, 37}

Sorting using xsl:sort: I have changed the way data types are specified (in anticipation of changes to the W3C specification). The data-type attribute may now take values text or number only. The values text and number convert the actual sort key to xs:string or xs:double respectively before doing the comparison. If the data-type attribute is omitted there is no conversion. Values are then compared as supplied; if they are not comparable, a run-time error occurs. If you want to force conversion to a type such as xs:date, use a casting function within the select expression, for example <xsl:sort select="xs:date(@birth-date)"/>. There is a small risk of backwards incompatibility if your stylesheet computes a numeric sort key and doesn't specify a data-type: previously it would have been sorted as a string; it will now be sorted as a number.

I added a check (really a bug fix for 7.3) that when an element is annotated with a simple type such as xs:integer, the element is not allowed to have attributes. The type annotation for an element with attributes must always be a complex type.

XPath changes

The type conversion rules as defined in the XPath 2.0 working draft are now implemented. This means that the rules for passing arguments to functions and operators are now stricter: you can't simply assume that the supplied value will be cast to the required type.

The rules are stricter if the stylesheet specifies version="2.0" than if it specifies version="1.0". The version attribute (or xsl:version on a literal result element) determines whether XPath 1.0 backwards compatibility mode is used for XPath expressions within its scope. With backwards compatibility mode on, certain conversions that were permitted in XPath 1.0 (for example, conversion of anything to a string or number, or extraction of the first item of a sequence) are still permitted under XPath 2.0. With this mode off, these conversions must be done explicitly in the function call. The only conversions that happen automatically under 2.0 are extraction of the value from a node: if the node is untyped (which with Saxon 7.4 will almost invariably be the case), the untyped value is then cast to the type required by the function signature. This means you can supply an untyped attribute as an operand to the "+" operator or the round() function, for example, but you cannot supply a string.

Division of two decimals (and also the mod operator) now produces a decimal. Previously it produced a double. The scale of the resulting decimal (the number of digits after the decimal point) is equal to the sum of the scales of the two decimals, plus 6. So, for example, 10.0 div 3 is 3.333333. There have been some other refinements to the xs:decimal implementation. Insignificant trailing zeros are always discarded. Conversion of a decimal to a string uses an integer representation if there is no fractional part, for example the result of 10.0 div 5.0 is displayed as 2 (which doesn't match the current XPath specification).
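The scale rule can be modelled with java.math.BigDecimal. This is only a sketch of the arithmetic described above, not Saxon's code: the class and method names are invented, the rounding mode is an assumption (it is not stated here), and operand scales are measured after discarding trailing zeros so that 10.0 counts as scale 0.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

final class DecimalDivSketch {

    // Scale of a decimal once insignificant trailing zeros are discarded.
    static int normalizedScale(BigDecimal d) {
        return Math.max(0, d.stripTrailingZeros().scale());
    }

    // Divide with a result scale of scale(a) + scale(b) + 6, then render the
    // quotient without trailing zeros (integer form if there is no fraction).
    // ASSUMPTION: RoundingMode.HALF_UP is a guess; the text does not say.
    static String div(String a, String b) {
        BigDecimal x = new BigDecimal(a);
        BigDecimal y = new BigDecimal(b);
        int scale = normalizedScale(x) + normalizedScale(y) + 6;
        BigDecimal q = x.divide(y, scale, RoundingMode.HALF_UP);
        return q.stripTrailingZeros().toPlainString();
    }

    public static void main(String[] args) {
        System.out.println(div("10.0", "3"));   // prints 3.333333
        System.out.println(div("10.0", "5.0")); // prints 2
    }
}
```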

Conversion of an empty sequence to a double now produces an empty sequence (not NaN as in XPath 1.0); conversion of any other non-numeric string to a double raises an error (rather than producing NaN). This follows the current XPath 2.0 specification.

I attempted to change the double-to-string conversion to fit the XPath 2.0 rules, but it hit numerous compatibility problems so I have held back on this. But positive and negative infinity are now represented as INF and -INF (previously they were Infinity and -Infinity). The strings INF and -INF are also recognized when casting a string to a double or float; XPath 1.0 had no way of constructing these values except by using a construct such as (1 div 0). {math95, math96}
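The string-to-double behaviour described in the last two paragraphs can be sketched as follows. The names are invented for the example; an empty Optional stands in for the empty sequence, IllegalArgumentException stands in for the error, and the pattern for ordinary numbers is a simplification. Only the handling of the infinities on output is modelled; other values fall through to Java's own formatting purely as a placeholder.

```java
import java.util.Optional;
import java.util.regex.Pattern;

final class DoubleCastSketch {

    // Plain decimal or scientific notation only. Java's own parser is looser
    // (it also accepts "Infinity", hex floats and suffixes such as "1d").
    private static final Pattern NUMERIC =
        Pattern.compile("[+-]?(\\d+(\\.\\d*)?|\\.\\d+)([eE][+-]?\\d+)?");

    // An empty input models the empty sequence and yields an empty result.
    static Optional<Double> toDouble(Optional<String> input) {
        if (!input.isPresent()) return Optional.empty();
        String s = input.get().trim();
        if (s.equals("INF")) return Optional.of(Double.POSITIVE_INFINITY);
        if (s.equals("-INF")) return Optional.of(Double.NEGATIVE_INFINITY);
        if (s.equals("NaN")) return Optional.of(Double.NaN);
        if (!NUMERIC.matcher(s).matches()) {
            // A non-numeric string is an error rather than NaN.
            throw new IllegalArgumentException("Cannot convert to double: " + s);
        }
        return Optional.of(Double.parseDouble(s));
    }

    // Infinities are written as INF and -INF.
    static String render(double d) {
        if (d == Double.POSITIVE_INFINITY) return "INF";
        if (d == Double.NEGATIVE_INFINITY) return "-INF";
        return Double.toString(d); // placeholder only, not the XPath format
    }

    public static void main(String[] args) {
        System.out.println(render(toDouble(Optional.of("-INF")).get()));
        System.out.println(toDouble(Optional.empty()).isPresent());
    }
}
```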

Functions

The standard functions have been enhanced, where the spec requires it, to return an empty sequence if one of their arguments is an empty sequence.

The max() and min() functions now handle any comparable type, returning the correct data type (for example, if given a set of integers, they return an integer). The default (for untyped nodes) is string comparison. A collation can be specified as the second argument; if not specified, the default collation is used. {expr56, expr57, sort20, sort21, date065, error220}

The implementation of deep-equal(), sequence-deep-equal(), and sequence-node-equal() has been revised to conform to the current specifications.

The distinct-values() function is now implemented, with an optional collation argument. {group017-019}

The functions floor(), round(), ceiling() now return a value of the same type as the supplied argument. {math87-90}

The index-of() function now takes an optional collation argument. {expr87}

For functions that take a collation argument, such as compare(), the default if no collation is specified in the call, and no default <saxon:collation> is supplied, is to use code-point collation. This differs from previous releases, where the default was to use a locale-dependent collation. For xsl:sort, the default is still locale-dependent. This decision is likely to be reviewed in future.
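To make the difference concrete, the sketch below compares strings by Unicode code point and, for contrast, with a locale-dependent java.text.Collator. It is an illustration only: the class name is invented, and the English locale is chosen arbitrarily for the demonstration.

```java
import java.text.Collator;
import java.util.Comparator;
import java.util.Locale;

final class CodePointCollationSketch {

    // Compare two strings by Unicode code point, one code point at a time.
    // String.compareTo compares UTF-16 code units instead, which orders
    // characters outside the Basic Multilingual Plane differently.
    static final Comparator<String> CODE_POINT = (a, b) -> {
        int i = 0;
        int j = 0;
        while (i < a.length() && j < b.length()) {
            int ca = a.codePointAt(i);
            int cb = b.codePointAt(j);
            if (ca != cb) return Integer.compare(ca, cb);
            i += Character.charCount(ca);
            j += Character.charCount(cb);
        }
        // One string is a prefix of the other: the shorter sorts first.
        return Integer.compare(a.length() - i, b.length() - j);
    };

    public static void main(String[] args) {
        // Code points: "B" (66) sorts before "a" (97).
        System.out.println(CODE_POINT.compare("B", "a"));
        // A locale-dependent collation orders letters alphabetically instead.
        System.out.println(Collator.getInstance(Locale.ENGLISH).compare("B", "a"));
    }
}
```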

The arguments of sort() are now reversed: the first argument is the sequence to be sorted, the second is the name of the sort key specification.

Changed sum() and avg() to return the same type as supplied. The average of an empty sequence is now (), not NaN. These functions are still confined to handling numeric values, they do not yet work over other summable types such as durations. {math91, math92, expr55, error227}

Corrected a bug: conversion of a double to a float was returning a double!

The functions starts-with(), contains(), and ends-with() now accept a collation name as an optional third argument. {str126-128}

The functions substring-before() and substring-after() now accept a collation name as an optional third argument. {str129-130}

The saxon:distinct() function, with a single argument, is dropped: the functionality is available using either the XPath 2.0 distinct-values() function, or the EXSLT set:distinct() function. The two-argument form (which takes an expression as the second argument) remains.

The dynamic expression supplied to saxon:expression and saxon:evaluate can now contain references to the variables $p1, $p2, ... $p9. The values of these variables can be supplied when the expression is evaluated using saxon:eval or saxon:evaluate respectively. The expression can also contain calls to Java extension functions bound using the implicit mapping of Java classes to namespaces, and to Saxon and EXSLT extension functions. For more details see extensions.html.

The function saxon:get-user-data() has changed to return an empty sequence rather than a zero-length string if no data exists. This is to prevent type-checking problems when the expected value is not a string. {saxon06}

The error() function (with optional argument) is implemented. If the argument is specified, its string-value is used as the error message. {error223-224}

The node-name() function is implemented. It returns a value of type xs:QName.

The functions get-in-scope-namespaces() and get-namespace-uri-for-prefix() are implemented. {nspc45-46}

The unparsed-entity-public-id() function (defined in XSLT 2.0) is implemented. This required a minor change to the DocumentInfo interface implemented by the tree model. {expr88}

The unordered() function is implemented. This returns the results of a sequence in implementation-defined order. In practice the only important case where it has any effect in the Saxon implementation is where the sequence supplied as argument is a Step using a reverse axis: for example, unordered(ancestor::*) returns the ancestors in reverse document order. But applications should not rely on the actual order; the function is intended to be used by applications that do not care about the order of the results. {axes60}

A simple implementation of the input() function is available. If the parameter {http://saxon.sf.net/}input has been supplied to the transformation, it returns the value of this parameter. This must be a node sequence, which means it cannot be supplied from the command line. If no such parameter has been supplied, it returns the root of the principal source document (the document containing the node that was matched on entry to the transformation). {Limited testing only: mdocs09}

The default-collation() function is implemented: it returns the URI of the default collation if specified, or the URI of the code-point collation otherwise.

The component extraction functions for durations are now available only on the two subtypes yearMonthDuration and dayTimeDuration, and are named accordingly: for example get-years-from-yearMonthDuration and get-hours-from-dayTimeDuration. (But other operations remain available on durations even though not specified in the current working draft, for example equality comparison on durations). {date055-058}

Casts and constructor functions, when converting from a string to another type, now apply the appropriate whitespace normalization to the supplied value, as defined in the whitespace facet for the target data type. This means, for example, that an ID value can have leading and trailing spaces, which are ignored. {type035}
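The three values of the XML Schema whitespace facet (preserve, replace, collapse) behave as in the following sketch. The class, enum, and method names are invented for illustration; in XML Schema, xs:string uses preserve, xs:normalizedString uses replace, and the other built-in types use collapse.

```java
final class WhitespaceFacetSketch {

    enum Mode { PRESERVE, REPLACE, COLLAPSE }

    static String normalize(String value, Mode mode) {
        if (mode == Mode.PRESERVE) return value;
        // replace: each tab, line feed and carriage return becomes a space
        String replaced = value.replaceAll("[\\t\\n\\r]", " ");
        if (mode == Mode.REPLACE) return replaced;
        // collapse: additionally squeeze runs of spaces and trim both ends
        return replaced.replaceAll(" +", " ").replaceAll("^ | $", "");
    }

    public static void main(String[] args) {
        // An ID value with surrounding whitespace collapses to the bare name.
        System.out.println("[" + normalize("  A001 \n", Mode.COLLAPSE) + "]"); // prints [A001]
    }
}
```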

Miscellaneous

A new attribute saxon:explain is available on any instruction in the stylesheet. The permitted values are "yes" and "no". If the value is yes, then at compile-time, Saxon outputs an analysis of all XPath expressions appearing in attributes of that instruction. The analysis includes the static type of the expression, and a representation of the optimized expression tree. For some examples, see extensions.html.

A new command line option is available. The -TJ option traces the binding of external calls to Java functions. It is useful when analyzing why Saxon fails to find a Java method to match an extension function call in the stylesheet, or why it chooses one method over another when several are available. The option is also available programmatically via the TRACE_EXTERNAL_FUNCTIONS attribute in the TransformerFactory.

The URI http://www.w3.org/2002/11/query-operators/collation/codepoint is now recognized as the name of the code-point collation; if this URI is specified in calls to sorting or comparison operations, strings will be compared according to their Unicode code-points. Note that this URI is likely to change in subsequent versions of the XPath working drafts.

There has been a change to the mechanism whereby a ContentHandler that is nominated to handle the result tree can indicate that it is prepared to handle output that is well-balanced, but not well-formed (for example, if it contains more than one element node as a child of the document root). A new attribute saxon:require-well-formed is available on xsl:output, with values "yes" or "no". The default is "no". If the value is set to "yes", and a user-written ContentHandler is supplied to receive the results of the transformation, then Saxon will report an error rather than sending a non-well-formed stream of SAX events to the ContentHandler. {saxon72, error231}

The XML output method now outputs a tab character appearing in an attribute value as &#x9;, to prevent it being normalized when the document is re-parsed. {type035}
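A literal tab inside an attribute value is turned into a space by attribute-value normalization when the document is parsed again, whereas a tab written as the character reference &#x9; survives. The sketch below shows the idea with an invented helper; the other escapes it applies (&amp;, &lt;, &quot;) are the usual ones for a double-quoted attribute and are an assumption about the serializer, not a description of Saxon's code.

```java
final class AttributeEscapeSketch {

    // Escape a string for use inside a double-quoted XML attribute value.
    static String escapeAttribute(String value) {
        StringBuilder sb = new StringBuilder(value.length() + 16);
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            switch (c) {
                case '&':  sb.append("&amp;");  break;
                case '<':  sb.append("&lt;");   break;
                case '"':  sb.append("&quot;"); break;
                case '\t': sb.append("&#x9;");  break; // keeps the tab across a re-parse
                default:   sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(escapeAttribute("a\tb")); // prints a&#x9;b
    }
}
```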

The XPath API has been extended with a method that allows the sort order for the results of an expression to be specified. Previously, there was no way of sorting sequences except in an XSLT context, or by use of rather complex internal Saxon classes. Expressions generated using the XPath API can also now be used directly in conjunction with the applyTemplates mechanism in the Controller, for use by applications doing rule-based processing from Java. The getExpression method of XPathExpression, which provided an escape from the packaged XPath API into the internal Saxon interfaces, has been replaced by rawIterator(), which fulfils the same purpose.

A Java extension function can now return a java.util.List object to represent a sequence. Each item in the list is converted to an XPath value (node or atomic value) as if it were returned from a separate function call: for example, a List containing three Java Integers is converted to an XPath sequence of three xs:int values. {func25}
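On the Java side, such a function is just a static method whose return type is java.util.List. The class and method below are hypothetical, made up for this example, and the sketch does not show how the class is bound to a namespace in the stylesheet.

```java
import java.util.ArrayList;
import java.util.List;

// HYPOTHETICAL class and method names, invented for this example.
// For real use the class would be declared public in its own source file.
final class SequenceFunctions {

    // Returns the integers from..to (inclusive) as a List; each Integer
    // would become one item of the resulting XPath sequence.
    public static List<Integer> range(int from, int to) {
        List<Integer> result = new ArrayList<>();
        for (int i = from; i <= to; i++) {
            result.add(i);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(range(1, 3)); // prints [1, 2, 3]
    }
}
```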

The interface that handles the setting of variables accessed by the XPath engine from its environment has been generalized to remove the assumption that the variables are defined in XSLT. This means that variables can now be used in the freestanding XPath API, as well as in saxon:expression and saxon:evaluate.

I have removed the restriction that a URIResolver for the URI contained in the href pseudo-attribute of the <?xml-stylesheet?> processing instruction must return a SAXSource. It can now return any kind of Source. This change has been regression tested only. The (existing) code for creating a composite stylesheet when there are several <?xml-stylesheet?> processing instructions (as specified in the JAXP interface definition) is not tested by any of my standard tests, but I have left it in. From an inspection of the code, I don't think it will work if the URIs are relative.

Internal Changes

These changes should not affect users unless you exploit internal interfaces within Saxon.

Parameters to stylesheet functions are now passed by position (in an array of values), not by name.

Internally, there has been a change to the processing of literal result elements. XPath expressions contained within attribute value templates on such an element are now processed during the first (prepareAttributes) compilation phase, as with other stylesheet instructions. Type checking happens during the second (validate) phase. A consequence of this change is that user-defined top-level elements are now represented by a different class, DataElement, to prevent their attributes being processed as AVTs.

Changes made in support of XPath type-checking include the following:

The general trend is towards doing more of the work at compile time. Where type conversions are necessary, or where it is determined statically that they might be necessary, then the conversions are compiled into the executable expression; if they are not necessary, they are not performed. Similarly, if dynamic type checking is necessary, then it is compiled into the expression; otherwise, it is not performed.
Function calls to standard functions are now compiled with knowledge of the signature of the function. The code generated is conditional on whether backwards compatible mode is enabled or not. If the supplied arguments are incompatible with the function signature (that is, if the call cannot possibly succeed) then a static type error is generated. Code to atomize nodes and perform other allowed conversions (e.g. numeric promotion) is compiled into the expression tree. If the supplied value cannot be statically guaranteed to be of the correct type, then type-checking code is generated in the expression tree.
The same logic is used for calls to stylesheet functions. In this case, backwards compatible mode is never used, which means there is no implicit conversion of arguments. Calls to stylesheet functions are now statically checked; this is done by means of a fixup process that allows for the fact that the function call can be parsed before the function declaration is encountered.
The same logic is used for evaluating keys.
Within the implementation of standard functions, arguments are now evaluated without any type conversion: any conversions that are performed are done by the function calling mechanism, using internal tables that represent the signatures of each function.
The internal Expression#evaluate() method has been dropped. All implementations and usages of this function have changed to use evaluateItem() or iterate() (or in some cases, lazyEvaluate()), as appropriate.
The code for value comparisons and general comparisons has been split into a number of separate classes. These do stricter type checking of their arguments. The decision on which algorithm to use (hash join, etc.) is now made at compile time, using static information about the types and cardinality of the arguments. But the conversion of untypedAtomic values (which result from atomizing a node with no type annotation) to a string or double (depending on the type of the other argument) is done dynamically. In the final stages of testing I found a design problem in this area: neither the new code nor the code in previous releases handled comparisons such as (U, U, U) = (1, 2, '3') correctly, where U is an untyped value. The problem here is that a mixture of string and numeric comparisons is required. I fixed this for the time being by changing the code so it always does a naive nested-loop comparison. This doesn't appear to have a noticeable effect on performance in most cases: there will be some cases where it is very inefficient, but these don't arise very often.
Other classes, notably the code for arithmetic expressions, also do stricter type checking.
The code for attribute value templates has been reorganized. The AttributeValueTemplate class is now used only at compile time, and it has therefore been moved to the style package. It no longer acts as a pseudo-XPath expression; instead, compiling the AVT generates a true XPath expression, including calls to concat(), string-join(), and string() where required. These handle all necessary type conversions.
The Expression#evaluateAsString() method no longer does conversion of the expression result to a string; the method should only be used where (a) the expression is statically known to return a string or (), and (b) the returned value of () is treated as equivalent to "". In practice, this means that the use of the method is now largely confined to the evaluation of attribute value templates. This method will probably be phased out.
The code for xsl:value-of has changed so it now compiles any code needed to convert the supplied expression to a string (or, if the separator attribute is present, a sequence of strings).
The code for xsl:sort has changed so that the sort key is converted to the required type using the same rules as the rules for function arguments. Internally, a new class FixedSortKeyDefinition is introduced to represent a sort key definition that contains no context dependencies, that is, one in which the values of all the parameters such as order, case-order, language, and data-type are known. Sometimes it is possible to create this statically, sometimes (when AVTs are used) it cannot be created until the values of variables are known.
Those Saxon extension functions that need special treatment at compile time (specifically, saxon:evaluate, saxon:expression, saxon:parse, and saxon:serialize), are now treated in the same way as system functions.
The class SimpleValue has been renamed AtomicValue.
The method convert() is now available only on the AtomicValue class, it is not available for all values as previously. This method implements the logic of the casting rules.
Expressions are now parsed in three stages: parsing, context-independent rewriting, and static type analysis. The first stage is done by the ExpressionParser class, the second by calling the simplify() method on the resulting Expression object. The third stage is done by calling the typeCheck() method on the Expression object. In an XSLT context, type information for stylesheet variables and stylesheet functions is added before the typeCheck() method is called. The Expression.make() call only does the first two steps; applications that use this interface must be changed to call typeCheck() as well. The XPath API in package net.sf.saxon.xpath works unchanged.
Higher-order expressions, such as path expressions, filter expressions, and "for", "some", and "every" expressions, are now rewritten statically to promote any subexpressions that don't depend on the iteration variables. The effect is that such subexpressions are only evaluated once. This mechanism replaces the previous run-time optimisation based on the concept of expression reduction (at run-time, the expression was replaced with an expression in which the independent sub-expressions were replaced with their value). The new mechanism is done entirely at compile time and is therefore much more economical. Also it avoids doing trivial rewrites, that is, extracting constants and simple variable references.{opt001-004}
Run-time expression reduction is still used to eliminate context dependencies in an expression that is being evaluated lazily (always an expression that returns a sequence), and is being held as the value of a variable. When evaluation of such an expression is deferred, it is necessary to make a copy of all aspects of the context that it depends on, and this is done by rewriting the expression with a new expression in which all context variables are replaced with their values.
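The nested-loop treatment of general comparisons mentioned above, for cases such as (U, U, U) = (1, 2, '3'), can be sketched as follows. This is a simplified model with invented names, not the Saxon classes: numbers are represented as Double, strings as String, and untyped values by a small wrapper. Comparing a typed number with a typed string simply yields false here, which glosses over the type error a strict implementation would raise.

```java
import java.util.Arrays;
import java.util.List;

final class GeneralComparisonSketch {

    // Stand-in for an untypedAtomic value (atomized node, no type annotation).
    static final class Untyped {
        final String text;
        Untyped(String text) { this.text = text; }
    }

    // Compare one pair, converting an untyped operand to the other's type.
    static boolean pairEquals(Object a, Object b) {
        if (a instanceof Untyped && b instanceof Untyped) {
            return ((Untyped) a).text.equals(((Untyped) b).text);
        }
        if (a instanceof Untyped) return pairEquals(b, a);
        if (b instanceof Untyped) {
            String text = ((Untyped) b).text;
            if (a instanceof Double) {
                // Numeric comparison; throws NumberFormatException if the
                // untyped text is not a number.
                return ((Double) a) == Double.parseDouble(text);
            }
            return a.equals(text); // string comparison
        }
        return a.equals(b);
    }

    // Existential semantics: true if any pair of items compares equal.
    static boolean generalEquals(List<?> left, List<?> right) {
        for (Object a : left) {
            for (Object b : right) {
                if (pairEquals(a, b)) return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<Object> left = Arrays.<Object>asList(
            new Untyped("3"), new Untyped("3"), new Untyped("3"));
        List<Object> right = Arrays.<Object>asList(1.0, 2.0, "3");
        System.out.println(generalEquals(left, right)); // prints true
    }
}
```

Each pair chooses its own comparison (numeric against 1 and 2, string against '3'), which is why a single pre-selected algorithm for the whole expression does not work in the mixed case.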

Changes in version 7.3.1 (2002-12-10)

Defects cleared

641793 - <xsl:element name="{local-name()}"/> fails (with the message "namespace prefix has not been declared").

641940 - No diagnostics for Saxon internal errors when using the Crimson parser

641948 - NullPointerException when two xsl:strip-space or xsl:preserve-space declarations name the same element

645190 - <xsl:namespace> rejects a zero-length string as the name, and fails to detect a conflict with the namespace of the containing element

646844 - The <saxon:query> extension element throws a NullPointerException if the columns attribute is omitted

Performance improvements

The code for <xsl:number level="any"> has been optimized. The conditions that must apply for this optimization are that the count and from patterns must not contain any variable references; either the count pattern must be specified, or the name and type of the context node must be statically obtainable, typically from the match pattern of the containing template rule. Under these circumstances, Saxon will remember the result of evaluating the instruction, and the next time it is evaluated, if the first node encountered in its reverse scan of the document is the one that was most recently numbered, it will simply add one to the remembered number.

Note: a similar optimization for <xsl:number> with no attributes has been in use for some time.

The numbering code also now passes down the required node name and kind to the axis iterator that is used to find the nodes being counted. This reduces the overhead of skipping non-matching nodes.

The code for navigating the parent and ancestor axes in the TinyTree implementation has been improved. The next pointer for the last sibling now points to the parent node; this is distinguished from a normal next pointer by virtue of the fact that it always points backwards in the node array. When a search is needed to find the parent, this is now done by reading the next-sibling pointer chain until the owner is found, which in general is faster than the previous technique, of scanning all preceding nodes until one is found whose depth is lower.
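The pointer scheme can be sketched with a single array. This is an illustration under stated assumptions, not the TinyTree source: the class name, the array name, and the use of -1 for the root are invented. Nodes are numbered in document order, so a next-sibling pointer always refers to a higher index and a pointer to the parent always refers to a lower one.

```java
final class NextPointerTreeSketch {

    // next[i] is the next sibling of node i (a higher index) or, for a last
    // sibling, the parent of node i (a lower index). The root holds -1.
    private final int[] next;

    NextPointerTreeSketch(int[] next) {
        this.next = next.clone();
    }

    // Follow next-sibling pointers until one points backwards: that pointer
    // leads to the parent. Returns -1 for the root.
    int parent(int node) {
        int n = node;
        while (next[n] > n) {
            n = next[n];
        }
        return next[n];
    }

    public static void main(String[] args) {
        // Document order: 0=a, 1=b, 2=d, 3=e, 4=c
        // a has children b and c; b has children d and e.
        NextPointerTreeSketch tree =
            new NextPointerTreeSketch(new int[] {-1, 4, 3, 1, 0});
        System.out.println(tree.parent(2)); // prints 1 (d -> e -> back to b)
        System.out.println(tree.parent(1)); // prints 0 (b -> c -> back to a)
    }
}
```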

This was done after exploring a number of alternative approaches, none of which led to significant performance improvements. In particular, I tried various ways of remembering the parent node during a scan of the descendant axis, but in all cases the benefit achieved when the parent node was actually used was less than the extra cost of maintaining the information in the cases where it wasn't needed.

Note: I have not identified any circumstances in which the "standard" tree implementation out-performs the TinyTree. The "standard" implementation is retained, however, because it is used for stylesheets during compilation.

A small improvement has been made to the code for evaluating an attribute reference.

Various expression classes now contain their own implementations of the effectiveBooleanValue() method, avoiding the need to use the general-purpose logic in cases where the value is already known to be a singleton.

My main test case for these performance improvements was the stylesheet used to render the XSLT 2.0 specification. The execution time for this stylesheet improved from 16.4 seconds to 7.6 seconds. Improvements for other stylesheets are very unlikely to be as high as this. Another test case improved from 66.8 to 59.8 seconds, which is probably more typical.

Changes in version 7.3 (2002-11-18)

References in curly braces identify the test cases used to test each new feature.

Defects cleared

603928 - [position()!=last()] in a pattern fails.

607442 - unparsed-text() function fails.

608416 - command line processor calls System.exit().

616543 - xsl:param needn't come first in a template.

616548 - multithreading bug when using saxon:preview declaration.

617103 - ./EXPR doesn't sort results into document order.

620851 - precedence of conflicting xsl:namespace-alias declarations.

626277 - exslt:leading() and exslt:trailing() with empty node-set arguments.

635433 - SQLInsert attempts commit() even when autocommit is set.

636661 - interaction of cdata-section-elements with disable-output-escaping.

637117 - creating two namespaces with same prefix and different URI.

637292 - xsl:for-each and xsl:for-each-group don't nullify the current template rule.

XSLT changes

The xsl:principal-result-document element is withdrawn. Note, however, that the ability to NOT have a principal result tree is not yet available. A principal output file will be created even if it is empty.

Attributes of xsl:output can no longer be attribute value templates.

A new attribute has been added to xsl:output: saxon:byte-order-mark="yes" causes a byte order mark (hex FEFF) to be inserted at the start of the output file. This is most useful with UTF-8 and UTF-16 encoding, as some text editors recognize it, but it is available for use with any output method and any encoding. {outp70}

The saxon:omit-meta-tag attribute in xsl:output has been replaced with the new (standard) include-content-type attribute. Note that this works the other way around: replace saxon:omit-meta-tag="yes" by include-content-type="no". {saxon47, outp71}

The new (standard) escape-uri-attributes attribute in xsl:output has been implemented for the HTML output method. (URI escaping is not yet implemented for method="xhtml"). {outp71}

The built-in template rules now pass parameters through unchanged to the templates for their child elements. This applies whether the rule is called because of xsl:apply-templates or xsl:apply-imports {cnfr21, cnfr22}.

The base URI of the root of a temporary tree is now taken from the base URI of the xsl:variable element in the stylesheet. Previously it was taken from the system ID. There is a difference in the case where xml:base is used. {not tested}

A variable that is never referenced will no longer be evaluated. This can cause problems if evaluation of the variable has side-effects (e.g. by calling an extension function, or saxon:assign). You can force evaluation of the variable by setting saxon:assignable="yes".

The terminate attribute of xsl:message may now be an attribute value template. {ver14}

The copy-namespaces attribute of xsl:copy and xsl:copy-of is now supported. {copy12-13}

The type attribute of xsl:variable, xsl:param, and xsl:result is renamed to "as".

The as and collation attributes of xsl:key are now supported. This allows indexing of nodes by numeric or date values, and matching using case-blind or accent-blind comparisons. {idky25-29}.

The type-annotation attribute of xsl:attribute and xsl:element and the xsl:type-annotation of literal result elements are supported. For example, you can now annotate attributes of elements on a temporary tree as type-annotation="xs:ID", and then use the id() function to find them, using an expression such as $tree/id('A001'). The actual value of the element or attribute must be valid according to the type given in the type annotation. Only built-in schema-defined types are currently supported. Attribute types derived from a DTD will be recorded if they are reported by the parser (but CDATA is treated as untyped, and the list types IDREFS, ENTITIES, and NMTOKENS are not yet supported). Although the values must be valid according to their type, there are no checks on uniqueness constraints (ID) or referential integrity constraints (IDREF). {schema001-4}

The type annotations are retained on the tree only if the attribute type-information="preserve" is present. If the attribute is absent, or is set to none, the type-annotation on any elements or attributes in the tree will still be used to validate the content, but will not result in any annotation of the nodes on the tree. The values strict and lax for this attribute are not yet implemented. {schema005-6}

The type-information attribute is also available on xsl:result-document. It only affects the outcome if the result tree is captured using a user-written Receiver in which the annotations will be available. At present the type annotations are NOT retained if the result is fed into another stylesheet using saxon:next-in-chain: this is because the chaining goes via a SAX2 ContentHandler which cannot pass the type information through. {schema007, schema013}

The attribute copy-type-annotations is available on xsl:copy-of. The default is "no", which means that type annotations are NOT copied from the source tree to the result tree. {schema011-012}

The effect of xsl:namespace-alias has been changed. Elements and attributes whose namespace is changed by an xsl:namespace-alias declaration will now take the prefix given in the result-prefix attribute, where possible. Previously they took the new namespace URI but retained their original prefix. This was technically conformant with the specification, but untidy, and it often led to the result document containing multiple declarations of the same namespace URI. {nspc36-38}

Conflicting xsl:namespace-alias declarations are now reported as a static error. {error007}

XPath changes

The precedence of different expressions in the XPath grammar has been aligned with the August 2002 working draft. This meant making a few changes: range expressions (such as 1 to 10) now bind more tightly than conditional expressions; all comparison operators now have the same precedence, and consecutive operators (as in a = b = c) are not allowed; unary minus binds more tightly than union; cast and treat expressions are no longer allowed as steps in a path expression. Saxon implements the full XPath 2.0 grammar with the exceptions of the validate expression and schema-related aspects of the SequenceType production.

Removed the ability to do a "mapping cast", that is, to cast a sequence as a sequence. This functionality went beyond the semantics of cast as defined in the XPath 2.0 specification. The argument and result of a cast must now be a singleton, and if the input is an empty sequence, the output is an empty sequence. The actual conversion rules still need some work to align them fully with the evolving XPath specification.

Implemented the escape-uri() function. The '#' character is treated as a reserved character, in addition to those listed in the specification. {expr85}

Implemented the item-at() function, but with restrictions: if the subscript is out of range, it should raise an error, but it currently returns the empty sequence. {pos65}

Implemented the data() function. {schema008, 009, 010, 012}

The concept of "effective boolean value" has been implemented. This algorithm is now used when converting any value to a boolean in contexts such as conditional expressions, filter predicates, and the boolean() function. It is fully backwards compatible with XPath 1.0.

A different, more restricted algorithm is used when casting values to booleans using a cast expression or the xs:boolean() constructor: for strings in particular, the effective boolean value gives false for a zero-length string and true for any other string, while xs:boolean() (in line with W3C Schema) gives true for "1" or "true", false for "0" or "false", and an error for any other string. {type034} xs:boolean() changes are not yet complete for supplied values other than string.
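The difference between the two algorithms, for strings only, can be sketched in Java as follows. The class and method names are invented for this illustration and do not correspond to Saxon classes. The sketch follows the rules exactly as stated above and makes no attempt at whitespace handling or at supplied values other than strings.

```java
// Hypothetical illustration of the two string-to-boolean rules; not Saxon code.
final class BooleanRules {

    // Effective boolean value of a string: false only for the zero-length string.
    static boolean effectiveBooleanValue(String s) {
        return !s.isEmpty();
    }

    // xs:boolean() applied to a string: only "1", "true", "0", "false" are accepted.
    static boolean castToBoolean(String s) {
        switch (s) {
            case "1":
            case "true":
                return true;
            case "0":
            case "false":
                return false;
            default:
                throw new IllegalArgumentException("Not a valid xs:boolean: \"" + s + "\"");
        }
    }

    public static void main(String[] args) {
        System.out.println(effectiveBooleanValue("false")); // true: the string is not empty
        System.out.println(castToBoolean("false"));         // false
    }
}
```

The string "false" is the case most likely to surprise: its effective boolean value is true, while the cast gives false.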

The algorithm for "atomize" is also available for all expressions, though at present it is used only for the argument of a cast. It is also simpler than the algorithm described in the specification because at present the typed value of a node is always the same as the string value.

Changed the EXSLT set:leading() and set:trailing() functions (as required by the spec) so that if the second argument is empty, the first argument is returned. Changed saxon:before() and saxon:after() so they work the same way. Previously, the empty node-set was returned. This change will be retrofitted to 6.5.x. There is a further deviation from the spec: If no node in the second node-set is present in the first node-set, Saxon returns all nodes before/after the first/last in the second node-set, whereas the spec requires it to return an empty sequence. This would require a redesign, and it prevents a pipelined implementation, so I don't intend to implement this change.

Implemented the string-to-codepoints() and codepoints-to-string() functions, replacing saxon:string-to-unicode() and saxon:unicode-to-string(). {saxon68-69}
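For readers unfamiliar with these functions, the following Java sketch shows the equivalent conversion using standard JDK calls. The class and method names are invented for the example; it illustrates the idea of the two functions, not Saxon's implementation, and the handling of characters outside the Basic Multilingual Plane shown here is the JDK's behaviour, which this release of Saxon may or may not share.

```java
// Hypothetical illustration using JDK calls only; not Saxon code.
final class CodepointSketch {

    // Analogue of string-to-codepoints(): one integer per Unicode character.
    static int[] stringToCodepoints(String s) {
        return s.codePoints().toArray();
    }

    // Analogue of codepoints-to-string(): rebuilds the string from the integers.
    static String codepointsToString(int[] codepoints) {
        return new String(codepoints, 0, codepoints.length);
    }

    public static void main(String[] args) {
        int[] cps = stringToCodepoints("Saxon");
        System.out.println(java.util.Arrays.toString(cps)); // [83, 97, 120, 111, 110]
        System.out.println(codepointsToString(cps));        // Saxon
    }
}
```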

Implemented the string-join() function. {str125}

Implemented the castable as operator. {type030}

Implemented the types xs:anyURI and xs:QName, and the functions expanded-QName(), get-local-name-from-QName(), and get-namespace-from-QName(). {type031-33}

Implemented the SequenceType grammar for "attribute of type T" and "element of type T". T must be a built-in simple type. {schema002-004, 014; error009, 012}.

The second argument of saxon:serialize() must now be known at compile-time. This is because details of xsl:output declarations are not available at run-time unless they are actually referenced.

The results of the function-available() and element-available() functions may be inaccurate if the argument is not known at compile-time. Specifically, only system-defined functions and instructions are known at run-time. In practice, these functions are designed to perform compile-time tests so this is very unlikely to be a problem. There is also some justification in that the only functions that can be called dynamically (using saxon:evaluate()) are system-defined functions.

As a result of the changes affecting stylesheet compilation, there are some new restrictions on the extension function saxon:evaluate() (and also saxon:expression()). In particular, the dynamically constructed expression can no longer reference any XSLT variables, and it cannot access any stylesheet functions, Saxon extension functions, or XSLT-specific functions such as key() and generate-id().

Stylesheet Compilation

There has been a substantial change to the way stylesheets are "compiled". In previous releases, the compiled stylesheet was actually a standard tree representation of the source XML stylesheet, with annotations on the nodes to assist efficient execution. In this release, the tree representation of the stylesheet is discarded once compilation is complete, and a custom data structure is used to represent the executable stylesheet.

The compiled stylesheet may now be serialized (using Java serialization), enabling it to be saved on disk, or transferred between machines - this is especially useful in an Enterprise Java Beans environment. A new command java net.sf.saxon.Compile stylesheet output is available to compile a stylesheet, and the java net.sf.saxon.Transform command has a new option -c which causes the stylesheet parameter to be taken as a compiled stylesheet rather than a source stylesheet. In fact, using compiled stylesheets from the command line does not give a great performance advantage over recompiling them each time they are used, because the compilation time is dominated by Java initialization; the benefits are more likely to be realized in a high-throughput server-based environment, where it is now possible to use disk caching of stylesheets as an alternative to in-memory caching.
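The mechanism involved is ordinary Java serialization. As a minimal sketch of what saving and reloading an object this way looks like, the following example round-trips a small serializable object through a byte array, which could equally be a file or a network stream. The CompiledSheet class is a made-up stand-in with invented fields; the real classes that make up a Saxon executable stylesheet are different and are not shown here.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

final class SerializationSketch {

    // Hypothetical stand-in for an executable stylesheet; Saxon's real classes differ.
    static final class CompiledSheet implements Serializable {
        private static final long serialVersionUID = 1L;
        final String sourceName;
        final int templateCount;

        CompiledSheet(String sourceName, int templateCount) {
            this.sourceName = sourceName;
            this.templateCount = templateCount;
        }
    }

    // Serializes the object to bytes (these could be written to disk or sent elsewhere).
    static byte[] save(CompiledSheet sheet) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(sheet);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Rebuilds the object from previously saved bytes.
    static CompiledSheet load(byte[] data) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return (CompiledSheet) in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        CompiledSheet copy = load(save(new CompiledSheet("books.xsl", 12)));
        System.out.println(copy.sourceName + " " + copy.templateCount);
    }
}
```

Every object reachable from the saved object must itself be serializable, which is why a single non-serializable class anywhere in the structure prevents the whole stylesheet from being saved.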

These changes bring (or promise) a number of benefits:

The compiled stylesheet is significantly smaller, important when a number of compiled stylesheets are cached in a web server.
It is possible to distribute a stylesheet in scrambled form, so that users cannot easily make changes.
Unused parts of the stylesheet, for example template rules in imported modules, are discarded.
The compiled stylesheet is relocatable between servers (e.g. under EJB).
Stylesheet optimizations, by rewriting the tree, become feasible. Until now the Saxon optimizer has only operated at the level of individual XPath expressions. A few simple optimizations have been implemented in this release, e.g. the decision whether to execute xsl:fallback is made entirely at compile-time.

The main drawback is that less of the static context is available during execution. This makes a number of things more difficult, or in some cases impossible:

Diagnostics, such as tracing and debugging, have less information available. For example, variable names are currently not retained in the executable.
Reflexive capabilities become more difficult. The obvious examples are saxon:evaluate and the saxon:allow-avt attribute, which allows dynamic selection of a template in xsl:call-template.

In general I expect that stylesheets will need to be recompiled whenever a new Saxon version is issued, though this may be avoidable in the case of a bug-clearance release.

Stylesheet compilation is a little fragile at this release. It has proved difficult to test it comprehensively. One known restriction is that stylesheets containing saxon:collation declarations cannot be compiled (because it uses Java classes that are not serializable). There may be other restrictions: please let me know if you find any.

As part of this change, the stylesheet tree now uses a different NamePool from the source tree. This NamePool is discarded as soon as compilation is complete. Names used in XPath expressions, names of literal result elements and attributes, and names of keys, variables, templates, and functions, are still registered in the NamePool for the source document, but the names of XSLT elements and attributes (e.g. xsl:template, select) no longer appear. This significantly reduces the size of the compiled version of a small stylesheet, and makes loading of the compiled stylesheet correspondingly faster. It also means that names used in the source document are less likely to encounter hashing conflicts in the NamePool, giving a small run-time speed-up.

There have been a number of changes to APIs that may affect users.

The XSLTContext object has been merged into net.sf.saxon.Controller. This class was exposed in the traditional Saxon Java event-handling API, and was also available for use by extension functions, extension elements, and trace listeners. Extension functions that require context information must now declare a first argument of class XPathContext.
The tracing API has changed, as the execution flow can no longer be described in terms of nodes in the stylesheet tree. Since it had to change anyway, I have taken the opportunity to redesign it in terms of interfaces that hopefully will stand the test of time.
The element extensibility API is changed because extension elements, like other nodes in the stylesheet, must be compiled into a separate data structure for execution. The ExtensionElementFactory interface is unaffected, but the classes implementing individual instructions must be split in two: a subclass of StyleElement representing the node on the stylesheet tree, which contains a compile() method, which in turn generates a corresponding instance of a subtype of the class net.sf.saxon.instruct.Instruction. The SQL extension library has been updated to show how the new scheme works.
The NodeHandler interface (used by the old "non-XSLT" Java interface to Saxon) has changed again. Its start() method is replaced by a process() method that takes a single argument, the Controller object.

Internal Changes

Emitter and Receiver classes

I have introduced a new interface, net.sf.saxon.event.Receiver, which is intended to replace the old Emitter interface. This supports setting type annotations on element and attribute nodes: it allows the type information to be carried with the element and attribute events, and also allows various properties to be associated with each event, used for disable-output-escaping and to indicate when validation has already been done so it does not get done twice.

The classes that implement this interface are largely in package net.sf.saxon.event, which replaces the old net.sf.saxon.output package.

The new interface largely replaces the Outputter interface, ending the artificial distinction between the Outputter and the Emitter, which was there historically because events were handled in a different order at the two interfaces.

The "sticky d-o-e" facility is not working in this release: that is, an error is reported when output is written to a temporary tree with disable-output-escaping="yes". The same happens if the final output of the stylesheet is written to a Saxon tree, for example when using a Saxon-created DOMResult, or when using stylesheet chaining. It is possible that "sticky d-o-e" will not be allowed in the final XSLT 2.0 specification, though at present there are open issues concerning this. {outp09, bug17}

Internal XPath Changes

The class hierarchy for XPath expressions (net.sf.saxon.expr.Expression) has been simplified. The two abstract classes SingleValueExpression and SequenceExpression have disappeared; their functionality has moved into the parent class, Expression, driven by the static cardinality of the expression as determined by the getCardinality() method. This allows greater re-use of classes such as BinaryExpression. There is potential for many expressions to be implemented as functions, allowing more use of generic code and table-driven static analysis.

The implementation of SequenceExpressions of the form (1,2,3) has changed completely, and is much simpler. They are now handled by breaking them up into a tree of binary expressions, treating "," as a list concatenation operator. {expr53, 54, 55, 86}
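The idea can be sketched in a few lines of Java. All the names below (Expr, Literal, Append, parseSequence) are invented for this illustration and are not the Saxon class names. The choice of a left-deep tree is also an assumption, since the text above does not say which way the tree is built.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of "," as a binary concatenation operator; not Saxon code.
final class CommaSketch {

    interface Expr {
        List<Object> evaluate();
    }

    // A single item, evaluating to a sequence of length one.
    static final class Literal implements Expr {
        private final Object value;

        Literal(Object value) {
            this.value = value;
        }

        public List<Object> evaluate() {
            List<Object> result = new ArrayList<>();
            result.add(value);
            return result;
        }
    }

    // The "," operator: the left-hand sequence followed by the right-hand sequence.
    static final class Append implements Expr {
        private final Expr left;
        private final Expr right;

        Append(Expr left, Expr right) {
            this.left = left;
            this.right = right;
        }

        public List<Object> evaluate() {
            List<Object> result = new ArrayList<>(left.evaluate());
            result.addAll(right.evaluate());
            return result;
        }
    }

    // Builds Append(Append(1, 2), 3) for (1, 2, 3); requires at least one item.
    static Expr parseSequence(Object... items) {
        Expr tree = new Literal(items[0]);
        for (int i = 1; i < items.length; i++) {
            tree = new Append(tree, new Literal(items[i]));
        }
        return tree;
    }

    public static void main(String[] args) {
        System.out.println(parseSequence(1, 2, 3).evaluate()); // [1, 2, 3]
    }
}
```

With this design no special n-ary node is needed: a sequence of any length is just nested applications of one binary operator.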

The implementation of FilterExpressions has been rewritten and simplified. Two different iterators are now used, a FilterIterator where every value needs to be tested, and a PositionIterator where the value is known statically to be numeric. This greatly simplifies the code. The way in which reverse axes are handled has also been simplified.

I want to move away from run-time expression reduction on filters to doing a static rewrite that pulls non-dependent subexpressions out of the predicate, but this has not yet been done.

The class XPathException is now abstract. There are two concrete subclasses, XPathException.Static and XPathException.Dynamic, used to distinguish static from dynamic errors. (Other subtypes, for example XPathException.TypeError, may be introduced in future.) A dynamic error that occurs when an XPath expression is evaluated early (at compile time) is now not reported until run-time, and is only reported if the expression is actually evaluated.

Other changes

I have decided to drop the integration with Apache's FOP processor. The API has changed yet again between FOP 0.20.3 and FOP 0.20.4. It is simply too much hassle to keep chasing a moving target, especially as the changes are not well documented and impossible to make without studying the FOP source code.

It is now possible to control the use of NamePools via the TransformerFactory. The call factory.setAttribute(FeatureKeys.NAME_POOL, pool) causes the specified namepool to be used by all stylesheets that are compiled (using newTemplates()) following this call. Note: unless you really know what you are doing, it is safest to let Saxon manage the namePools automatically.

The HTML output method now uses its own internal method for URI escaping, rather than relying on the utf8 encoding available in the Java IO library. {outp52, 57}

Support for SAX1 XML parsers is withdrawn. All mainstream parsers support SAX2, with the possible exception of James Clark's xp. Similarly, output will no longer be directed to a SAX1 DocumentHandler: you must supply a SAX2 ContentHandler instead. Saxon now compiles without any deprecation warnings.

Changes in version 7.2 (2002-08-28)

Installation

Saxon 7.2 requires Java JDK 1.4

Saxon 7.2 requires JDK 1.4. This is primarily to support the use of regular expressions: Saxon now uses the JDK 1.4 regular expression library to support xsl:analyze-string and the functions matches(), replace(), and tokenize().

Since JDK 1.4 includes an XML parser, there is no longer any good reason for Saxon to supply its own XML parser. Therefore AElfred is no longer included in the Saxon package, and the default will be to use the Crimson parser (or whatever is included in the JDK 1.4 distribution).

Note: JDK 1.4 appears to require more [or allocate less] stack space than JDK 1.3: some transformations that ran successfully in JDK 1.3 run out of stack space with JDK 1.4. This equally affects earlier Saxon releases when running with JDK 1.4.

Defects Cleared

542981: Saxon fails with JDOM beta 0.8.

553347: The context node is not reset correctly after a stylesheet function is called from within an XPath predicate.

558696: Cannot include a simplified stylesheet.

561695: Error message "more than one method matches" when calling a Java method that accepts argument of class Object.

573314: The expression string-length($x)=0 gives wrong result for "0" and "false".

576632: Match on parent node in a pattern fails.

580989: NullPointerException when tracing using -T option.

581515: Duplicate DOCTYPE declaration when using an identity transformer and HTML serialization.

583939: Memory leak when using keys.

584944: Attribute value templates on <xsl:sort> cannot depend on the context node.

XSLT changes

Implemented the xsl:analyze-string instruction, which supports regular expression matching.

Where an embedded expression within an attribute value template yields a sequence of more than one item, the string values of all the items are now output, separated by spaces. This is incompatible with XSLT 1.0, which ignored all but the first node in a node-set. If this causes compatibility problems (a) you can fix it by using the filter [1] after the expression, (b) please let me know: the XSL WG wants to know whether this incompatible change is likely to cause problems in practice.

The elements xsl:variable, xsl:param, and xsl:result may now take a type attribute indicating the required type of the value. The supplied value will be converted to this type if necessary. The value of the attribute is the same subset of the XPath SequenceType production as is implemented for "cast as" and "instance of" expressions: basically, the fixed types such as "item" and "element" and the built-in types such as xs:string and xs:date, followed by an optional occurrence indicator.

Parameters to xsl:function may no longer specify a default value: all arguments must be supplied in the function call.

An xsl:message instruction may now appear inside an xsl:function.

The xsl:text instruction may now contain other instructions, such as xsl:value-of. Pending resolution of issue 132 in the spec, avoid using disable-output-escaping with nested xsl:text elements. The effect is unlikely to be what you expected.

It is now an error to specify the mode or priority attributes on an xsl:template element with no match attribute.

Match patterns using the id() and key() functions can now reference global variables or parameters for the value of the id or key.

The attributes version, exclude-result-prefixes, and extension-element-prefixes may now appear on any element in the XSLT namespace. Note that these attributes are prefixed xsl: when used on a literal result element, but have no prefix when used on an XSLT element.

The attribute [xsl:]default-xpath-namespace is now available on all elements. It defines the default namespace to be used for unprefixed element names in path expressions and patterns.

The xsl:apply-templates element now allows mode="#current" and mode="#default". The xsl:template allows the mode attribute to be a list of mode names, optionally including #default to match the default mode.

The disable-output-escaping attribute of xsl:attribute is implemented, replacing the saxon:disable-output-escaping extension, which is no longer available.

The xsl:destination element is renamed xsl:principal-result-document. (This was misdocumented in version 7.1).

Implemented the unparsed-text() function (with the second argument being mandatory).

General

Added a -v option to the command line to request XML validation. This applies to the principal source document and other files read using the document() function. It requires an XML parser that supports validation.

The same feature is available in the API using setFeature(FeatureKeys.VALIDATION, Boolean.TRUE) on the TransformerFactory.

Added a getTransformer() method to the net.sf.saxon.Filter class that is created in response to SAXTransformerFactory#newXMLFilter(). This allows setting of stylesheet parameters, a URIResolver, etc, when using this interface. Not tested.

The standard TraceListener now outputs an abbreviated version of the file name of the stylesheet module containing an instruction, as well as the line number.

The indentation algorithm for method="xml" has been changed so no extra whitespace is output if there is already enough whitespace in the result tree: specifically, if a start tag is preceded by a newline and as many spaces as the indentation would output, then no extra indentation takes place. The effect is to avoid adding blank lines when copying XML that is already indented. This change does not affect method="html", because the HTML indentation rules are more complex and can easily affect the appearance of text in the browser if applied wrongly.

XPath changes

Implemented the regular expression functions matches(), replace() and tokenize() as defined in the Functions and Operators specification; also the regex-group() function defined in the XSLT 2.0 WD.
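Since these functions are built on the JDK 1.4 regular expression library, a rough idea of their behaviour can be had from the underlying java.util.regex calls. The sketch below is an approximation written for this note, not Saxon's implementation. The class and method names are invented, the substring-match reading of matches() is an assumption based on the later XPath 2.0 specification, and the regular expression dialect of the specification is not identical to that of java.util.regex.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical approximation using java.util.regex; not Saxon code.
final class RegexSketch {

    // matches(): assumed true if the regex matches any substring of the input.
    static boolean matches(String input, String regex) {
        return Pattern.compile(regex).matcher(input).find();
    }

    // replace(): replaces every non-overlapping match with the replacement string.
    static String replace(String input, String regex, String replacement) {
        return Pattern.compile(regex).matcher(input).replaceAll(replacement);
    }

    // tokenize(): splits the input at each match of the separator regex.
    // Note: Pattern.split() drops trailing empty strings, which may differ
    // from the behaviour the specification requires.
    static List<String> tokenize(String input, String regex) {
        return Arrays.asList(Pattern.compile(regex).split(input));
    }

    public static void main(String[] args) {
        System.out.println(matches("abracadabra", "bra"));   // true
        System.out.println(replace("2002-08-28", "-", "/")); // 2002/08/28
        System.out.println(tokenize("a, b,c", ",\\s*"));     // [a, b, c]
    }
}
```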

The only option in the construct A instance of [only] B has been removed, as it is no longer defined in the XPath WD.

Changed the rules for the context document: this is now always the document containing the context node. If the context item is not a node, there is no context document, and any absolute path expression (or calls on id(), key(), or unparsed-entity-uri()) will cause a dynamic error.

Implemented the time data type, the constructor xs:time(), and the functions current-date() and current-time(). Time values can be compared for equality or ordering, and can be sorted.

Implemented the component extraction functions, get-x-from-y, for date, dateTime, and time.

Changed DateTime and time classes so that the timezone is retained as part of the value. Equality and ordering is done by normalizing the time to UTC, but conversion to a string, and extraction of components, reflects the timezone as originally specified.
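The same distinction between comparing normalized instants and retaining the timezone as written can be seen in the java.time API of later Java versions (Java 8 onwards), which makes a convenient analogy. The sketch below uses java.time.OffsetDateTime purely as an illustration; it is not what Saxon uses internally, and the class and method names are invented.

```java
import java.time.OffsetDateTime;

// Hypothetical analogy using java.time; Saxon has its own date and time classes.
final class TimezoneSketch {

    // Comparison normalizes both values to the same time line (effectively UTC).
    static boolean sameInstant(String a, String b) {
        return OffsetDateTime.parse(a).isEqual(OffsetDateTime.parse(b));
    }

    // Component extraction reflects the value as originally written.
    static int hourAsWritten(String dateTime) {
        return OffsetDateTime.parse(dateTime).getHour();
    }

    public static void main(String[] args) {
        String local = "2002-08-28T12:00:00+01:00";
        String utc = "2002-08-28T11:00:00Z";
        System.out.println(sameInstant(local, utc)); // true: same moment in time
        System.out.println(hourAsWritten(local));    // 12
        System.out.println(hourAsWritten(utc));      // 11
    }
}
```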

Constructor functions such as dateTime() have been moved to the schema namespace (you can use either "http://www.w3.org/2001/XMLSchema" (conventional prefix xs) or "http://www.w3.org/2001/XMLSchema-datatypes" (conventional prefix xsd)). Stylesheets that use these constructor functions must be changed. The semantics of these constructors are identical to the cast expression.

Added the duration data-type, including conversion to and from strings, comparison for equality and ordering, sorting, and component extraction. (This goes beyond the XPath 2.0 drafts, which do not allow ordering on durations.) Ordering is based on the average length of a month (one year = 365.25... days): so P365D < P1Y and P366D > P1Y. Component extraction works on any kind of duration, and the functions are currently named get-X-from-duration(), not get-X-from-yearMonthDuration() or get-X-from-dayTimeDuration(). Arithmetic involving durations or dates is not yet implemented.
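The ordering rule can be sketched as follows: convert each duration to an approximate number of seconds, using an average month length, and compare the results. The names below are invented for this illustration. The year length of 365.25 days used in the code is a placeholder assumption, because the exact figure is given above only as "365.25..."; the two comparisons quoted above come out the same way for any year length strictly between 365 and 366 days.

```java
// Hypothetical illustration of ordering durations by average month length; not Saxon code.
final class DurationOrderSketch {

    // ASSUMPTION: placeholder year length in days; the actual constant is not stated here.
    static final double DAYS_PER_YEAR = 365.25;
    static final double SECONDS_PER_DAY = 86400.0;
    static final double SECONDS_PER_MONTH = DAYS_PER_YEAR / 12 * SECONDS_PER_DAY;

    // A duration is modelled as a whole number of months plus a number of seconds.
    static double approximateSeconds(int months, double seconds) {
        return months * SECONDS_PER_MONTH + seconds;
    }

    // Negative, zero, or positive as the first duration is shorter, equal, or longer.
    static int compare(int months1, double seconds1, int months2, double seconds2) {
        return Double.compare(approximateSeconds(months1, seconds1),
                              approximateSeconds(months2, seconds2));
    }

    public static void main(String[] args) {
        double p365d = 365 * SECONDS_PER_DAY;
        double p366d = 366 * SECONDS_PER_DAY;
        System.out.println(compare(0, p365d, 12, 0) < 0); // true: P365D < P1Y
        System.out.println(compare(0, p366d, 12, 0) > 0); // true: P366D > P1Y
    }
}
```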

Added the two XPath-defined subtypes of duration: xs:dayTimeDuration and xs:yearMonthDuration. Implemented the functions to construct these from a number of months or seconds. The "+" and "-" operators can be used to add two durations of the same type, and the "*" and "div" operators to multiply or divide a duration by a number.

Added the subtypes of xs:integer (xs:long, xs:int, xs:short and the rest). The type promotion rules for comparison and arithmetic on numeric types have been brought into line with the specification, though there are probably still a few minor discrepancies (especially where fallback conversions from strings are involved).

Added the idiv operator for integer division. For example, 10 idiv 3 is 3. The div operator always returns a double result.

Added the subtypes of xs:string (token, language, Name, NCName, ID, IDREF, ENTITY, NMTOKEN); but not the list types IDREFS, ENTITIES, NMTOKENS. These have no useful functionality beyond the ability to validate the lexical rules for each type.

Implemented the distinct-nodes() function (at the same time fixing a bug in union, intersect and except when supplied with arguments that are not in document order).

Implemented the deep-equal() function. Because nodes are still untyped, it compares string values of text nodes rather than typed values. Not yet tested with an explicit collation.

Renamed the sublist() function as subsequence().

Implemented the sequence-node-equal() and sequence-deep-equal() functions. Not yet tested with an explicit collation.

Implemented the functions node-kind(), root(), context-item().

Added the EXSLT functions in package math: abs, acos, asin, atan, atan2, constant, cos, exp, log, power, random, sin, sqrt, tan. Thanks to Simon St. Laurent for these. Only partially tested.

The saxon:intersection and saxon:difference extension functions have been dropped; instead use either the XPath 2.0 operators (intersect, except) or the EXSLT functions.

Performance

Many internal iterators work with a one-item lookahead. This is wasteful if the iteration is not continued to completion, which happens for example with a numeric predicate such as expr[1], or with an existential comparison such as sequenceA = sequenceB, or when converting a sequence to a string or a boolean. This lookahead has been removed for some commonly used iterations, notably the FilterIterator, the MappingIterator, and the TinyTree SiblingIterator. A consequence is that the hasNext() method of SequenceIterator can now throw an XPathException.

Deferred evaluation of variables happened in the past when the expression was a SequenceExpression. It now happens only if the compile-time cardinality of the expression allows more than one item. This means that deferred evaluation will not be used for an expression of the form expr[1]. And when deferred evaluation is used, the iterator is not primed by calling hasNext(): this means that (for an iterator that doesn't do lookahead), the search for the first item is now deferred until the variable is first used, and doesn't have to be repeated unnecessarily. In addition, if the variable is referenced in a context where only the first item in the sequence is required (e.g. to get the value as a boolean or as a string), the value is now saved without evaluating the full sequence.

I have added an optimization for constructs of the form <xsl:if test="a | b">. Where a union expression is evaluated in a boolean context it is now treated as if the operator were "or". This potentially avoids the need to sort the two node-sets into document order.

There are some changes in the way global variables are handled. At compile time, a hash table is used in place of linear searching to search for duplicates: this should improve compilation performance for stylesheets with many global variables, especially when many of the variables are overridden by an importing stylesheet. At run-time, evaluation of global variables is now deferred until the first reference to the variable, which will improve execution performance when there are global variables that are never referenced. Note that this change will be visible if <xsl:message> is used to trace execution.

A filter expression of the form f[a and b] is now rewritten as f[a][b] when appropriate, to enable an early exit in the case where a is positional: for example item[position() = 1 and child::desc]. This is only done if a is positional and b is not.
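The benefit of the rewrite can be sketched in Java for the example item[position() = 1 and child::desc]. The class and method names are invented, and the code imitates the two evaluation strategies rather than reproducing Saxon's iterators. The restriction to a non-positional b matters because in f[a][b] any position() inside b would refer to positions in the already-filtered sequence, which would change the meaning.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical illustration of the f[a and b] -> f[a][b] rewrite; not Saxon code.
final class FilterRewriteSketch {

    // f[position() = 1 and b]: without the rewrite, every item is visited and tested.
    static <T> List<T> naive(List<T> items, Predicate<T> b) {
        List<T> result = new ArrayList<>();
        for (int position = 1; position <= items.size(); position++) {
            T item = items.get(position - 1);
            if (position == 1 && b.test(item)) {
                result.add(item);
            }
        }
        return result;
    }

    // f[position() = 1][b]: the positional filter exits after the first item,
    // and b is then tested on at most one item.
    static <T> List<T> rewritten(List<T> items, Predicate<T> b) {
        List<T> result = new ArrayList<>();
        if (!items.isEmpty() && b.test(items.get(0))) {
            result.add(items.get(0));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> items = java.util.Arrays.asList("desc:first", "second", "desc:third");
        Predicate<String> hasDesc = s -> s.startsWith("desc:");
        System.out.println(naive(items, hasDesc));     // [desc:first]
        System.out.println(rewritten(items, hasDesc)); // [desc:first]
    }
}
```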

A union (or intersection or difference) of two path expressions is now rewritten to do the combination as late as possible: for example ( /a/b/c | /a/b/d ) is rewritten as ( /a/b/(c|d) ). Note, this is a first small step in the identification of common subexpressions. The cases where two subexpressions are detected as being identical are fairly limited, for example there is no knowledge of which operators are commutative or associative.

Internal Changes

The organization of the net.sf.saxon.functions package has changed. Much of the fixed information associated with individual functions is now contained in a static table in the StandardFunction.java module, rather than being returned by methods associated with each function. Most of the optimization methods (simplify, getDependencies, and reduce) now have a generic default implementation in the Function.java class, which most of the individual functions now use. This has reduced the overhead associated with implementing each function, which is important as the number of functions in XPath 2.0 has grown so much. It also creates further opportunities for combining the implementation of several related functions in one module, with better ability to share common code.

Unique document numbers are now allocated in the NamePool rather than the DocumentPool. This is visible in the results of generate-id(), because it means document numbers are not reset at the beginning of each transformation. This change has been made so that functions that rely on unique document numbers (for example, comparison of nodes into document order, or the union operation) can be done safely in a free-standing XPath environment. Eventually this will also allow document() to be executed outside an XSLT context - but not yet.

Integration and Extensions

I have tested Saxon with the Resin XML parser (version 2.1.1), but found it very buggy.

I have tested Saxon with the Piccolo XML parser (version 1.03), and found it worked very well except for a few stress tests, particularly in the area of namespace handling. I have reported four bugs.

The tables for converting XPath data types to Java types (when calling extension functions) have been revamped. The design has changed so that where there are two methods that appear to match the function call, one of them will generally be chosen even if the choice is arbitrary: this is because in many cases where Java classes define polymorphic methods, the results will be the same whichever method is chosen.

Changes in version 7.1 (2002-04-30)

Integration

This version of Saxon has been modified and tested to work with JDOM beta 0.8 and with FOP 0.20.3. In both cases, code changes were needed to work with these versions, and I have not tested whether the code still works with earlier versions; the chances are that it doesn't.

Error clearance

In general, bugs that have been cleared in Saxon 6.5.1 or Saxon 6.5.2 have also been cleared in this release. For details of the clearance of specific bugs, see the bug tracker at Sourceforge. Remember that closed bugs are not listed unless you ask for them.

Multiple output documents

The href attribute of xsl:result-document is now interpreted as a relative URI, relative to the system ID of the principal result document. This works only where the system ID of the principal output is known, and uses the "file://" protocol. The result document is no longer created relative to the current working directory, for security reasons (it causes problems when executing an untrusted stylesheet in a servlet environment).

Note that when Saxon is invoked from the command line, the -o option should be used to specify the principal output destination. This will ensure that a suitable system ID is available. If the result document is sent to the standard output stream (even if this is redirected to a file), Saxon will not know the system identifier and will therefore be unable to create a secondary output destination using a relative URI. It is still possible, of course, to specify an absolute URI as the value of the href attribute - note that this must be a URL, not a filename, so it will typically start with file://.
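
The resolution rule described above can be sketched with the standard java.net.URI class. This is an illustration only, not Saxon's own code; the class name, method name, and file paths are made up for the example.

```java
import java.net.URI;

// Hypothetical sketch: resolve the href of xsl:result-document against
// the system ID of the principal result document.
final class ResultHrefSketch {
    static URI resolveHref(String principalSystemId, String href) {
        // A relative href is resolved against the base; an absolute href is returned as-is.
        return URI.create(principalSystemId).resolve(href);
    }

    public static void main(String[] args) {
        // Relative href: lands next to the principal output.
        System.out.println(resolveHref("file:///out/main.html", "chapters/ch1.html"));
        // Absolute href: the principal output location is irrelevant.
        System.out.println(resolveHref("file:///out/main.html", "file:///tmp/other.html"));
    }
}
```

When the principal output has no system ID (for example, when it goes to standard output), there is no base URI to pass as the first argument, which is why only an absolute href can work in that case.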

It is now possible to specify an OutputURIResolver to be used to resolve the URI specified in the href attribute of the xsl:result-document element. This will be used in place of the standard output URI resolver. The OutputURIResolver is called when writing of the output document starts, at which point it must return a JAXP Result object to act as the output destination. It is called again when writing of an output document is complete. You can nominate an OutputURIResolver by calling ((Controller)transformer).setOutputURIResolver(new UserOutputResolver()), or by calling factory.setAttribute("http://saxon.sf.net/feature/outputURIResolver", new UserOutputResolver()).

If the -t option is used, a message is written to the standard error output identifying the files written using xsl:result-document.

It is now an error to use xsl:result-document when the current output destination is a temporary tree.

XSLT changes

The meaning of the ALLOW_EXTENSION_FUNCTIONS attribute in the TransformerFactory has been extended so that setting the value to false also disables extension elements and the creation of multiple output files. This is because all these operations carry similar risks when a servlet is allowed to execute untrusted stylesheets.

Added support for the separator attribute of <xsl:copy-of>.

The current() function may now be used in a pattern (specifically, within a predicate). Its value is the node being tested against the pattern. For example, match="*[*[name()=name(current())]]" matches any element that contains another element with the same name.

A global variable or parameter may now be used in the match pattern of xsl:template, provided that it does not cause a circularity (that is, it must be possible to evaluate the variable without calling xsl:apply-templates).

A global variable or parameter may now be used in the match pattern or the use expression of xsl:key, provided that it does not cause a circularity (that is, it must be possible to evaluate the variable without using the key() function against the key being defined).

The key() function may now be used in the use or match attributes of xsl:key, provided the key definitions are not circular. (For example, key k1 can be defined in terms of key k2, provided that k2 is not defined in terms of k1.)

The group-ending-with attribute of xsl:for-each-group is implemented. It is especially useful where the last node in each group carries some kind of marker, for example continued="no".
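
The grouping rule can be pictured with a short Java sketch. It is a simplified model, not the Saxon implementation: the class and method names are hypothetical, items are plain strings, and the marker test is an ordinary predicate rather than an XSLT pattern.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical model of group-ending-with: a group is closed by each item
// that satisfies the marker test.
final class GroupEndingWithSketch {
    static <T> List<List<T>> groupEndingWith(List<T> items, Predicate<T> endsGroup) {
        List<List<T>> groups = new ArrayList<>();
        List<T> current = new ArrayList<>();
        for (T item : items) {
            current.add(item);
            if (endsGroup.test(item)) {
                groups.add(current);           // marker found: close this group
                current = new ArrayList<>();
            }
        }
        if (!current.isEmpty()) {
            groups.add(current);               // trailing items without a marker
        }
        return groups;
    }

    public static void main(String[] args) {
        // A trailing "." plays the role of continued="no".
        List<String> lines = Arrays.asList("a", "b.", "c", "d", "e.");
        System.out.println(groupEndingWith(lines, s -> s.endsWith(".")));
    }
}
```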

Added attribute default="yes"|"no" to saxon:collation, to specify whether this collation should be used as the default collation. If more than one collation is specified as the default, the last one wins. If no default collation is specified, Unicode codepoint collation is used. The default collation is used by the compare() function if no third argument is supplied, by xsl:sort if no collation is specified (for data type text or string), and also by the comparison operators =, !=, <, >, etc.

The collation name is now a URI, not a QName.

Sorting and comparison according to Unicode codepoints can be achieved by setting up a collator as <saxon:collation name="unicode" class="net.sf.saxon.sort.CodepointCollator"/>.
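
The practical difference between codepoint ordering and a locale-sensitive collation can be seen with the JDK alone. The sketch below does not use net.sf.saxon.sort.CodepointCollator; the comparator is a hand-written stand-in, and the class name and the choice of the English locale are assumptions made for the example.

```java
import java.text.Collator;
import java.util.Comparator;
import java.util.Locale;

// Hypothetical comparison of codepoint ordering with a JDK locale collator.
final class CollationSketch {
    // Compares two strings codepoint by codepoint (stand-in for a codepoint collation).
    static final Comparator<String> CODEPOINT = (a, b) -> {
        int i = 0;
        int j = 0;
        while (i < a.length() && j < b.length()) {
            int ca = a.codePointAt(i);
            int cb = b.codePointAt(j);
            if (ca != cb) {
                return Integer.compare(ca, cb);
            }
            i += Character.charCount(ca);
            j += Character.charCount(cb);
        }
        // The string that still has characters left sorts last.
        return Integer.compare(a.length() - i, b.length() - j);
    };

    // Compares two strings with the JDK collator for English.
    static int localeCompare(String a, String b) {
        return Collator.getInstance(Locale.ENGLISH).compare(a, b);
    }

    public static void main(String[] args) {
        // In codepoint order "B" (U+0042) precedes "a" (U+0061);
        // the English collator puts "a" before "B".
        System.out.println(CODEPOINT.compare("a", "B") > 0);
        System.out.println(localeCompare("a", "B") < 0);
    }
}
```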

XPath changes

The implementation of the "and" and "or" operators has reverted to two-valued logic, since three-valued logic didn't make it into the published XPath 2.0 working draft. (Actually, it seems 3-valued logic wasn't working in Saxon 7.0 anyway).

Changed the "==" and "!==" operators to "is" and "isnot".

Changed string literals to allow the delimiting quote marks to be doubled. For example, <xsl:value-of select="'[He isn''t]'"/> displays the string [He isn't].

Changed the some and every expressions to allow multiple range variables, for example some $i in //I, $j in //J satisfies $i = $j

Implemented the singleton value-comparison operators (eq, ne, gt, lt, ge, le). These return an error if applied to a sequence containing more than one item, and return the empty sequence if either operand is an empty sequence; when applied to singletons, they return the same result as the XPath 1.0 operators (=, !=, etc).
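
The contrast between the new value comparisons and the existing general comparisons can be modelled in a few lines of Java. This is a sketch under stated assumptions, not Saxon's evaluator: sequences are lists of strings, the empty sequence is modelled as Optional.empty(), and the order in which the two error and empty checks are made is a guess.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Optional;

// Hypothetical model of "eq" (value comparison) and "=" (general comparison).
final class ValueComparisonSketch {
    // eq: error for more than one item, empty result if either operand is empty.
    static Optional<Boolean> eq(List<String> a, List<String> b) {
        if (a.size() > 1 || b.size() > 1) {
            throw new IllegalArgumentException("eq requires operands of at most one item");
        }
        if (a.isEmpty() || b.isEmpty()) {
            return Optional.empty();
        }
        return Optional.of(a.get(0).equals(b.get(0)));
    }

    // "=": true if any pair of items, one from each operand, is equal.
    static boolean generalEquals(List<String> a, List<String> b) {
        for (String x : a) {
            for (String y : b) {
                if (x.equals(y)) {
                    return true;
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> one = Collections.singletonList("x");
        List<String> many = Arrays.asList("x", "y");
        System.out.println(eq(one, one));
        System.out.println(eq(one, Collections.<String>emptyList()));
        System.out.println(generalEquals(many, one));
    }
}
```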

Less-than and greater-than comparisons between nodes and/or strings now do a lexicographic comparison using the default collating sequence; at XPath 1.0 they did a numeric comparison. A warning is output in this situation (and one or two other situations, but not all) to advise of the backwards incompatibility.

The rules for deciding when path expressions need to be sorted have been revised. As a result many cases now require no sort where previously a sort was done. Examples of such expressions include a/b/c, .//a, $x[1]/a, //@a. In addition, most path expressions that return results in reverse document order are now sorted by a simple reversal, which is much faster than a full sort.

There's a temporary bug in that path expressions returning namespace nodes don't always return them in document order. I'm awaiting resolution of the XPath 2.0 data model rules before fixing this.

Suppressed lazy evaluation of assignable variables. (This was designed to prevent a stack overflow; it didn't succeed, but it seems a good idea anyway.)

Added the ability for a Source object to be supplied as the value of a stylesheet parameter or as the value returned by an extension function.

Added dateTime and date data types. Initially the only operations supported are the currentDateTime function, the dateTime and date constructors, and conversion between strings, dates, and dateTimes in both directions. Conversion to string uses the timezone of the current locale.

Implemented comparisons (equals, less-than, etc) between dates and dateTimes. Also implemented sorting. The data-type of xsl:sort may take the two values "text" or "number" (which are treated as synonyms of xs:string and xs:double) or any XML Schema built-in data type for which sorting is supported. The values in the sequence to be sorted are converted to this data type (using the same rules as for cast as) and the rules for this data type determine the sort order.

Note that (as required by the XML Schema specification) dateTime values are normalized to UTC. The original timezone specified when the dateTime was constructed is not retained. If no timezone is present, this fact is remembered. Such a dateTime is compared with other dateTimes as if it were a UTC dateTime.
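
The normalization rule can be sketched with the java.time classes of a modern JDK (which postdate this release, so this is certainly not how Saxon does it). The class and field names are hypothetical; the sketch only shows the three points made above: conversion to UTC, remembering whether a timezone was present, and comparing a timezone-less value as if it were UTC.

```java
import java.time.LocalDateTime;
import java.time.OffsetDateTime;
import java.time.ZoneOffset;
import java.time.format.DateTimeParseException;

// Hypothetical model of a dateTime value normalized to UTC.
final class DateTimeNormalizationSketch {
    final OffsetDateTime utc;      // the value, always held in UTC
    final boolean hadTimezone;     // false if the lexical form had no timezone

    private DateTimeNormalizationSketch(OffsetDateTime utc, boolean hadTimezone) {
        this.utc = utc;
        this.hadTimezone = hadTimezone;
    }

    static DateTimeNormalizationSketch parse(String lexical) {
        try {
            // Timezone present: shift to UTC; the original offset is not kept.
            OffsetDateTime withZone = OffsetDateTime.parse(lexical);
            return new DateTimeNormalizationSketch(
                    withZone.withOffsetSameInstant(ZoneOffset.UTC), true);
        } catch (DateTimeParseException e) {
            // No timezone: remember that fact, and treat the value as UTC.
            LocalDateTime local = LocalDateTime.parse(lexical);
            return new DateTimeNormalizationSketch(local.atOffset(ZoneOffset.UTC), false);
        }
    }

    int compareTo(DateTimeNormalizationSketch other) {
        return utc.compareTo(other.utc);
    }

    public static void main(String[] args) {
        DateTimeNormalizationSketch a = parse("2002-04-30T12:00:00+02:00");
        DateTimeNormalizationSketch b = parse("2002-04-30T11:00:00");
        System.out.println(a.utc + " hadTimezone=" + a.hadTimezone);
        System.out.println(b.utc + " hadTimezone=" + b.hadTimezone);
        System.out.println(a.compareTo(b) < 0);
    }
}
```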

Implemented the instance of operator (including the instance of only variant): for example if ($x instance of xs:integer *) then x else y. The types that are currently supported are the 19 primitive schema types (the namespace may be either of the two namespaces permitted in XML Schema Part 2), the derived type xs:integer, the node types document, element, attribute, text, comment, processing-instruction, or namespace, and the abstract types node and item. (There is no syntax currently for the general numeric type or for the general atomic type). The type name may be followed by one of the qualifiers "*", "+", or "?" to indicate the number of occurrences; if there is no qualifier, there must be exactly one occurrence. The more sophisticated forms of type-checking, using schema-defined complex types, are not yet supported.
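
The effect of the occurrence qualifiers can be modelled as follows. This is an illustrative sketch, not Saxon's type checker: Java classes stand in for the XPath item types, a List stands in for a sequence, and the class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical model of "SEQUENCE instance of TYPE QUALIFIER".
final class InstanceOfSketch {
    // occurrence is '*', '+', '?', or ' ' (no qualifier: exactly one item).
    static boolean instanceOf(List<?> sequence, Class<?> itemType, char occurrence) {
        int n = sequence.size();
        boolean countOk;
        switch (occurrence) {
            case '*': countOk = true;   break;  // zero or more
            case '+': countOk = n >= 1; break;  // one or more
            case '?': countOk = n <= 1; break;  // zero or one
            default:  countOk = n == 1; break;  // exactly one
        }
        if (!countOk) {
            return false;
        }
        for (Object item : sequence) {
            if (!itemType.isInstance(item)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<Integer> three = Arrays.asList(1, 2, 3);
        System.out.println(instanceOf(three, Integer.class, '*'));
        System.out.println(instanceOf(three, Integer.class, ' '));
        System.out.println(instanceOf(Collections.emptyList(), Integer.class, '?'));
    }
}
```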

Implemented the cast as data-type expression, for example cast as xs:boolean($x). The conversion rules are the same as those which apply implicitly when a value is supplied in a context where a different type is expected.

Implemented the treat as data-type expression. This doesn't actually have much use in an XSLT context, where type conversion is performed implicitly when required, and the semantics of the expression are probably not correctly implemented at this stage: the specification is still evolving.

New XPath API

A new API has been introduced for executing XPath expressions. This is simpler and safer than the API provided in previous releases, which was essentially improvised from implementation classes rather than being designed top-down as an interface suitable for application use. The API is loosely modelled on the proposed DOM Level 3 API for XPath.

The new API uses the class net.sf.saxon.xpath.XPathEvaluator. This class provides a few simple configuration interfaces to set the source document, the static context, and the context node, plus a number of methods for evaluating XPath expressions. The static context can be omitted if the expression does not use namespaces, external variables, or extension functions. If the expression uses namespaces, an instance of StandaloneContext can be supplied, allowing the required namespaces to be declared either explicitly, or by reference to the in-scope namespaces of some Node.

There are two methods for direct evaluation, evaluate() which returns a List containing the result of the expression (which in general is a sequence), and evaluateSingle() which returns the first item in the result (this is appropriate where it is known that the result will be single-valued). The results are returned as NodeInfo objects in the case of nodes, or as objects of the most appropriate Java class in the case of atomic values: for example, Boolean, Double, or String in the case of the traditional XPath 1.0 data types.

It is also possible to prepare an XPath expression for subsequent execution, using the createExpression() method on the XPathEvaluator class. This is worthwhile where the same expression is to be executed repeatedly. The compiled expression is represented by an instance of the class net.sf.saxon.xpath.XPathExpression, and it can be executed repeatedly, with different context nodes. However, the compiled expression is bound to one particular source document (this is to ensure that the same NamePool is used).
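
The compile-once, evaluate-many pattern can be illustrated with the JDK's own javax.xml.xpath package. Note carefully that this is a stand-in: it is not the net.sf.saxon.xpath API described here (whose method signatures are not given in these notes), and javax.xml.xpath.XPathExpression merely happens to share a name with the Saxon class. The sample document and the class name CompiledXPathSketch are invented for the example.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Stand-in sketch using the JDK XPath API, not Saxon's XPathEvaluator.
final class CompiledXPathSketch {
    // Compiles "string(title)" once and evaluates it against every book element.
    static String[] titles(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            XPathExpression title = xpath.compile("string(title)");
            NodeList books = (NodeList) xpath.evaluate("/books/book", doc, XPathConstants.NODESET);
            String[] out = new String[books.getLength()];
            for (int i = 0; i < books.getLength(); i++) {
                // Same compiled expression, different context node each time.
                out[i] = title.evaluate(books.item(i));
            }
            return out;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String xml = "<books><book><title>A</title></book><book><title>B</title></book></books>";
        for (String t : titles(xml)) {
            System.out.println(t);
        }
    }
}
```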

The design principle of this API is to minimize the number of Saxon classes that need to be used. Apart from the NodeInfo interface, which is needed when manipulating Saxon trees, only the four classes XPathEvaluator, XPathExpression, StandaloneContext, and XPathException are needed. For convenience, XPathException and StandaloneContext have been moved to the net.sf.saxon.xpath package.

If you want to use extension functions or variables you will need to create your own implementation of StaticContext. Although this interface has been greatly simplified, this is still not to be attempted lightly.

The old APIs for executing expressions still exist for the time being, but they are likely to be less stable.

Internal changes

Changed ContentEmitter to check in startElement() that qname and local-name are both supplied; this checks against parser configuration errors. This change could (should?) be retrofitted to the 6.5 branch. The change also uses a stack of namecodes so that endElement() doesn't need to look up the names in the name pool. In implementing this change, I discovered that Saxon depends on the XML parser passing the QName argument to the startElement() call, something which according to the SAX2 specification is optional. However, all known parsers supply this argument, and the code changes to cope with its absence would damage performance, so I have simply documented this as a dependency on the parser.

Implemented infrastructure for data type support:

A new class net.sf.saxon.value.Type centralizes the definition of node types and atomic types
The tokenizer now does single-token lookahead, needed to support double-keywords such as "cast as"
Added parsing support for "EXPR instance of [only] DATATYPE" and "cast as DATATYPE ( EXPR )"
Added isA method to SimpleValue so each value knows what types (primitive or derived) it belongs to

I have changed the implementation of temporary trees (result tree fragments). The FragmentValue class has disappeared. This class delayed the construction of an actual tree until the tree was actually used as a node-set: the effect was to optimize simple uses of temporary trees but at considerable cost to the more general usage which is now permitted in XSLT 2.0. Also, the introduction of tinytrees has reduced the value of this optimization. Therefore, a temporary tree is now constructed immediately as a real tree.

A side-effect of this change is that when disable-output-escaping is used while writing nodes to a tree, the instructions to switch escaping on and off are recorded in the tree in the form of the processing instructions defined by JAXP 1.1. Previously, these instructions were recorded in a form that kept the information through an xsl:copy-of instruction, but lost the information if the tree was processed in any other way. Note that the behavior of "sticky d-o-e" (that is, the effect of disabling output escaping when writing to a temporary tree) is currently an open issue in XSLT 2.0.

The indexes associated with keys are no longer referenced from each document instance, they are handled externally. This makes it easier to share the same index implementation across all the different document implementations. The indexes are now held by the KeyManager. It uses a WeakHashMap to ensure that when a document is removed from memory by the garbage collector, its indexes are removed too.
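
The idea of holding indexes outside the document, keyed weakly by the document, can be sketched as below. This is not the KeyManager source: the class name, the method names, and the use of strings for key values and nodes are all assumptions made to keep the example small.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.WeakHashMap;

// Hypothetical sketch of an external index store keyed weakly by document.
final class KeyIndexSketch {
    // document -> (key value -> nodes with that key value)
    private final Map<Object, Map<String, List<String>>> indexes = new WeakHashMap<>();

    void add(Object document, String keyValue, String node) {
        indexes.computeIfAbsent(document, d -> new HashMap<>())
               .computeIfAbsent(keyValue, k -> new ArrayList<>())
               .add(node);
    }

    List<String> lookup(Object document, String keyValue) {
        Map<String, List<String>> index = indexes.get(document);
        if (index == null) {
            return Collections.emptyList();
        }
        return index.getOrDefault(keyValue, Collections.emptyList());
    }

    public static void main(String[] args) {
        KeyIndexSketch manager = new KeyIndexSketch();
        Object doc = new Object();   // stands in for a document instance
        manager.add(doc, "k1", "node-1");
        manager.add(doc, "k1", "node-2");
        System.out.println(manager.lookup(doc, "k1"));
        System.out.println(manager.lookup(new Object(), "k1"));
    }
}
```

A WeakHashMap entry can only be reclaimed if nothing reachable from its value holds a strong reference to its key. In this sketch the nodes are plain strings, so that holds trivially; an index whose entries point back into the document would need more care.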

The mechanism for keeping stylesheet signatures in the namepool has been removed. It caused a creeping "memory leak" in continuously running services, and is not really needed. It was invented to allow namepools to be copied, but this facility has never been properly documented or tested. Instead, there is now a simple check that the source document and stylesheet are using the same namepool. (This change, or a simplified version of it, has also been made to 6.5.2).

The StaticContext interface has been greatly simplified, reducing duplication and making it easier to create a new implementation of this interface. This has been achieved partly by doing some work in the XPath ExpressionParser that was previously done in the StaticContext, and partly by changing those functions such as format-number() and sort() that only work in an XSLT context to check that the context is indeed XSLT before accessing the context information.

SQL extension

At the suggestion of Claudio Thomas [claudio.thomas@web.de], I have extended the sql:query instruction to allow the attribute disable-output-escaping="yes|no". This is useful where the database content being retrieved contains XML or HTML markup that is to be preserved in the output. Use this with care; it disables escaping for all the rows and columns retrieved, some of which may contain special characters such as "<" and "&" that do need to be escaped.

This change has not been tested.

Extension functions

Added extension functions: saxon:parse() and saxon:serialize(). These allow conversion of a string containing well-formed XML to a tree structure, and vice-versa.

Added extension functions: saxon:string-to-unicode() and saxon:unicode-to-string(). These allow conversion between a string and a sequence of integers representing the Unicode values of the characters in the string.
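
In Java terms the two conversions correspond roughly to the following sketch. It is not the source of the extension functions; the class and method names are hypothetical, and treating a surrogate pair as a single value is an assumption about how characters outside the Basic Multilingual Plane are handled.

```java
import java.util.Arrays;

// Hypothetical Java equivalents of string-to-unicode and unicode-to-string.
final class UnicodeConversionSketch {
    // String -> sequence of Unicode codepoints.
    static int[] stringToUnicode(String s) {
        return s.codePoints().toArray();
    }

    // Sequence of Unicode codepoints -> string.
    static String unicodeToString(int[] codepoints) {
        return new String(codepoints, 0, codepoints.length);
    }

    public static void main(String[] args) {
        int[] values = stringToUnicode("Saxon");
        System.out.println(Arrays.toString(values));
        System.out.println(unicodeToString(values));
    }
}
```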

Added extension functions saxon:pause-tracing() and saxon:resume-tracing().

The return value from an extension function may now be an implementation of java.util.List, representing a sequence. The members of the List must all implement net.sf.saxon.om.Item.

An argument to an extension function may now be the class net.sf.saxon.om.NodeInfo, or a subclass. If the supplied value is a sequence, the first node in the sequence is passed to the function; it is an error if there is no node in the supplied sequence, or if the node is of the wrong type.

The rules for calling extension functions with a sequence-valued argument have been clarified, and some new options are permitted, e.g. declaring the argument as java.util.List. The possibilities have not been extensively tested.

Implemented memo functions (thanks to Robert Brotherus for the suggestion). If you specify the attribute saxon:memo-function="yes" on xsl:function, Saxon will keep a cache that maps the supplied argument values to the result of the function, and if the function is called twice with the same arguments, the original result will be returned without re-evaluating the function. Don't use this option on a function that depends on the context, or on a function that creates a new temporary tree and is required to create a new instance each time. Also note that there are cases where it may be faster to re-evaluate the function than to do the lookup; this is especially true if the argument is a large node-set.
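
The caching behaviour described here is ordinary memoization. The following Java sketch shows the general technique only; it is not how Saxon implements saxon:memo-function, and the class name, the single-argument restriction, and the evaluation counter are all simplifications introduced for the example.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical single-argument memo function.
final class MemoFunctionSketch<A, R> {
    private final Function<A, R> body;
    private final Map<A, R> cache = new HashMap<>();
    int evaluations = 0;   // how many times the body has actually run

    MemoFunctionSketch(Function<A, R> body) {
        this.body = body;
    }

    R call(A argument) {
        if (cache.containsKey(argument)) {
            return cache.get(argument);   // same argument: reuse the original result
        }
        evaluations++;
        R result = body.apply(argument);
        cache.put(argument, result);
        return result;
    }

    public static void main(String[] args) {
        MemoFunctionSketch<Integer, Integer> square = new MemoFunctionSketch<>(n -> n * n);
        System.out.println(square.call(12));
        System.out.println(square.call(12));
        System.out.println("evaluations=" + square.evaluations);
    }
}
```

The cache returns the very same result object on a repeated call, which is why the technique is unsuitable for a function that must create a fresh temporary tree each time. The cost of hashing and comparing the argument is also the reason a large node-set argument can make the lookup slower than re-evaluation.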

Changes in version 7.0 (2001-12-20)

This version introduces initial support of features defined in working drafts of XSLT 2.0 and XPath 2.0.

Version 7.0 should be regarded as an experimental alpha release. For production use, please continue to use Saxon 6.5.

The Saxon package name has changed from com.icl.saxon to net.sf.saxon. Any applications that use Saxon java classes directly (rather than relying on the JAXP interface) will need to be modified. Note that this also affects the settings of the system properties javax.xml.parsers.SAXParserFactory and javax.xml.transform.TransformerFactory.

The entry point from the command line has changed from com.icl.saxon.StyleSheet to net.sf.saxon.Transform.

The namespace URI for saxon extensions has changed from http://icl.com/saxon to http://saxon.sf.net/. Note that many extensions have been withdrawn, as they are superseded by facilities in XPath 2.0 and/or XSLT 2.0.

To allow coexistence, the name of the JAR file for this release has changed to saxon7.jar. The SQL extensions are now in a separate JAR file, saxon7-sql.jar. A transformation can now be executed directly from the JAR file using the command java -jar saxon7.jar in place of java net.sf.saxon.Transform.

Saxon now requires JDK 1.2 or later to run. In consequence, Saxon will no longer work with the Microsoft Java VM, and the Instant Saxon version of the product is therefore no longer available.

Because Saxon no longer runs with the Microsoft Java VM, it can now be run as an applet within Internet Explorer only if the Sun Java plug-in is installed. You can get this from http://java.sun.com/getjava. This may require some configuration changes because of the differences in security policy.

The following sections summarize the main new features. These assume familiarity with the XPath 2.0 and XSLT 2.0 specifications; however, summaries of the new syntax for expressions and XSLT elements are included in this package.

XPath 2.0 Data Model and Language

Sequences of nodes or simple values are supported, including the sequence constructor expression ($a, $b, $c). Path expressions now return a sequence of nodes containing no duplicates, in document order.
The new if expressions, for expressions, some and every expressions are supported.
New operators are implemented: except, intersect and union for combining sequences (implemented only for sequences of nodes at this release); also ==, !== for comparing node identity, and << and >> for comparing relative position of nodes in document order. These return () when an operand is (), and fail when an operand contains more than one node.
Any expression may now appear on the right-hand side of the "/" operator (a run-time error is reported if it doesn't evaluate to a sequence of nodes). Examples: a/(b|c)/d, or document('x')/key('a','b')
Added range expressions. For example 1 to 10 evaluates to the sequence ( 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 )
The new numeric data types integer, decimal, and float are implemented. XPath literals, and string-to-number conversion, now allow "e" in a numeric literal. A literal containing an "e" is interpreted as a double, any other literal containing a "." is interpreted as a decimal, and any other numeric literal as an integer. Integers are limited to 64-bit quantities, but decimals are of unlimited size. Arithmetic is now done using the XPath 2.0 type promotion rules, except that "div" and "mod" always convert both arguments to double. For the time being, I have retained the XPath 1.0 rule that the string representation of a number never ends in ".0"; I found that changing it to use the canonical representation defined in XML Schema (as required by XPath 2.0) meant rewriting too many of my tests. However, negative zero now converts to the string "-0" (it was "0" at XPath 1.0).
The code for equals, not-equals, etc, has been generalized to work on any sequence in the same way as for a node-set. This retains the "existential" semantics of XPath 1.0. It still does a comparison of the string-value of each item, not the typed value.
Filter expressions now work on any sequence (including a singleton sequence, so 2[1] is allowed)
3-valued logic has been implemented for AND and OR. (This doesn't reflect the XPath 2.0 working draft as published: a late change was made).
The syntax *:localname (like prefix:*) is allowed in path expressions, and also in patterns and in xsl:strip-space and xsl:preserve-space. It matches any node with the given local name, regardless of namespace.
The Context object has been changed so that there is a context item rather than a context node. The context item is either a simple value or a node. The expression "." returns the value of the context item. If the context item is not a node, then relative path expressions (including self::node()) return an empty node-set. Absolute path expressions (starting with "/"), and key() and id(), now apply to the document containing the current node, as distinct from the context node. Note: the implementation of context document doesn't accurately reflect the concept of the context document in the XPath 2.0 working draft as published.
The rules for string-to-boolean conversion have changed so that "", "0" and "false" now return false, all other strings return true.
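
The rule for numeric literals given in the list above (an "e" means double, otherwise a "." means decimal, otherwise integer) can be sketched as follows. This is a simplified model, not the Saxon tokenizer: the class name is hypothetical, Java's Double, BigDecimal, and Long stand in for the XPath types, and accepting an upper-case "E" as well is an assumption.

```java
import java.math.BigDecimal;

// Hypothetical classifier for XPath numeric literals.
final class NumericLiteralSketch {
    static Object parse(String literal) {
        if (literal.contains("e") || literal.contains("E")) {
            return Double.valueOf(literal);      // exponent present: double
        }
        if (literal.contains(".")) {
            return new BigDecimal(literal);      // decimal point only: unlimited-size decimal
        }
        return Long.valueOf(literal);            // otherwise: 64-bit integer
    }

    public static void main(String[] args) {
        for (String literal : new String[] {"1e3", "1.50", "42"}) {
            Object value = parse(literal);
            System.out.println(literal + " -> " + value.getClass().getSimpleName());
        }
    }
}
```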

XSLT 2.0 features

The xsl:value-of element has a new separator attribute, so it can be used to output a sequence.
The xsl:for-each element supports arbitrary sequences.
The extension elements saxon:group and saxon:item are withdrawn.
The new xsl:for-each-group instruction, and the associated current-group() function, are implemented.
The xsl:function and xsl:result elements are implemented; these replace saxon:function and exslt:function. Note that the XSLT 2.0 specification is more restrictive as to what can appear in a function body: it has to be zero or more xsl:param elements, followed by zero or more xsl:variable elements, followed by an xsl:result element. However, this is not a serious restriction in practice, because most computations can now be carried out within a single XPath expression.
The new xsl:namespace instruction is implemented (it writes a namespace node to the result tree)
The xsl:document element (and its synonym saxon:output) are replaced by xsl:result-document. This no longer includes the serialization attributes directly, instead it refers by name to an xsl:output declaration, or can use the unnamed xsl:output declaration by default.
The xsl:output element now supports method="xhtml", replacing method="saxon:xhtml". The precise details of the output may not be fully conformant with the specification.
The xsl:destination element is provided; however, since the href attribute is currently ignored, it is not very useful at this stage.
The saxon:handler element is no longer supported.
The xsl:script element is no longer supported - however, the synonym saxon:script remains available
A collation attribute has been added to xsl:sort, and the implementation of sorting now uses JDK 1.2 collators. The collation attribute must match the name attribute of a saxon:collation element. If none is specified, the lang attribute is now used to select a collator, or if the lang attribute is omitted, a collator is obtained for the default locale.
Named sort keys are available, via the xsl:sort-key element. A named sort key may be used to perform a sort from within an XPath expression, using the new XSLT-defined sort() function.

Function Library

I have made the following changes to the function library:

The saxon:distinct() extension function now works on any sequence.
The count() and sum() functions now work on any sequence, and new functions avg(), min(), and max() are provided.
Added ends-with()
Added upper-case() and lower-case(). These use the rules defined by the Java default locale
Added properties "product-name" and "product-version" to system-property()
Removed the saxon:range() extension function (it can now be done using the syntax "a to b")
Changed saxon:tokenize() to return a sequence of strings instead of a node-set
Changed key() so that the second argument can be any sequence; each member of the sequence is converted to a string and treated as a potential key value
Changed document() so that the first argument can be any sequence; each member of the sequence can be a URI of a document to be loaded.
Removed the saxon:node-set() extension function, which is now obsolete.
Removed the saxon:if() extension function, which is superseded by XPath 2.0 conditional expressions.
The saxon:closure() function is temporarily withdrawn, because it relies on non-standard use of the current() function.
The node-set() function in the EXSLT common module is now a no-op; the object-type() function returns one of "sequence", "boolean", "number", "string", or "external".
Changed highest() and lowest() in the EXSLT math module to work on arbitrary sequences.
Added exists() and empty(), insert() and remove(), index-of() and sublist().
Added not3() (three-valued not() function)
Added string-pad() function
Removed saxon:exists() and saxon:for-all(): these are superseded by the some and every constructs in XPath 2.0
Added the compare() function: the third argument (collation) is initially mandatory, and must be a QName matching a saxon:collation element
Added base-uri() function replacing the undocumented saxon:base-uri() extension function
Added constructor functions as described in the XPath Functions and Operators specification: some of them don't do much yet, but float() is the only way of creating a single-precision floating point number.

Significant Omissions

In general, features of XSLT 2.0 and XPath 2.0 not listed above have not been implemented. In particular, these include:

Backwards compatibility mode in XSLT
The type attribute of xsl:variable, etc.
Uniform handling of the empty sequence by functions and operators in XPath expressions
XPath constructs related to types, for example instance of and cast as.

Internal changes

As might be expected, the Saxon code has undergone major change internally, which will affect any application making significant use of internal interfaces. Here are some of the highlights:

The NodeEnumeration class is replaced with SequenceIterator. This is modelled on the JDK 1.2 Iterator interface, and can return any sequence of Items. An Item is either a SimpleValue or a NodeInfo.
Added the getAnother() method to the SequenceIterator interface. This means all SequenceIterators can clone themselves to produce another SequenceIterator of the same nodes. (But it's not a pure Java clone, because the new iterator is positioned at the start of the sequence.) The implementation of last() has changed: if the SequenceIterator doesn't know how many nodes there are in the sequence, the iterator is cloned and the items are counted, without being saved in memory as previously. This should only happen once for any given iterator, so in general calling last() causes a node-set to be scanned twice.
The Step class has been removed. A step that is a simple unfiltered axis expression is now represented by an AxisExpression; a step with filters is represented by a FilterExpression, and any other kind of expression may also now be used in a PathExpression.
An important new internal class is the MappingIterator. This maps one sequence to another sequence by invoking a MappingFunction on each member of the first sequence. This capability is now used to implement both path expressions and for expressions. It is also used in various other contexts, e.g. in the implementation of the document(), key() and id() functions.
The asString method on Value is renamed getStringValue; this allows both SimpleValues and Nodes to implement the new Item interface, which represents a member of a sequence.
Internal changes in support of datatypes: I removed evaluateAsDouble(), etc, and replaced them with a generic method evaluate(context, requiredType), which always returns a Value. A new package, net.sf.saxon.value, now contains all the data-type related classes.
The functions key(), id(), and document() are now fully pipelined, that is, they deliver an iterator over the result nodes.
The classes that handle sorting have been totally rewritten, partly to handle general sequences, and partly to use collations.
The Context class has changed. There are now two separate classes, XSLTContext and XPathContext. There is only one XSLTContext object used during a transformation (the information could have been held in the Controller itself). Stacking of values in the XSLT context uses the Java program execution stack, in the sense that any routine that sets a new value has to remember the old value and reset it on completion. The XPath context, by contrast, still creates a new instance every time a new value is stacked. This essentially just wraps the SequenceIterator that represents the context node list. The SequenceIterator itself is responsible for returning current item and current position.
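
The MappingIterator idea mentioned in the list above can be sketched on top of java.util.Iterator. This is an illustration of the technique, not the Saxon class: the real SequenceIterator and MappingFunction interfaces are not reproduced here, and the assumption that the mapping function returns a sequence (rather than a single item) for each input item is mine.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.function.Function;

// Hypothetical mapping iterator: applies a function to each input item and
// delivers the concatenation of the resulting sequences, lazily.
final class MappingIteratorSketch<A, B> implements Iterator<B> {
    private final Iterator<A> base;
    private final Function<A, Iterator<B>> mapping;
    private Iterator<B> current = Collections.emptyIterator();

    MappingIteratorSketch(Iterator<A> base, Function<A, Iterator<B>> mapping) {
        this.base = base;
        this.mapping = mapping;
    }

    @Override
    public boolean hasNext() {
        // Advance through the input until a mapped sequence yields an item.
        while (!current.hasNext() && base.hasNext()) {
            current = mapping.apply(base.next());
        }
        return current.hasNext();
    }

    @Override
    public B next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        return current.next();
    }

    public static void main(String[] args) {
        // Rough analogue of "for $n in (1, 2, 3) return ($n, $n * 10)".
        Iterator<Integer> result = new MappingIteratorSketch<Integer, Integer>(
                Arrays.asList(1, 2, 3).iterator(),
                n -> Arrays.asList(n, n * 10).iterator());
        while (result.hasNext()) {
            System.out.println(result.next());
        }
    }
}
```

Because nothing is computed until next() is called, a consumer that stops early never pays for the rest of the sequence, which fits the pipelined evaluation of key(), id(), and document() mentioned in the same list.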

I have removed documentation of the saxon:trace extension attribute; it seems this hasn't been working for some time.

API changes

The Context class no longer implements the XSLT 1.1 WD interface org.w3c.xsl.XSLTContext.
Dropped the classes in com.icl.saxon.handlers (ElementHandler etc.). It is still possible for a Java application to register a NodeHandler to receive events; this must now be written as an implementation of the net.sf.saxon.NodeHandler interface. See the ShowBooks.java sample application for an example.
It is no longer possible to specify a user-defined collation using the data-type or lang attribute of xsl:sort; instead, it must be specified using the collation attribute, with a saxon:collation element that maps the named collation to a Java class that implements the JDK java.util.Comparator interface.
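
As an illustration of the Java side of a user-defined collation, here is a minimal class implementing java.util.Comparator. The class name CaseBlindCollation and the case-insensitive ordering are assumptions chosen for the example. It is written in current Java with generics, and it does not show how the class is named in the stylesheet or how Saxon instantiates it.

```java
import java.util.Arrays;
import java.util.Comparator;

// Hypothetical user-defined collation: orders strings without regard to case.
class CaseBlindCollation implements Comparator<String> {
    @Override
    public int compare(String a, String b) {
        return a.compareToIgnoreCase(b);
    }

    // Usage example: sort an array of names with the collation.
    static String[] demo() {
        String[] names = {"delta", "Alpha", "charlie", "Bravo"};
        Arrays.sort(names, new CaseBlindCollation());
        return names;
    }
}
```

With this comparator the names sort as Alpha, Bravo, charlie, delta, whereas the natural String ordering would place both capitalized names before both lower-case ones.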

SQL extension elements

A new sql:query instruction has been added, to accompany the existing sql:connect, sql:insert, etc.

Attributes:

table	The table to be queried (the contents of the FROM clause of the select statement). This is mandatory; the value is an attribute value template.
column	The columns to be retrieved (the contents of the SELECT clause of the select statement). May be "*" to retrieve all columns. This is mandatory; the value is an attribute value template.
where	The conditions to be applied (the contents of the WHERE clause of the select statement). This is optional; if present, the value is an attribute value template.
row-tag	The element name to be used to contain each row. Must be a simple name (no colon allowed). Default is "row".
column-tag	The element name to be used to contain each column. Must be a simple name (no colon allowed). Default is "col".

The sql:query instruction writes zero or more row elements to the current result tree, each containing zero or more column elements, which contain the data values.
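
The way the attributes correspond to a select statement and to the output can be sketched in Java. This is a hedged illustration, not the code of the extension: the class SqlQuerySketch and its methods are hypothetical, the attribute values are assumed to have already been expanded from their attribute value templates, and the rows are supplied as plain lists rather than read from a database.

```java
import java.util.List;

// Hypothetical helper showing how the query attributes relate to SQL and output.
class SqlQuerySketch {

    // Assembles the statement text from the column, table and where values.
    static String buildSelect(String column, String table, String where) {
        StringBuilder sql = new StringBuilder();
        sql.append("SELECT ").append(column).append(" FROM ").append(table);
        if (where != null && !where.isEmpty()) {
            sql.append(" WHERE ").append(where);   // the where attribute is optional
        }
        return sql.toString();
    }

    // Wraps each row in a row-tag element and each value in a column-tag element.
    static String rowsToXml(List<List<String>> rows, String rowTag, String colTag) {
        StringBuilder xml = new StringBuilder();
        for (List<String> row : rows) {
            xml.append('<').append(rowTag).append('>');
            for (String value : row) {
                xml.append('<').append(colTag).append('>')
                   .append(escape(value))
                   .append("</").append(colTag).append('>');
            }
            xml.append("</").append(rowTag).append('>');
        }
        return xml.toString();
    }

    // Escapes the characters that are significant in XML element content.
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;");
    }
}
```

For example, buildSelect("title, price", "book", "price > 10") yields "SELECT title, price FROM book WHERE price > 10", and a single row holding the values XSLT and 30 is rendered with the default tags as <row><col>XSLT</col><col>30</col></row>.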

Thanks to Claudio Thomas [claudio.thomas@web.de] who supplied the original version of this code.

The SQL extensions are now contained in a separate JAR file, saxon7-sql.jar, which must be on the class path if these extensions are used.

Michael H. Kay
15 February 2003