
Changes in this Release

This file describes changes for versions 7.0 and later. For changes prior to version 7.0, see http://saxon.sf.net/saxon6.5.2/changes.html.

Changes in version 7.8 (2003-11-12)

Defects cleared

818677 There is no type-checking performed on parameters passed by xsl:call-template or xsl:apply-templates.

822257 No error is reported if the stylesheet contains two named templates with the same name and the same import precedence.

826540 Null pointer exception caused by incorrect optimization (moving a variable reference out of a branch of a conditional expression).

827369 An exception (IndexOutOfBounds) occurs if the body of an XSLT stylesheet function is empty.

830841 If there are character maps defined in the stylesheet, then no escaping of special characters in attribute values takes place.

832548 Incorrect optimization occurs for the positional variable declared using at $x in an XQuery FLWOR expression. The variable typically retains a value of zero.

XSLT Changes

Implemented the new xsl:perform-sort instruction. The select attribute is currently mandatory: it is not possible to use the instruction with a contained sequence constructor. {sortNNN}

Internally, the code for handling sorting has been unified between xsl:for-each, xsl:apply-templates, and xsl:perform-sort.

The xsl:sort-key declaration and the sort() function are withdrawn.

The instructions xsl:attribute, xsl:comment, xsl:processing-instruction, and xsl:namespace now allow a select attribute; if this attribute is present the content of the instruction must be empty. The value of the select expression may be a sequence; all of the items in the sequence are included in the result, by converting them to strings and separating them with a single space. {arts50, posn80, node51-54}

An xsl:analyze-string element is now required to have at least one xsl:matching-substring or xsl:non-matching-substring child.

Tunnel parameters have been implemented. {var20-22, 905err, 906err}

The xsl:with-param element now accepts an as attribute.

Patterns of the form *:local-name are now accepted. {match51}

No code was being generated to perform run-time type checking or conversion of template parameters (bug 818677). This has been corrected. {var16, var904err}

It is now a compile-time error if a parameter is supplied in xsl:call-template that doesn't match any parameter declared in the template being called, or if it has the wrong type when compared with the declared type of the parameter, or if the called template has a required parameter that isn't supplied in the call.

On the command line, it is now possible to specify a parameter in the form +param=filename. The filename will be parsed as an XML document, and the document node will be passed to the stylesheet as the value of the stylesheet parameter param. If the filename is a directory, then all the files contained immediately within the directory will be parsed and the result will be passed as a sequence of document nodes.

Parameters supplied on the command line are now treated as untyped atomic values rather than strings, which means they can be supplied where the expected type is (say) integer or date; the string supplied as the value of the parameter will automatically be converted to the required type.

Tracing (with the -T option) has been improved. Some instructions such as xsl:analyze-string were not being traced. The trace output now includes the names and values of variables (the value is truncated to the first four items in a sequence, and the first 20 characters of each item; nodes are shown by their generate-id() value). More information is available to user-written trace listeners, in particular, the Controller is now available (as a property of the InstructionInfo object). The InstructionInfo interface now has a general-purpose getProperty() method, allowing additional information to be made available without changing the interface for existing TraceListeners.
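
The truncation rule for variable values in trace output (at most four items, at most 20 characters per item) can be illustrated with a small Java sketch. It assumes the items have already been turned into strings (for nodes, that would be their generate-id() value); the class name, the bracket and comma layout, and the "..." markers are illustrative assumptions, not Saxon's actual trace format.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of truncating a variable's value for trace output. */
final class TraceValueFormatter {
    static final int MAX_ITEMS = 4;   // only the first four items are shown
    static final int MAX_CHARS = 20;  // each item is cut to its first 20 characters

    static String format(List<String> items) {
        List<String> parts = new ArrayList<>();
        for (int i = 0; i < Math.min(items.size(), MAX_ITEMS); i++) {
            String s = items.get(i);
            // Assumed marker: "..." shows that an item was truncated.
            parts.add(s.length() > MAX_CHARS ? s.substring(0, MAX_CHARS) + "..." : s);
        }
        if (items.size() > MAX_ITEMS) {
            parts.add("...");  // assumed marker for items that were omitted
        }
        return "(" + String.join(", ", parts) + ")";
    }

    public static void main(String[] args) {
        System.out.println(format(List.of("a", "b", "c", "d", "e")));
        System.out.println(format(List.of("abcdefghijklmnopqrstuvwxyz")));
    }
}
```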

The rule is now enforced that the namespace URI of a function name, variable name, mode name (etc.) cannot be a reserved namespace URI (such as the XML namespace, the XSLT namespace, or the XML Schema namespace).

In xsl:number, the format tokens one, first, and 1st are no longer available; they have been replaced by the format tokens W, w, and Ww (for upper case, lower case, and title case words), together with the optional attribute ordinal="yes". These sequences are currently implemented for English and partially (see below) for German. {numb14, 24, 25}
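
To show how the three word tokens differ only in capitalization, here is a minimal Java sketch for English numbers 0 to 99. The class and method names are hypothetical, writing "forty two" with a space rather than a hyphen is an assumption, and ordinal="yes" is not modelled.

```java
import java.util.Locale;

/** Hypothetical sketch of the W / w / Ww word formats, English only, for 0..99. */
final class EnglishWords {
    private static final String[] UNITS = {
        "zero", "one", "two", "three", "four", "five", "six", "seven", "eight", "nine",
        "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen", "sixteen",
        "seventeen", "eighteen", "nineteen"};
    private static final String[] TENS = {
        "", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy", "eighty", "ninety"};

    /** Lower-case words, e.g. 42 gives "forty two" (space rather than hyphen is an assumption). */
    static String words(int n) {
        if (n < 0 || n > 99) throw new IllegalArgumentException("sketch covers 0..99 only");
        if (n < 20) return UNITS[n];
        return n % 10 == 0 ? TENS[n / 10] : TENS[n / 10] + " " + UNITS[n % 10];
    }

    /** Applies a format token: W = upper case, w = lower case, Ww = title case. */
    static String format(int n, String token) {
        String w = words(n);
        switch (token) {
            case "W":
                return w.toUpperCase(Locale.ENGLISH);
            case "w":
                return w;
            case "Ww": {
                StringBuilder sb = new StringBuilder();
                for (String part : w.split(" ")) {
                    if (sb.length() > 0) sb.append(' ');
                    sb.append(Character.toUpperCase(part.charAt(0))).append(part.substring(1));
                }
                return sb.toString();
            }
            default:
                throw new IllegalArgumentException("unsupported token: " + token);
        }
    }

    public static void main(String[] args) {
        for (String t : new String[] {"W", "w", "Ww"}) {
            System.out.println(t + ": " + format(42, t));
        }
    }
}
```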

The functions format-date(), format-time(), and format-dateTime() have been updated to match the latest specs. Specifically, they now take either two or five arguments (though Saxon currently ignores the last two); names of months or days of the week are requested using presentation modifiers N (upper-case), n (lower-case) and Nn (title case), and all other modifiers are interpreted in the same way as xsl:number. {date067, date068, date073}

I have extended the support for localizing xsl:number and format-date in German. It's mainly a proof of concept, to show that it's possible. The code is in module net.sf.saxon.number.Numberer_de, and similar modules can be written for other languages: just change the last two letters of the class name to the language code used. If you do write implementations for other languages, I will be happy to include them in future Saxon distributions. {numb28, numb29}

The attribute disable-output-escaping is no longer supported on xsl:attribute. In theory, you should be able to use character maps instead.

Casting a string to an xs:QName is now supported: but only in XSLT (not in XPath or XQuery), and only when an explicit cast or constructor function is invoked (not, for example, when passing an untyped atomic value to a function that expects an xs:QName).

Literal result elements now compile internally into xsl:element and xsl:attribute instructions. This results in changes to trace output: each attribute is now traced as a separate instruction.

XQuery

Importing library modules is now supported, using the import module syntax. At present, all the modules in a query are compiled at the same time. The location of each module (the at clause in import module) must be included the first time a particular module is imported, but it may be omitted on subsequent occasions (modules are processed recursively, depth first). If a module for a particular namespace is already loaded, then the at clause is ignored. Optionally, applications can precompile library modules and register them in the Configuration object, and they will then be found when another module attempts to import them by namespace alone.

The query parser now attempts recovery after a syntax error, resuming parsing at the next semicolon.

The predeclared namespace prefix local is available for use when defining local user-defined functions.

Creating two attributes with the same name for the same element is now reported as an error. Previously (as in XSLT) the second simply overwrote the first.

Namespace declarations in direct element constructors are now correctly scoped; namespace prefixes in element and attribute names in a direct element constructor are correctly validated. (This is a bug fix). The rules concerning the distinction between active and passive namespaces are properly applied (active namespaces are copied to the result, passive namespaces are not).

The order by options empty least and empty greatest have been implemented. At the same time, code has been added to check that the sort key is not a sequence of length greater than one.
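
In Java terms, empty least and empty greatest correspond closely to the JDK's Comparator.nullsFirst and Comparator.nullsLast. The sketch below assumes an empty sort key is represented as null; that representation and the class name are illustrative only, and the check on sequences longer than one is not modelled.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

/** Sketch: an empty sort key is modelled as null (an assumption of this example). */
final class EmptyOrdering {
    static List<Integer> sort(List<Integer> keys, boolean emptyLeast) {
        Comparator<Integer> natural = Comparator.naturalOrder();
        // "empty least" puts empty keys first in ascending order; "empty greatest" puts them last.
        Comparator<Integer> cmp = emptyLeast ? Comparator.nullsFirst(natural)
                                             : Comparator.nullsLast(natural);
        List<Integer> copy = new ArrayList<>(keys);
        copy.sort(cmp);
        return copy;
    }

    public static void main(String[] args) {
        List<Integer> keys = Arrays.asList(3, null, 1);
        System.out.println(sort(keys, true));   // [null, 1, 3]
        System.out.println(sort(keys, false));  // [1, 3, null]
    }
}
```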

It is now possible to specify a parameter on the command line in the form +param=filename. The filename will be parsed as an XML document, and the document node will be passed to the query as the value of the external variable param. If the filename is a directory, then all the files contained immediately within the directory will be parsed and the result will be passed as a sequence of document nodes.

A query can now be included directly on the command line rather than being read from a file, by enclosing it in curly braces. For example, java net.sf.saxon.Query {doc('a.xml')//p[1]} selects every p element in the file a.xml (in the current directory) that is the first p child of its parent.

Any filename passed using the -s option is no longer accessible via the input() function, but is still accessible as the context node.

XPath

The namespaces for the fn: and xdt: prefixes, and the URI for the Unicode codepoint collation, have been updated to match the latest specs (replace 2003/05 with 2003/11).

The new rules for double-to-string conversion are implemented. Values outside the range 1e-6 to 1e+6 are expressed in exponential notation, values within this range are formatted as decimals or integers. {expr111, 112}
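
A minimal Java sketch of the notation rule follows. It assumes that 1e-6 counts as inside the decimal range and 1e+6 as outside it, that the digits are Java's shortest round-trip digits (via BigDecimal.valueOf), and that the exponential form looks like 1.0E7; zero is printed as 0 and NaN or infinities are simply rejected. None of this is taken from Saxon's code.

```java
import java.math.BigDecimal;

/** Sketch of the double-to-string rule: decimal notation only for 1e-6 <= |d| < 1e6. */
final class DoubleToString {
    static String convert(double d) {
        if (Double.isNaN(d) || Double.isInfinite(d)) {
            throw new IllegalArgumentException("NaN and infinities are outside this sketch");
        }
        double abs = Math.abs(d);
        // BigDecimal.valueOf goes through Double.toString: the shortest digits that round-trip.
        BigDecimal bd = BigDecimal.valueOf(d).stripTrailingZeros();
        if (d == 0 || (abs >= 1e-6 && abs < 1e6)) {
            return bd.toPlainString();  // e.g. 123.0 gives "123", 1e-5 gives "0.00001"; zero gives "0"
        }
        // Exponential form: one digit before the point, at least one after it.
        String digits = bd.unscaledValue().abs().toString();
        int exponent = digits.length() - 1 - bd.scale();
        String mantissa = digits.charAt(0) + "." + (digits.length() > 1 ? digits.substring(1) : "0");
        return (d < 0 ? "-" : "") + mantissa + "E" + exponent;
    }

    public static void main(String[] args) {
        for (double d : new double[] {123.0, 1e-5, 2e6, 1.5e-7}) {
            System.out.println(convert(d));
        }
    }
}
```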

In the XPath free-standing API, it is now possible to use the fn: namespace when calling functions in the core library. (It is also still possible to use these functions without any namespace).

A rewrite optimization has been added for expressions of the form E = M to N when E, M, and N are known statically to be integers. The test reduces to E ge M and E le N, which avoids comparing E with every integer in the range. When the test is of the form SEQ[position() = M to N] and M and N are constant integers, an even more powerful optimization is used that stops the iteration over SEQ as soon as item N is reached. {opt015, opt016}
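
Both rewrites can be sketched in Java to show why they are safe. The class and method names are hypothetical and only integer operands are handled: the first pair of methods compares the naive and rewritten forms of E = M to N, and positional() models SEQ[position() = M to N] reading no more than N items, so it even works on an unbounded iterator.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

/** Sketch of the two range optimizations; names are hypothetical, integers only. */
final class RangeRewrite {
    /** Unoptimized E = M to N: compare E with every integer in the range. */
    static boolean naive(long e, long m, long n) {
        for (long i = m; i <= n; i++) {
            if (e == i) return true;
        }
        return false;
    }

    /** Rewritten form E ge M and E le N: two comparisons, whatever the size of the range. */
    static boolean rewritten(long e, long m, long n) {
        return e >= m && e <= n;  // also false when M > N, i.e. when the range is empty
    }

    /** SEQ[position() = M to N]: stop pulling items once item N has been read. */
    static <T> List<T> positional(Iterator<T> seq, long m, long n) {
        List<T> out = new ArrayList<>();
        long pos = 0;
        while (pos < n && seq.hasNext()) {
            T item = seq.next();
            pos++;
            if (pos >= m) out.add(item);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(rewritten(5, 1, 1_000_000));
        System.out.println(positional(List.of("a", "b", "c", "d").iterator(), 2, 3));
    }
}
```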

The "minimax" optimization has been reinstated for comparing numeric sequences. This translates an expression of the form N1 < N2 into the form min(N1) < max(N2). It was used prior to version 7.4, at which point it was found to be unsafe if both sequences contain untyped atomic values. It is now used only if at least one of the sequences is statically known to contain numeric values only.
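
For purely numeric sequences, the existential comparison N1 < N2 holds exactly when some pair satisfies x < y, which is the same as min(N1) < max(N2) once NaN values (which never compare true) are ignored. The Java sketch below checks this equivalence on doubles only; it does not model untyped atomic values, which is the case the paragraph says made the rewrite unsafe. Class and method names are hypothetical.

```java
/** Sketch of the minimax rewrite for the general comparison N1 < N2 (numeric values only). */
final class Minimax {
    /** Unoptimized: true if some pair (x, y) has x < y; NaN never compares true. */
    static boolean naive(double[] n1, double[] n2) {
        for (double x : n1) {
            for (double y : n2) {
                if (x < y) return true;
            }
        }
        return false;
    }

    /** Rewritten: min(N1) < max(N2), skipping NaN because it can never satisfy x < y. */
    static boolean minimax(double[] n1, double[] n2) {
        double min = Double.POSITIVE_INFINITY;
        double max = Double.NEGATIVE_INFINITY;
        for (double x : n1) if (!Double.isNaN(x)) min = Math.min(min, x);
        for (double y : n2) if (!Double.isNaN(y)) max = Math.max(max, y);
        // An empty (or all-NaN) operand leaves min = +INF or max = -INF, so the result is false.
        return min < max;
    }

    public static void main(String[] args) {
        double[] a = {3.0, 7.0, Double.NaN};
        double[] b = {1.0, 4.0};
        System.out.println(naive(a, b) + " " + minimax(a, b));
    }
}
```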

Functions and Operators

The data types gYear, gYearMonth, gMonth, gMonthDay, and gDay are implemented. {date076-079}

The functions get-current-date/time/dateTime() now return a date/time in the implicit timezone (as determined from the current Java Locale). {date001}

The function implicit-timezone() has been implemented. {date069}

The functions get-timezone-from-date/time/dateTime() now return the timezone as a value of type xdt:dayTimeDuration, not as a string. This means the result is displayed as (for example) PT3H rather than +03:00. {date048-052}
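
The difference between the two presentations can be shown with a small Java sketch that turns an offset such as +03:00 into a duration string such as PT3H. It uses java.time.ZoneOffset (Java 8 and later) purely for convenience; the class name is hypothetical, and writing a zero offset as PT0S is an assumption.

```java
import java.time.ZoneOffset;

/** Sketch: write a timezone offset as a dayTimeDuration-style string (PT3H, not +03:00). */
final class TimezoneAsDuration {
    static String format(ZoneOffset offset) {
        // Any seconds component is dropped: XML Schema timezones are whole minutes.
        int totalMinutes = offset.getTotalSeconds() / 60;
        if (totalMinutes == 0) return "PT0S";  // assumed spelling of a zero duration
        int abs = Math.abs(totalMinutes);
        StringBuilder sb = new StringBuilder(totalMinutes < 0 ? "-PT" : "PT");
        if (abs / 60 > 0) sb.append(abs / 60).append('H');
        if (abs % 60 > 0) sb.append(abs % 60).append('M');
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(format(ZoneOffset.of("+03:00")));
        System.out.println(format(ZoneOffset.of("-05:30")));
    }
}
```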

Casting from xs:dateTime to xs:date now retains the timezone, if there is one.

The functions adjust-date-to-timezone(), adjust-dateTime-to-timezone(), and adjust-time-to-timezone() are implemented.

Note that the function get-hours-from-date/dateTime() returns the localized component value. It has always done this in the Saxon implementation, and the latest working draft now makes this the correct behavior.

The function sum() now returns an integer 0 rather than a double 0.0 when the input sequence is empty. (An integer can be used anywhere a double can be used, but the converse is not true). Note: this change was agreed by the Working Groups, but was inadvertently left out of the published drafts.

The function root() now accepts an empty sequence as its argument (and returns an empty sequence).

The function distinct-values() now returns only one NaN if there are multiple NaN values in the sequence. {group026}
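
In Java, one convenient way to get "only one NaN" is that Double.equals treats all NaN values as equal, so a hash set keeps a single NaN. Double.equals also distinguishes 0.0 from -0.0, which XPath's eq does not, so the sketch below merges the two zeros first. The class name is hypothetical and only doubles are handled.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;

/** Sketch of distinct-values() over doubles, keeping first-occurrence order. */
final class DistinctDoubles {
    static List<Double> distinct(List<Double> values) {
        LinkedHashSet<Double> seen = new LinkedHashSet<>();
        for (Double v : values) {
            // Double.equals treats every NaN as equal, so only one NaN survives.
            // -0.0 is mapped to 0.0 because Double.equals would otherwise keep both zeros.
            seen.add(v == 0.0 ? 0.0 : v);
        }
        return new ArrayList<>(seen);
    }

    public static void main(String[] args) {
        System.out.println(distinct(List.of(1.0, Double.NaN, 1.0, Double.NaN, -0.0, 0.0)));
    }
}
```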

The function escape-uri() no longer escapes square brackets when the escape-reserved argument is false.

The function function-available() now recognizes function names that use the fn: namespace. {func32}

The functions contains, ends-with, starts-with, substring-after, and substring-before now use the Unicode codepoint collation if no explicit collation is supplied; they do not use the default collation.

The functions min and max now convert untyped atomic values to double rather than string, and return NaN if the sequence contains a NaN value. {expr57}

The function get-in-scope-namespaces is renamed get-in-scope-prefixes. {nspc45}

The function insert-before now allows the insert position to be beyond the end of the sequence (which causes the new sequence to be appended to the original). {expr71}

The function reverse has been implemented. {expr91}

In a range expression, $x to $y, it is no longer possible to produce a descending sequence of integers. Instead, if the start point is greater than the end point, an empty sequence is returned. This change allows the construct for $i in 1 to count($seq) ... to work as expected when $seq is empty. To get a reverse sequence of integers, use reverse(1 to 5).
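
The revised semantics can be sketched in Java as follows; the class and method names are hypothetical. Note how 1 to 0 (the case of count($seq) on an empty sequence) yields nothing, and how a descending sequence is obtained by reversing an ascending one.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Sketch of the revised range semantics for $x to $y. */
final class RangeTo {
    static List<Long> to(long x, long y) {
        List<Long> out = new ArrayList<>();
        if (x > y) return out;      // start after end: empty sequence, never descending
        for (long i = x; ; i++) {   // loop shape avoids overflow when y is Long.MAX_VALUE
            out.add(i);
            if (i == y) break;
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(to(1, 5));
        System.out.println(to(1, 0));  // empty: "for $i in 1 to count($seq)" with an empty $seq
        List<Long> rev = to(1, 5);
        Collections.reverse(rev);      // analogue of reverse(1 to 5)
        System.out.println(rev);
    }
}
```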

The function call remove($seq, 1) is now treated specially: it is optimized in the same way as $seq[position()!=1], because this expression is commonly used in head-tail recursion. {expr92}

The functions context-item, distinct-nodes, input, item-at, node-kind, sequence-node-identical, and string-pad have been dropped.

The function concat must now have at least two arguments. (This reverts to the XPath 1.0 specification.)

In regular expression matching (e.g. in replace() and in the XSLT regex-group() function), group 0 now refers to the entire substring that matched the regex. {regex08}

The isnot operator has been dropped.

Extensions

A new top-level declaration is introduced: saxon:import-xquery allows the functions defined in an XQuery library module to be called from XPath expressions in the stylesheet. See extensions.html for details.

The extension functions saxon:before, saxon:get-user-data, and saxon:set-user-data, which have not been documented for some while, have finally been removed.

The rules in the spec on extension attributes have been clarified, in a way that makes it clear that the saxon:allow-avt attribute on xsl:call-template is not conformant. This attribute has therefore been removed, and dynamic calls on templates are enabled instead using a new extension element, saxon:call-template. In changing your stylesheet to use the new instruction, remember to set extension-element-prefixes="saxon". {saxon21}

The sequence type as="java:java.lang.Object" can now be used to refer to the type of a wrapped Java object returned by an extension function. The namespace prefix "java" binds to the namespace URI http://saxon.sf.net/java-type. (My thinking is that eventually it will be possible to use any Java class name to define the actual Java class of the external object. At present, however, only java.lang.Object is recognized.) {saxon51}

I have changed the design of the SQL extension so that the database connection is now stored explicitly in a variable, and the value of this variable is supplied on instructions such as sql:insert and sql:query. The books-sql.xsl sample application shows how this works. {saxon51}

When calling Java methods, any XPath value can now be passed to a method that expects a DOM NodeList; a run-time ClassCastException occurs if the value contains an item that is not a node, or a node that is not represented by a DOM org.w3c.dom.Node (e.g. if it is a JDOM node). This has not been tested.

Data Model Changes

In general these changes affect both XSLT and XQuery.

Namespace Nodes

It is no longer always true that namespaces present on a parent element are automatically inherited by the children. XML Namespaces 1.1 allows namespaces to be undeclared using a construct such as xmlns:x="", and this capability is reflected in the model. Namespace undeclarations will be output by the serializer if the serialization property undeclare-namespaces="yes" is set. In XSLT, this can be defined on xsl:output. In XQuery, it can be controlled from the API (using the constant SaxonOutputKeys.UNDECLARE_NAMESPACES or the command line option !{http://saxon.sf.net}undeclare-namespaces=yes). If you don't set this option, you probably won't notice any difference. However, the change does mean that you sometimes get an unexpected xmlns="" undeclaration.

The rules for both XSLT and XQuery say that namespace nodes are never inherited when you add an element to a new parent element. Saxon isn't quite implementing this yet. Given the way Saxon represents namespaces internally, namespaces get inherited unless the code goes out of its way to prevent it, and at the moment this is only happening when an element is explicitly copied using xsl:copy or the equivalent in XQuery.

JDOM Support

Support for the namespace axis has been reinstated in JDOM. This underpins functions such as get-in-scope-prefixes, and ensures that namespaces are properly copied by xsl:copy-of. {axes-jdom049, 055, 129 etc}

For ease of testing, a new command line interface net.sf.saxon.jdom.JDOMTransform has been added. The arguments are exactly the same as the normal net.sf.saxon.Transform command.

The constructor for class NodeWrapper has been made protected. A new wrap() method has been supplied on the DocumentWrapper class, allowing any node in the document to be wrapped, provided that the document node has been wrapped.

Whitespace stripping should now work for JDOM input in the same way as for other tree models (see below).

DOM interface

A set of DOM wrapper classes has been written, analogous to the JDOM wrapper classes. {axes-dom[001-nnn]}

The DOM wrapper has been tested with the Crimson DOM provided in JDK 1.4 and with Xerces 2.5.0. Different DOM implementations are known to vary widely. Saxon's DOM interface does not attempt to deal with entity reference nodes, which appears to be OK with the default configuration of these two parsers. CDATA sections are treated as text nodes; no attempt is made to merge them with adjacent text nodes.

The DOM interface is very inefficient: for example, it has to resolve namespace prefixes by searching the namespace declarations every time a node is referenced. Don't use it if performance matters to you.

I have changed the code for writing output to a DOMResult so that it now uses DOM level 2 interfaces Document#createElementNS() and Element#setAttributeNS(). As a result, it should now be possible to use methods such as getPrefix and getNamespaceURI on these nodes.
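
The practical difference between the two DOM levels can be seen with the JDK's own DOM: a node created with the level 1 createElement() records only a tag name, so getNamespaceURI() and getLocalName() return null, while createElementNS() records the namespace URI, prefix and local name. The namespace URI and class name below are illustrative assumptions; this shows the standard DOM API, not Saxon's DOMResult code.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

/** Contrast of DOM level 1 and level 2 element creation (illustrative namespace URI). */
final class DomLevel2 {
    static final String NS = "http://example.com/ns";  // hypothetical namespace

    static Document newDocument() {
        try {
            DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
            f.setNamespaceAware(true);
            return f.newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Level 2: the namespace URI and the qualified name are both recorded on the node. */
    static Element level2(Document doc) {
        Element e = doc.createElementNS(NS, "ex:item");
        e.setAttributeNS(NS, "ex:id", "42");
        return e;
    }

    /** Level 1: only a tag name is recorded, so getNamespaceURI() and getLocalName() give null. */
    static Element level1(Document doc) {
        Element e = doc.createElement("ex:item");
        e.setAttribute("ex:id", "42");
        return e;
    }

    public static void main(String[] args) {
        Document doc = newDocument();
        Element e2 = level2(doc);
        Element e1 = level1(doc);
        System.out.println(e2.getPrefix() + " " + e2.getLocalName() + " " + e2.getNamespaceURI());
        System.out.println(e1.getNamespaceURI() + " " + e1.getLocalName());
    }
}
```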

Whitespace Stripping

If you supply any kind of pre-built tree as input to the transformation (that is, if the Source object is a DOMSource or a NodeInfo), then Saxon no longer rebuilds the tree to implement whitespace stripping. Instead, if whitespace stripping has been requested, it wraps the supplied tree in a whitespace-stripping envelope, which hides, on the fly, the whitespace text nodes that the stylesheet has asked to be stripped. Because this is done in a separate layer above the data model, it works for all data model implementations (JDOM, DOM, and native Saxon). It imposes no overhead when it is not needed: that is, when the stylesheet doesn't request whitespace stripping, or when nodes are stripped during tree construction, which is still done if you supply a SAXSource or StreamSource as the input. {axes-dom154, 155}
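
The envelope idea, filtering on read instead of copying, can be sketched over a plain W3C DOM tree. This is a hypothetical illustration, not Saxon's wrapper classes: the predicate stands in for whatever xsl:strip-space rules apply, only the child axis is shown, and, like the behaviour described above, it ignores xml:space.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

/** Hypothetical stripping envelope: the underlying tree is never modified. */
final class StrippingView {
    private final Predicate<Element> stripSpaceIn;  // which parents have whitespace stripped

    StrippingView(Predicate<Element> stripSpaceIn) {
        this.stripSpaceIn = stripSpaceIn;
    }

    /** Children of parent as seen through the envelope. */
    List<Node> children(Node parent) {
        boolean strip = parent instanceof Element && stripSpaceIn.test((Element) parent);
        List<Node> out = new ArrayList<>();
        NodeList kids = parent.getChildNodes();
        for (int i = 0; i < kids.getLength(); i++) {
            Node n = kids.item(i);
            if (!(strip && isWhitespaceText(n))) out.add(n);  // hidden on the fly, not deleted
        }
        return out;
    }

    /** A text node made only of XML whitespace characters (space, tab, CR, LF). */
    private static boolean isWhitespaceText(Node n) {
        return n.getNodeType() == Node.TEXT_NODE
            && n.getNodeValue().chars().allMatch(c -> c == ' ' || c == '\t' || c == '\n' || c == '\r');
    }

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        Element root = parse("<a>\n  <b>x</b>\n  <c/>\n</a>").getDocumentElement();
        StrippingView stripAll = new StrippingView(e -> true);  // like xsl:strip-space elements="*"
        System.out.println(stripAll.children(root).size());     // 2 through the envelope
        System.out.println(root.getChildNodes().getLength());   // 5: the tree itself is unchanged
    }
}
```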

Note that this whitespace stripping layer strips whitespace as requested by the stylesheet, regardless of whether any xml:space attribute is present in the tree to override this.

In the XQuery interface, whitespace stripping can be requested from the command line or from the API, but not from the query itself.

Whitespace stripping applies equally to documents loaded using the document() or doc() function as to the initial source document; but it is not applied to documents supplied as stylesheet (or query) parameters.

This model now allows the XQuery and XPath APIs to operate directly on a DOM or JDOM structure, with the results of the expression being references to the actual DOM or JDOM nodes, not to copies.

Internal APIs

The getLocalName method of the NodeInfo interface has been renamed getLocalPart to avoid a conflict with the DOM method of the same name. The getLocalPart method returns "" rather than null for a node with no name, and it returns a value whether or not the node is in a namespace.

Changes in version 7.7 (2003-10-01)

Defects Cleared

761894: distinct-nodes() crashes when used in an XQuery user-defined function.

770785: Tail-call optimization bug in xsl:apply-templates when the select expression has context dependencies.

783382: Saxon infers the type document-node() rather than string for an empty variable.

788731: Infinite recursion in compiler while optimizing current().

788748: Incorrect static type inference for (integer div integer).

799095: Null Pointer Exception while compiling an XQuery using a variable reference with a direct attribute constructor.

805148: An attribute or namespace node may be deemed identical to an element node if they have the same offset in their respective arrays.

805149: The result of generate-id() for an attribute or namespace node may contain non-alphanumeric characters.

810626: The xsl:number instruction crashes when given a format picture containing a single punctuation token, for example format="*".

810644: The operands of the "is" operator are incorrectly compiled.

811914: The unordered() function fails if the argument is a sequence whose items are all compile-time constants.

XSLT changes

The xsl:character-map and xsl:output-character elements have been implemented. See further details. {charmapNNN}

The rules attribute on saxon:collation is now implemented, allowing a fully-customized collation to be created using the syntax for the java RuleBasedCollator. {sort25}
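
For readers unfamiliar with the RuleBasedCollator syntax, here is a minimal example using the JDK class directly. The rule string "< c < b < a" (which reverses the usual order of those three letters) and the class name are illustrative; how Saxon passes the rules attribute to the collator is not shown.

```java
import java.text.Collator;
import java.text.ParseException;
import java.text.RuleBasedCollator;
import java.util.ArrayList;
import java.util.List;

/** Sketch: build a collator from RuleBasedCollator rules and sort with it. */
final class CustomCollation {
    static RuleBasedCollator fromRules(String rules) {
        try {
            return new RuleBasedCollator(rules);
        } catch (ParseException e) {
            throw new IllegalArgumentException("bad collation rules: " + rules, e);
        }
    }

    static List<String> sort(List<String> words, Collator collator) {
        List<String> copy = new ArrayList<>(words);
        copy.sort(collator);  // Collator implements Comparator<Object>
        return copy;
    }

    public static void main(String[] args) {
        // "< c < b < a" declares c before b before a.
        System.out.println(sort(List.of("a", "b", "c"), fromRules("< c < b < a")));
    }
}
```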

I have added a compile-time warning message if a variable declaration has no following sibling instructions. This is permitted, but has no useful effect and probably means the user has made a mistake.

The xsl:number element now takes a select attribute to select the node to be numbered. (This anticipates a change in the next working draft). {numb23}

The xsl:sequence instruction can now have either a select attribute or child instructions, but not both. (This anticipates a change in the next working draft).

Stylesheet attributes whose value is a name, or a number, or an enumeration such as yes|no, now allow leading and trailing spaces in the value. This feature has not been tested very thoroughly.

When running with version="2.0", the xsl:value-of instruction now defaults the separator attribute to a single space. (With version="1.0", in the absence of a separator attribute, it continues to discard all but the first item in a sequence.) {seq019, seq020}

The constructs element(), element(*, T), attribute(), and attribute(*, T) are now allowed as NodeTests within a pattern. If a type is specified, the default priority is 0. If no type is specified, it is -0.5. {schema019}

The construct [xsl:]exclude-result-prefixes="#all" is now implemented. {nspc48}

Where a namespace is on the list of excluded namespaces for a literal result element, but is used in the name of the element or one of its attributes, Saxon was ignoring the request to exclude the namespace. The effect was that if more than one prefix was assigned to the namespace, unnecessary namespace declarations were output. I have changed the code so it no longer ignores the request to exclude the namespace; instead, the namespace declaration that is actually needed will be reinstated by the namespace fixup process. {nspc49}

Enhancements have been made to the number formatting in format-date(), format-dateTime(), and format-time(). The format modifier "o" can be used to request ordinal numbers (1st, 2nd, 3rd). The format modifier "a" (or "A" for uppercase) gives numbering in words (one, two, three). They can also be combined ("ao" or "Ao") to give ordinal words (first, second, third). These formats currently always produce output in English. {date068}

Ordinal numbering is also available in xsl:number. (This was implemented to underpin date formatting, but is available generally.) Use the format token 1st for the sequence 1st, 2nd, 3rd, 4th..., the format token first or FIRST for the sequence first, second, third, fourth... (in either upper or lower case). The format tokens one and ONE have always been available, though not well documented. The full list of supported formats is documented in xsl-elements.html. Note that these are available only for English; sequences for other languages can be implemented by writing a user-defined Numberer, as described in Implementing a numbering sequence. {numb24, numb25}
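
The suffix rule behind the 1st, 2nd, 3rd, 4th sequence has one wrinkle worth making explicit: 11, 12 and 13 (and 111, 112, ...) take "th". A minimal Java sketch for non-negative numbers follows; the class name is hypothetical and this is not Saxon's Numberer code.

```java
/** Sketch of English ordinal suffixes for non-negative numbers: 1st, 2nd, 3rd, 4th, 11th, 21st. */
final class Ordinals {
    static String ordinal(int n) {
        int mod100 = n % 100;
        int mod10 = n % 10;
        String suffix;
        if (mod100 >= 11 && mod100 <= 13) suffix = "th";  // 11th, 12th, 13th, 111th, ...
        else if (mod10 == 1) suffix = "st";
        else if (mod10 == 2) suffix = "nd";
        else if (mod10 == 3) suffix = "rd";
        else suffix = "th";
        return n + suffix;
    }

    public static void main(String[] args) {
        for (int n : new int[] {1, 2, 3, 4, 11, 12, 13, 21, 22, 101, 111}) {
            System.out.print(ordinal(n) + " ");
        }
        System.out.println();
    }
}
```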

Sorting a set of numeric values, without specifying data-type="number", now handles NaNs correctly.

XPath changes

The result of dividing two integers using the div operator is now a decimal, rather than a double. A consequence of this is that dividing by zero gives a run-time error, rather than Infinity. It may also affect the precision of results, but the effect is likely to be minor in most practical cases. (Actually, the computation is currently carried out with double precision, and then converted to a decimal, so the results are not as accurate as they could be.) This change exposed a bug in the handling of the mod operator for decimal arguments, which has been fixed.
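
The behavioural difference can be sketched with java.math.BigDecimal. This sketch divides directly in decimal, so it does not reproduce the double-based computation mentioned in the parenthesis above; the 34-digit precision (MathContext.DECIMAL128) and the class name are assumptions.

```java
import java.math.BigDecimal;
import java.math.MathContext;

/** Sketch: integer div integer as a decimal result; 34-digit precision is an assumption. */
final class IntegerDivision {
    static BigDecimal div(long a, long b) {
        if (b == 0) {
            // With doubles, 1.0 / 0 would give Infinity; a decimal division raises an error instead.
            throw new ArithmeticException("integer division by zero");
        }
        return new BigDecimal(a).divide(new BigDecimal(b), MathContext.DECIMAL128);
    }

    public static void main(String[] args) {
        System.out.println(div(10, 4));  // 2.5
        System.out.println(1.0 / 0);     // Infinity: the old double-based behaviour
    }
}
```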

A consequence of this change is that the avg() function when applied to a sequence of integers now returns an xs:decimal.

Numeric promotion is implemented in the type-checking rules: that is, an integer or decimal value can be supplied where a float or double is expected, and a float can be supplied where a double is expected. (The fact that this was not implemented before was something I had overlooked.)
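
The acceptance rules can be written down as a small table. In the Java sketch below, the promotions are the ones stated in the paragraph (integer or decimal to float or double, float to double); accepting an integer where a decimal is expected comes from xs:integer being derived from xs:decimal in XML Schema, not from promotion. The enum and class names are hypothetical.

```java
/** The four numeric types involved (hypothetical enum for this sketch). */
enum NumType { INTEGER, DECIMAL, FLOAT, DOUBLE }

/** Sketch: can a value of type 'supplied' be used where 'required' is expected? */
final class Promotion {
    static boolean acceptable(NumType supplied, NumType required) {
        switch (required) {
            case INTEGER:
                return supplied == NumType.INTEGER;
            case DECIMAL:
                // integer is derived from decimal, so it is accepted without promotion
                return supplied == NumType.INTEGER || supplied == NumType.DECIMAL;
            case FLOAT:
                return supplied != NumType.DOUBLE;  // integer and decimal are promoted to float
            case DOUBLE:
                return true;                        // every numeric type is promoted to double
            default:
                throw new AssertionError(required);
        }
    }

    public static void main(String[] args) {
        for (NumType s : NumType.values()) {
            System.out.println(s + " -> DOUBLE: " + acceptable(s, NumType.DOUBLE));
        }
    }
}
```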

The expression / (or a path expression starting with /) now returns an error if the context item is in a tree whose root is not a document node. The root() function, however, succeeds in this case. {seq018, seq908err, seq909err}

The constructs element(), element(*, T), attribute(), and attribute(*, T) are now allowed as NodeTests within a path expression. {type042, schema016-018}

In the processing-instruction() NodeTest, the quotes around the processing instruction name are now optional. The name must now be a valid NCName, whether or not it is enclosed in quotes. {copy63, node03}

XQuery changes

The changes affecting XPath also affect XQuery.

Improved documentation for the XQuery API is available on the Using XQuery page, and also in the JavaDoc for the relevant classes.

I have implemented the changes to the Query Prolog syntax introduced in the August Working Draft. These include the addition of semicolons as separators between declarations in the prolog; replacement of the keyword "define" by "declare" throughout; removal of the "=" symbol except in "declare namespace"; and relaxing the rules on ordering of declarations.

The declare base-uri declaration in the prolog is now supported.

The default element namespace now works as specified. This is somewhat different from the XSLT specification: in XQuery, the default element namespace affects unprefixed element names whether they appear in element constructors or in path expressions. It isn't possible (as in XSLT 2.0) to specify different defaults for names in the input document and names in the output document. {ns/addq2}

The typeswitch expression is now implemented. {xmp/addq10}

The collation option of the order by clause in a FLWOR expression is now implemented. See Collation URIs for details of the URIs that can be specified. {r/addq1}

Sorting a set of numeric values now handles NaNs correctly (NaNs are considered equal to each other and less than any other value).
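
In Java, the stated ordering needs a custom comparator, because Double.compare places NaN after every other value, including positive infinity. The sketch below puts NaNs first and treats them as equal to each other; it uses plain < and > for other values so that 0.0 and -0.0 compare equal. The class name is hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Sketch of a sort order where NaNs are equal to each other and less than every other value. */
final class NaNOrdering {
    static final Comparator<Double> ORDER = (x, y) -> {
        boolean xNaN = Double.isNaN(x);
        boolean yNaN = Double.isNaN(y);
        if (xNaN || yNaN) {
            return xNaN == yNaN ? 0 : (xNaN ? -1 : 1);  // NaN sorts before everything else
        }
        // Plain < and > (not Double.compare) so that 0.0 and -0.0 compare equal.
        return x < y ? -1 : (x > y ? 1 : 0);
    };

    static List<Double> sorted(List<Double> values) {
        List<Double> copy = new ArrayList<>(values);
        copy.sort(ORDER);
        return copy;
    }

    public static void main(String[] args) {
        System.out.println(sorted(List.of(3.0, Double.NaN, -1.0, Double.NaN)));
    }
}
```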

Computed comment constructors and processing-instruction constructors are supported. {xmp/addq4}

Entity references and character references are supported in string literals (previously they were supported only in element and attribute content).

Serialization

It should now be possible to use any output encoding that is supported by the Java VM, without defining a custom CharacterSet class. In JDK 1.4, Java allows the application to determine whether particular characters are encodable using a given character set, and this information is now used to decide whether to replace the character with a numeric character reference. Because I don't know how efficient this mechanism is, I still use the old mechanism for character sets that were previously supported in Saxon, and the mechanism for defining user-defined character sets is still available for the time being. It has been restricted, however, so that Saxon will only attempt to load a PluggableCharacterSet for encoding XXX if the output property encoding.XXX="class-name" is present.
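
The kind of check described here is available in the JDK as java.nio.charset.CharsetEncoder.canEncode. The sketch below replaces characters that the target encoding cannot represent with numeric character references; whether Saxon calls exactly this method is an assumption, the class name is hypothetical, and escaping of markup characters such as &amp; and &lt; is out of scope.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;

/** Sketch: replace characters the output encoding cannot represent with &#N; references. */
final class CharRefEscaper {
    static String escape(String text, String encoding) {
        CharsetEncoder enc = Charset.forName(encoding).newEncoder();
        StringBuilder sb = new StringBuilder();
        text.codePoints().forEach(cp -> {
            String ch = new String(Character.toChars(cp));
            // canEncode(CharSequence) also copes with supplementary characters (surrogate pairs).
            if (enc.canEncode(ch)) sb.append(ch);
            else sb.append("&#").append(cp).append(';');
        });
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(escape("caf\u00e9 \u20ac", "ISO-8859-1"));
        System.out.println(escape("caf\u00e9 \u20ac", "US-ASCII"));
    }
}
```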

The code now allows for the possibility that character encodings other than UTF-8 and UTF-16 may be capable of encoding supplemental characters (characters whose Unicode codepoints are above 65535). Previously such characters were always output as numeric character references, except when using UTF-8 and UTF-16. A consequence of this is that user-written PluggableCharacterSet implementations must be prepared to categorize such characters.

There are some differences between the character encodings supported by the old java.io package and the new java.nio package. If the requested encoding is not supported by the java.nio package, then all non-ASCII characters will be represented using numeric character references. If the encoding is not supported by the java.io package, then Saxon will revert to using UTF-8 as the actual output encoding. A list of the character encodings supported in the java.nio package can be obtained by using the command java net.sf.saxon.charcode.CharacterSetFactory, with no parameters.

The HTML serialization method should now handle INS and DEL elements correctly.

User-written emitters were not working; the code has been fixed but not tested.

Functions, Operators, and Data Types

Collations can now be specified directly using a URI, without requiring a saxon:collation declaration. This makes them available in XQuery and XPath applications as well as XSLT. The URI takes a form such as http://saxon.sf.net/collation?lang=de;strength=primary and is specified fully in Collation URIs. {r/addq1, sort26}
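
As a rough sketch of how such a URI could map onto java.text.Collator, the code below reads only the lang and strength parameters from the query part of the URI, splitting on ";" as in the example above. The mapping of strength names to Collator constants and the class name are assumptions; the other parameters documented under Collation URIs are not modelled.

```java
import java.text.Collator;
import java.util.Locale;

/** Sketch: build a java.text.Collator from the lang and strength parameters of a collation URI. */
final class CollationUri {
    static Collator fromUri(String uri) {
        String lang = null;
        String strength = null;
        int q = uri.indexOf('?');
        if (q >= 0) {
            for (String param : uri.substring(q + 1).split(";")) {
                int eq = param.indexOf('=');
                if (eq < 0) continue;
                String key = param.substring(0, eq);
                String value = param.substring(eq + 1);
                if (key.equals("lang")) lang = value;
                else if (key.equals("strength")) strength = value;
            }
        }
        // getInstance returns a fresh copy, so setting its strength does not affect other callers.
        Collator c = lang == null ? Collator.getInstance() : Collator.getInstance(Locale.forLanguageTag(lang));
        if (strength != null) {
            switch (strength) {
                case "primary": c.setStrength(Collator.PRIMARY); break;
                case "secondary": c.setStrength(Collator.SECONDARY); break;
                case "tertiary": c.setStrength(Collator.TERTIARY); break;
                case "identical": c.setStrength(Collator.IDENTICAL); break;
                default: throw new IllegalArgumentException("unknown strength: " + strength);
            }
        }
        return c;
    }

    public static void main(String[] args) {
        Collator c = fromUri("http://saxon.sf.net/collation?lang=de;strength=primary");
        System.out.println(c.compare("\u00e4", "a"));  // primary strength ignores the accent
    }
}
```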

The collection function is implemented. The Saxon implementation interprets the URI of the collection as a reference to an XML document that acts as a catalogue listing the documents in the collection. An example of a catalogue document is:

<collection>
  <doc href="doc1.xml"/>
  <doc href="doc2.xml"/>
  <doc href="doc3.xml"/>
</collection>

In effect, collection("a.xml") is merely a shorthand for document(document("a.xml")/collection/doc/@href). My thinking is to extend the catalogue structure in future to allow options to be specified for how errors are handled, how the documents are parsed (e.g. validation, space stripping), and whether the documents should be locked in memory. {mdocs19}
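
The equivalence with document(document("a.xml")/collection/doc/@href) can be sketched in Java: read the catalogue, take the href of each doc child of the collection element, and resolve it against the catalogue's own URI. This hypothetical sketch stops at computing the member URIs; parsing each member document is left out.

```java
import java.io.StringReader;
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

/** Sketch: resolve the member URIs of a catalogue, i.e. /collection/doc/@href. */
final class CollectionCatalogue {
    static List<URI> members(String catalogueXml, URI catalogueUri) {
        Document cat;
        try {
            cat = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(catalogueXml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("catalogue could not be parsed", e);
        }
        Element root = cat.getDocumentElement();
        if (!root.getTagName().equals("collection")) {
            throw new IllegalArgumentException("root element is not <collection>");
        }
        List<URI> out = new ArrayList<>();
        // Only direct doc children count, matching the path /collection/doc.
        for (Node n = root.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n instanceof Element && ((Element) n).getTagName().equals("doc")
                    && ((Element) n).hasAttribute("href")) {
                // Relative hrefs are resolved against the catalogue's own URI.
                out.add(catalogueUri.resolve(((Element) n).getAttribute("href")));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String xml = "<collection><doc href=\"doc1.xml\"/><doc href=\"doc2.xml\"/></collection>";
        System.out.println(members(xml, URI.create("file:/data/a.xml")));
    }
}
```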

The tokenize() function now supports the facility to split a string into its individual characters if the regex matches a zero-length string. For example, tokenize('alphabet', '') returns the sequence ('a', 'l', 'p', 'h', 'a', 'b', 'e', 't'). Note: there has been some discussion on this topic in the public-qt-comments list, and the specification could change as a result. {regex19}

In XPath expressions in XSLT stylesheets, core functions can now appear in the fn: namespace (currently http://www.w3.org/2003/05/xpath-functions). Of course, they can also be unprefixed. {coreFunction101}

The result of dividing two integers is now a decimal. {math-two17}

Values of type xs:language are now properly validated. {type008}

JAXP changes

An identity transformation is now able to extract a subtree of a DOMSource starting at any element. This clears a long-standing bug 548228. A new test exampleDOMsubtree has been added to TraxExamples to demonstrate the capability.

Internal changes

The various bit-valued static properties of an XPath expression (dependencies, cardinality, and other special properties) have now been brought together into a single word, whose value is computed once and stored on each node in the expression tree rather than being calculated on demand. (There were some cases where this calculation was still being done at run-time).

Some changes have been made to the design of tail-call optimization. This is mainly to fix a bug arising when apply-templates uses a select expression with context dependencies. The decision that a call is a tail call is now made statically rather than dynamically, to avoid the costs of creating a closure for the select expression when this is not needed.

In some cases XSLT stylesheet functions are now compiled to the UserFunction object originally introduced to support XQuery. This is done where the body of the function is sufficiently simple: this basically means that it must consist of a sequence of xsl:param elements, then xsl:variable elements, and finally an xsl:sequence element with a select attribute to define the result of the function. The effect is that recursive calls in such functions now benefit from tail call optimization, allowing deeply-nested recursive functions to execute without blowing the stack.
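The general idea behind tail-call optimization can be shown with a trampoline. The following minimal sketch is not Saxon's mechanism, and the names Step and Recursion are hypothetical. A call in tail position does not recurse. Instead, it returns an object describing the next call, and a loop in run() performs these calls one at a time, so the Java stack depth stays constant however deep the logical recursion goes.

```java
import java.util.function.Supplier;

// A minimal trampoline: each Step is either a final value (Done) or a pending
// tail call (More); run() unwinds pending calls iteratively.
abstract class Step<T> {
    private static final class Done<T> extends Step<T> {
        final T value;
        Done(T value) { this.value = value; }
    }

    private static final class More<T> extends Step<T> {
        final Supplier<Step<T>> next;
        More(Supplier<Step<T>> next) { this.next = next; }
    }

    static <T> Step<T> done(T value) { return new Done<>(value); }

    static <T> Step<T> call(Supplier<Step<T>> next) { return new More<>(next); }

    @SuppressWarnings("unchecked")
    T run() {
        Step<T> step = this;
        while (step instanceof More) {
            step = ((More<T>) step).next.get(); // perform one pending tail call
        }
        return ((Done<T>) step).value;
    }
}

class Recursion {
    // Tail-recursive sum of 1..n: the recursive call is the last thing evaluated,
    // as in a function whose body ends with a single xsl:sequence
    static Step<Long> sumTo(long n, long accumulator) {
        if (n == 0) {
            return Step.done(accumulator);
        }
        return Step.call(() -> sumTo(n - 1, accumulator + n));
    }
}
```

A direct recursive version of sumTo with a million levels would normally overflow the default Java stack. The trampolined version finishes in a loop.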

A small but useful speed-up has been achieved for the common operation of navigating the child axis, by optimizing for the case where all nodes on the axis are retrieved.

The XMLChar module from Xerces has been incorporated into Saxon, and is now used in most places where XML names and XML characters are tested for validity. This performs a considerably more accurate check than Saxon was previously performing, especially for characters that are valid within names.

Extensions

The sql:insert extension instruction now tidies up properly by closing the prepared statement, which prevents Oracle running out of cursors.

Extension functions may now return an array; this is treated in the same way as if they return a list. Thanks to Aleksei Valikov for this enhancement.
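The following is a minimal sketch of how a returned Java array can be normalised so that it behaves like a returned List. It uses java.lang.reflect.Array, so arrays of primitives work as well. The helper name is hypothetical and is not Saxon's code. Mapping null to an empty sequence follows the version 7.6 change described below, and treating any other object as a single item is a simplification.

```java
import java.lang.reflect.Array;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: normalise an extension function's return value to a sequence.
class ExtensionResults {
    static List<Object> asSequence(Object returned) {
        List<Object> items = new ArrayList<>();
        if (returned == null) {
            return items; // null becomes the empty sequence
        }
        if (returned.getClass().isArray()) {
            int length = Array.getLength(returned);
            for (int i = 0; i < length; i++) {
                items.add(Array.get(returned, i)); // primitive elements are boxed
            }
        } else if (returned instanceof List) {
            items.addAll((List<?>) returned);     // arrays and lists end up identical
        } else {
            items.add(returned);                  // anything else is a single item
        }
        return items;
    }
}
```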

Changes in version 7.6.5 (2003-07-10)

Defects cleared

A number of defects in XQuery parsing have been fixed silently, without being registered as bugs. For example, constructors for comments and processing instructions were not working at all.

761891 Saxon crashes with a NullPointerException if xsl:include or xsl:import is handled by a user-written URIResolver which returns a Source with no system ID set.

761894 When called from XQuery, the distinct-nodes() function crashes with an internal error.

763792 The extension function saxon:tokenize fails with a NullPointerException when the supplied argument is an empty sequence.

764172 The XQuery parser reports a spurious syntax error if a function declaration includes no return type.

768422 The XPath (and XQuery) parser fails on a construct such as element and X where the first token is element or attribute used as a QName, and the second token is an operator.

768423 The XQuery parser reports a spurious syntax error if, in the construct let $x := EXP, no space appears between the variable name and the := operator.

XQuery support

By default, the command line interface no longer wraps the result sequence into an XML document with a result:sequence as its outermost element. This wrapping can still be achieved using the -wrap option on the command line. The default output format now outputs each item in the result sequence independently. If the item is a document node or an element node, it is serialized as XML. If it is anything else, its string value is output on a new line.

This format is much more useful when the query is designed to produce a single XML or HTML document. Note that you can specify !method=html on the command line to invoke HTML rather than XML serialization.

A number of other changes and extensions have been made to the command line options for the net.sf.saxon.Query command. See Using XQuery for details.

Query execution can now be traced using the -T option, though the actual output is still rather XSLT-oriented. Note that for some expressions the line numbers currently indicate where the expression ends. For the more complex expressions, such as element constructors and FLWOR expressions, they show where it starts.

I have added support for the document{} constructor. (This is implemented using an internal instruction representing a conceptual xsl:document instruction, which is also now used for producing temporary trees in XSLT.)

Computed element and attribute constructors now accept the name as a value of type xs:QName as an alternative to xs:string.

Added extra validation of numeric character references, and support for characters above Unicode 65535 (surrogate pairs).

Improvements have been made in error messages, particularly for run-time errors. Many of these changes should also benefit XSLT users.

XSLT Changes

XHTML output is now selected automatically if the first element in the result tree has local name "html" and namespace URI "http://www.w3.org/1999/xhtml".

Grouping facilities (xsl:for-each-group) have been rewritten. Grouping keys may now be of any data type that supports equality comparison. Previous Saxon releases converted the values to strings before comparing; they are now compared according to the rules for their native data type. (There are actually very few cases where this gives a different answer: one such case is when comparing dates in different timezones). The function current-grouping-key() is supported. When using the group-by attribute, an item in the population may now be assigned to zero or more groups. The collation attribute is now available to define how string-valued keys should be compared; the default is Unicode codepoint collation. The new implementation offers better pipelining in cases where no sorting is needed (though it is not completely serial: for example with group-adjacent, the contents of one group of adjacent elements will be held in memory at any one time). {group020-025}
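The group-by semantics described above can be illustrated with the following minimal sketch. The class name is hypothetical, and Java's equals() stands in for XPath value comparison under the default codepoint collation. The key function returns a list of keys. An item therefore joins one group per distinct key, or no group at all if it has no keys. Groups appear in the order in which their keys are first encountered. Keys are compared as typed values rather than as strings.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch of group-by with zero or more grouping keys per item.
class GroupBy {
    static <T, K> Map<K, List<T>> groupBy(List<T> population, Function<T, List<K>> keys) {
        // LinkedHashMap keeps groups in order of first appearance of each key
        Map<K, List<T>> groups = new LinkedHashMap<>();
        for (T item : population) {
            // De-duplicate keys so an item joins a given group at most once
            for (K key : new LinkedHashSet<>(keys.apply(item))) {
                groups.computeIfAbsent(key, k -> new ArrayList<>()).add(item);
            }
        }
        return groups;
    }
}
```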

The specification says that it is an error when the set of grouping keys is heterogeneous (for example, a mixture of strings and numbers). Saxon currently detects this error in the case of group-adjacent, but does not detect it for group-by — non-comparable values are simply treated as being not equal.

Added support in the system-property function for the new properties xsl:is-schema-aware, xsl:supports-serialization, and xsl:supports-backwards-compatibility. The xsl:version property now returns 2.0, reflecting the fact that Saxon is now close to achieving full conformance with the draft 2.0 specification.

Functions and Operators

Added support for the abs function. (This anticipates a future draft of the specification.) {math-two15}

Conversion of decimal, float, and double to integer now works as specified, by truncating towards zero. {math-two16}
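Truncation towards zero means that 2.7 converts to 2 and -2.7 converts to -2, not -3. The following minimal sketch, with a hypothetical class name, uses BigDecimal with RoundingMode.DOWN, which discards the fraction without ever rounding away from zero.

```java
import java.math.BigDecimal;
import java.math.BigInteger;
import java.math.RoundingMode;

// Hypothetical sketch: numeric-to-integer conversion by truncation towards zero.
class ToInteger {
    static BigInteger fromDecimal(BigDecimal d) {
        // RoundingMode.DOWN rounds towards zero for both signs
        return d.setScale(0, RoundingMode.DOWN).toBigIntegerExact();
    }

    static BigInteger fromDouble(double d) {
        if (Double.isNaN(d) || Double.isInfinite(d)) {
            throw new ArithmeticException("Cannot convert " + d + " to an integer");
        }
        // new BigDecimal(double) is exact, so only the truncation loses information
        return fromDecimal(new BigDecimal(d));
    }
}
```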

Conversion of a non-numeric string to a number, when invoked using a cast, or the xs:double() constructor function, or implicit conversion of an untyped node, now raises a dynamic error rather than returning NaN. The number() function, however, continues to return NaN.
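The difference between the two conversion paths can be sketched as follows. The class name is hypothetical. The lexical check is only an approximation of the xs:double lexical space, and it is needed because Double.parseDouble on its own would also accept Java-specific forms such as "1d" or "0x1p3". A cast reports failure as an error, whereas number() performs the same conversion and turns the failure into NaN.

```java
import java.util.regex.Pattern;

// Hypothetical sketch: cast-style conversion raises an error; number() yields NaN.
class NumericConversion {
    // Approximate xs:double lexical form (INF, -INF and NaN handled separately)
    private static final Pattern DOUBLE_LEXICAL =
            Pattern.compile("-?(\\d+(\\.\\d*)?|\\.\\d+)([eE][+-]?\\d+)?");

    // cast as xs:double, xs:double(), or implicit conversion of an untyped node
    static double castToDouble(String s) {
        String t = s.trim();
        switch (t) {
            case "INF": return Double.POSITIVE_INFINITY;
            case "-INF": return Double.NEGATIVE_INFINITY;
            case "NaN": return Double.NaN;
            default:
                if (!DOUBLE_LEXICAL.matcher(t).matches()) {
                    throw new IllegalArgumentException("Cannot convert '" + s + "' to xs:double");
                }
                return Double.parseDouble(t);
        }
    }

    // number(): same conversion, but a non-numeric string gives NaN instead of an error
    static double number(String s) {
        try {
            return castToDouble(s);
        } catch (IllegalArgumentException e) {
            return Double.NaN;
        }
    }
}
```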

The doc() function, when it fails to find a document (or to parse it as XML) now raises a fatal run-time error. This doesn't affect the behaviour of the document() function.

Extensibility

A value of type xs:anyURI can now be passed to an external Java method whose expected type is java.net.URI.

Internal changes

Comparison of two integer values was converting both to doubles; this has been fixed.
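The converting approach was incorrect, and not only slow. A double has a 53-bit significand, so above 2^53 it cannot represent every 64-bit integer, and two distinct integers can compare as equal. The following minimal sketch, assuming integers held as Java longs, shows the effect. The class name is hypothetical.

```java
// Hypothetical sketch contrasting lossy and exact integer comparison.
class IntegerComparison {
    // The old approach: both operands converted to double before comparing
    static int compareViaDouble(long a, long b) {
        return Double.compare((double) a, (double) b);
    }

    // The fixed approach: compare the integer values directly
    static int compareExactly(long a, long b) {
        return Long.compare(a, b);
    }
}
```

With a = 2^53 + 1 = 9007199254740993 and b = 2^53, the first method reports the values as equal, because a rounds to b when converted to a double. The second method correctly reports that a is greater than b.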

Handling of "head-tail recursion" has been made more efficient. Constructs that select all items in a sequence after the first (for example $x[position()!=1]) are now recognized specially when doing deferred evaluation; in effect they return a view of the underlying sequence. (Previous Saxon versions handled this in some cases, in a different way).
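A "view of the underlying sequence" can be pictured as an offset into the existing items. In the following minimal sketch, the names SequenceView and HeadTail are hypothetical and are not Saxon classes. Each call to tail() costs O(1) instead of copying the remaining items, so a head-tail recursion over n items does O(n) work rather than O(n^2). Stack depth is a separate issue, addressed by tail-call optimization.

```java
import java.util.List;

// Hypothetical sketch: $x[position() != 1] as an offset view, not a copy.
final class SequenceView<T> {
    private final List<T> base;
    private final int start;

    SequenceView(List<T> base) { this(base, 0); }

    private SequenceView(List<T> base, int start) {
        this.base = base;
        this.start = start;
    }

    boolean isEmpty() { return start >= base.size(); }

    // $x[1]
    T head() { return base.get(start); }

    // $x[position() != 1]: shares the underlying list, no items are copied
    SequenceView<T> tail() { return new SequenceView<>(base, start + 1); }
}

class HeadTail {
    // Classic head-tail recursion: handle the first item, recurse on the rest
    static long sum(SequenceView<Integer> seq) {
        return seq.isEmpty() ? 0 : seq.head() + sum(seq.tail());
    }
}
```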

In XQuery, tail calls within user-defined functions are now optimized. This enables deeply recursive algorithms to execute without running out of stack space.

All the decisions about sorting a path expression into document order are now made at compile time.

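Union, intersect, and except expressions whose two operands apply different predicates to the same expression E are now rewritten as a single filter on E. For example, E[c1] | E[c2] becomes E[c1 or c2]. This avoids the need to evaluate E twice, and may avoid a sort.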
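The following minimal sketch shows why the union rewrite gives the same answer. It assumes that a List stands in for the node sequence E in document order and that c1 and c2 are boolean, non-positional predicates. A numeric predicate such as [1] means something different and is not covered here. The class name is hypothetical. Both operands filter the same input, so a single pass with the combined predicate yields the same nodes, already in order and without duplicates.

```java
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;
import java.util.function.Predicate;
import java.util.stream.Collectors;

// Hypothetical sketch comparing naive union evaluation with the rewrite.
class UnionRewrite {
    // Before: evaluate E twice, combine by node identity, restore document order
    static <T> List<T> naive(List<T> e, Predicate<T> c1, Predicate<T> c2) {
        Set<T> selected = Collections.newSetFromMap(new IdentityHashMap<>());
        e.stream().filter(c1).forEach(selected::add);
        e.stream().filter(c2).forEach(selected::add);
        // Re-scanning E stands in for the sort into document order
        return e.stream().filter(selected::contains).collect(Collectors.toList());
    }

    // After: E[c1 or c2], evaluated in one pass with no de-duplication or sort
    static <T> List<T> rewritten(List<T> e, Predicate<T> c1, Predicate<T> c2) {
        return e.stream().filter(c1.or(c2)).collect(Collectors.toList());
    }
}
```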

The coding of the cast, instance of, and castable operators has been cleaned up.

The implementation of substring now optimizes for the case where the second and third arguments are integers: it no longer does everything using double arithmetic.
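The following minimal sketch shows an integer-only path for substring($s, $start, $length). It assumes that every character is in the Basic Multilingual Plane, so that one Java char corresponds to one XPath character; strings with surrogate pairs would need code-point offsets. The class name is hypothetical. Positions are 1-based, and the result contains the characters at positions p with start <= p < start + length, clamped to the string.

```java
// Hypothetical sketch of substring() using integer arithmetic only (BMP strings).
class Substring {
    static String substring(String s, long start, long length) {
        long from = Math.max(start, 1);                      // first position kept
        long to = Math.min(start + length, s.length() + 1L); // first position dropped
        if (from >= to) {
            return "";
        }
        return s.substring((int) (from - 1), (int) (to - 1));
    }
}
```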

The StringValue class can now encapsulate any CharSequence, not necessarily a String. This avoids some unnecessary conversions, though for full effect the Item#getStringValue() method will also have to be changed to return a CharSequence.

The implementation of xsl:sequence has been improved. When a required type is given in the as attribute, the type of the value is now checked "on the fly" as items are written to the current output destination. The SequenceChecker is also capable of atomizing any nodes that are written (in fact, the nodes are atomized before they are even created). It also converts untyped atomic values to the required type. This is an inverse of the way checking is done by SequenceIterators during XPath evaluation; we now have a "pull" pipeline (the SequenceIterator) for iterating over the results of XPath expressions, and a "push" pipeline (the Receiver) for generating events on a result tree. Both are capable of on-the-fly type checking. The capability of xsl:sequence to do this is reused in xsl:template and xsl:function: an implicit xsl:sequence instruction is generated to wrap the contents of the template or function, and this inner instruction is responsible for any type checking and atomization.
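On-the-fly checking in a push pipeline can be illustrated with the following minimal sketch. All the names (ItemSink, CollectingSink, CheckingSink) are hypothetical, and the interface is not Saxon's Receiver. Java classes stand in for item types. Atomization and conversion of untyped values are omitted. The checking stage validates each item as it is written and fails as soon as too many items arrive. It detects too few items when the stream is closed, and it never buffers the sequence.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical push interface: items are written one at a time, then close() is called.
interface ItemSink {
    void append(Object item);
    void close();
}

// Final destination that simply collects the items
class CollectingSink implements ItemSink {
    final List<Object> items = new ArrayList<>();
    public void append(Object item) { items.add(item); }
    public void close() { }
}

// Checks item type and cardinality as items pass through to the next stage
class CheckingSink implements ItemSink {
    private final ItemSink next;
    private final Class<?> itemType;
    private final int minItems;
    private final int maxItems;
    private int count = 0;

    CheckingSink(ItemSink next, Class<?> itemType, int minItems, int maxItems) {
        this.next = next;
        this.itemType = itemType;
        this.minItems = minItems;
        this.maxItems = maxItems;
    }

    public void append(Object item) {
        if (!itemType.isInstance(item)) {
            String supplied = item == null ? "null" : item.getClass().getSimpleName();
            throw new IllegalStateException("Required item type is "
                    + itemType.getSimpleName() + ", supplied value is " + supplied);
        }
        if (++count > maxItems) {
            throw new IllegalStateException("More than " + maxItems + " item(s) supplied");
        }
        next.append(item); // checked items flow on immediately
    }

    public void close() {
        if (count < minItems) {
            throw new IllegalStateException("Fewer than " + minItems + " item(s) supplied");
        }
        next.close();
    }
}
```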

Eventually I am hoping to move more and more to a "pull" model, where instructions are evaluated using iterators in the same way as expressions are evaluated today. The current model can be seen as transitional. Some instructions, in particular those used by XQuery such as Element and Attribute, are currently dual-mode - they work both as instructions and as expressions.

The expression generate-id(A) = generate-id(B), which is often used in XSLT 1.0 to compare node identities, is now rewritten internally as A is B. This requires some minor tweaks to ensure that the result is correct when either A or B or both is an empty sequence.
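The empty-sequence tweak can be shown in the following minimal sketch, in which null stands for the empty sequence. The class name is hypothetical. Because generate-id(()) returns the zero-length string, the original comparison is true when both A and B are empty, and false when only one of them is. A plain A is B would return the empty sequence in both of these cases, and that is treated as false, so the both-empty case needs special handling.

```java
// Hypothetical sketch: generate-id(A) = generate-id(B) evaluated as node identity.
class NodeIdentity {
    static boolean sameNode(Object a, Object b) {
        if (a == null || b == null) {
            // "" = "" is true only when both operands are empty
            return a == null && b == null;
        }
        return a == b; // identity, not equality, mirroring the "is" operator
    }
}
```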

Changes in version 7.6 (2003-06-22)

Defects cleared

755834 Stack overflow error when xsl:with-param depends both on the context node and on variables

759502 The getOutputProperty() and getOutputProperties() methods of the Transformer object always return null

XSLT Changes

Added the as attribute to xsl:template. (This is not a very efficient implementation, as it breaks the pipeline) {seq017, seq906err, seq907err}

The [xsl:]default-xpath-namespace attribute is renamed [xsl:]xpath-default-namespace.

Two local variables in the same template or function can now have the same name. But parameters must still have unique names. (A side-effect of this change is a useful improvement in compilation speed for stylesheets with many global variables. The checking done in previous releases was implemented very inefficiently.) {var13, var901err}

I have added type-checking for global (stylesheet) parameters. It's not entirely clear here what the rules ought to be. Somewhat pragmatically, I have adopted the rule that if you supply a string (which will always be the case if parameters are provided on the command line), then the system attempts to convert it to the type specified in the xsl:param declaration; if you supply anything else, then it must (after Java to XPath conversion) be of exactly the required type, without any conversion.
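The rule can be expressed compactly in code. In the following minimal sketch, declared types are represented by Java classes (BigInteger for xs:integer, and so on), and only a few types are supported. This mapping is an assumption made for illustration, and the class name is hypothetical. A supplied string is converted to the declared type. Any other value is accepted only if it is already of exactly that type.

```java
import java.math.BigDecimal;
import java.math.BigInteger;

// Hypothetical sketch of binding a supplied global parameter to its declared type.
class GlobalParams {
    static Object bind(Object supplied, Class<?> declared) {
        if (supplied == null) {
            throw new IllegalArgumentException("No value supplied");
        }
        if (supplied instanceof String && declared != String.class) {
            // A string (e.g. from the command line) is converted to the declared type
            String s = ((String) supplied).trim();
            try {
                if (declared == BigInteger.class) return new BigInteger(s);
                if (declared == BigDecimal.class) return new BigDecimal(s);
                if (declared == Double.class) return Double.valueOf(s);
            } catch (NumberFormatException e) {
                throw new IllegalArgumentException(
                        "Cannot convert '" + s + "' to " + declared.getSimpleName(), e);
            }
            if (declared == Boolean.class) {
                switch (s) {
                    case "true": case "1": return Boolean.TRUE;
                    case "false": case "0": return Boolean.FALSE;
                    default: throw new IllegalArgumentException("Not a boolean: " + s);
                }
            }
            throw new IllegalArgumentException("No conversion from string to " + declared.getSimpleName());
        }
        // Any other value must already be of exactly the declared type: no conversion
        if (supplied.getClass() != declared) {
            throw new IllegalArgumentException("Expected " + declared.getSimpleName()
                    + ", supplied " + supplied.getClass().getSimpleName());
        }
        return supplied;
    }
}
```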

XQuery

Saxon now supports XQuery. Details of how to use XQuery are provided at using-xquery.html, and information about the conformance to current working drafts is in conformance.html.

Function Library

The doc function is implemented.

The trace function has been changed so that it is never evaluated at compile time, even if both arguments are constants.

Internal Changes

The optimization of count($x)=0 as empty($x) was working only when the stylesheet specifies version="2.0". It now works with version="1.0" also.

Text-only temporary trees are used in a wider range of circumstances than before. This data structure is used when it is known statically that a temporary tree will consist of a single text node; it is a lot more efficient than a general-purpose temporary tree. It is now used in cases where the content of the variable invokes xsl:text and xsl:value-of including calls that are within xsl:for-each, xsl:choose, xsl:if, xsl:sequence and xsl:analyze-string, and also where it uses xsl:call-template, provided that all the subordinate instructions generate text nodes. This has been done in a generalized way which will eventually lead to static type inferencing working at the XSLT level in the same way as it currently works at the XPath level. For xsl:template, the type of the results is inferred from the as attribute if present, or from the contents of the template otherwise. {not v. thoroughly tested!}

Range variables (that is, variables declared in an XPath for, some, or every expression) are now stored on the local stack frame in the Bindery rather than directly in the XPathContext object. This simplifies the machinery for handling variables and allows instructions and expressions to be treated more interchangeably.

The Expression class has been refactored. The original class net.sf.saxon.expr.Expression is now an interface. The various expressions are now structured under ComputedExpression for "true" XPath expressions, net.sf.saxon.value.Value for constant values, and InstructionExpr for instructions (such as xsl:element) that act as expressions when used from XQuery. The utility methods, including the make factory method, are now in net.sf.saxon.expr.ExpressionTool.

When the dependencies of a ComputedExpression are determined, the information is now saved with the expression rather than being recalculated whenever it is needed. For complex expressions this calculation can be quite complex, and there are still some cases where it is being done at run-time.

Path expressions now use the standard type-checking machinery to check that both arguments of "/" are node-sets. This means that in some cases an error in this area will now be detected statically; and it means that if the expression is found statically to be safe, no run-time checking is done.

I have changed the way delayed evaluation is done: when an expression is evaluated lazily, a Closure object is created as a surrogate for the value. This now contains the expression itself together with all the context information that the expression needs. The separate SavedContext object is no longer used. The Closure is evaluated using the ordinary XPathContext object, which now holds a reference to the local stack frame. With delayed evaluation, this "stack frame" is not actually on the stack at all, it is in the heap, so it survives if the Closure is returned from a function call.

I have reverted to the principle used prior to Saxon 7.x, that lazy evaluation is used only for expressions that are expected to return a (non-singleton) sequence. However, the classification of such expressions is now much more accurate. The reason for this policy is that delaying evaluation of singleton expressions is usually not beneficial - it saves no memory, and incurs a cost for saving and restoring the context. Also, lazy evaluation is not used for expressions that have unusual context dependencies, for example those that depend on current(), position(), last(), or current-group(). This eliminates the problem of saving these values and ensuring that they are referenced correctly during the delayed evaluation.

The delayed evaluation code now evaluates the underlying expression at most once, thus ensuring that it never takes longer than direct evaluation. At the same time, if only the first item in the sequence is used then only the first item will be read. In a construct such as if (exists($x)) then $x else "nothing", the first reference to $x primes the iterator, and saves anything it reads in a buffer (called the "reservoir") within the Closure object. The second reference to $x starts by reading what it can from the reservoir, and if it needs more, it picks up iterating the underlying expression where the first evaluation left off. Once some user of the variable has accessed all the items in the underlying expression, the reservoir contains all the values needed and subsequent evaluations read the value from there.
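The reservoir mechanism can be sketched as a memoising wrapper around an iterator. In the following minimal sketch, which assumes single-threaded use, the class name MemoClosure is hypothetical and is not Saxon's Closure class. Each reader replays items from the reservoir and pulls from the underlying iterator only when it runs past the end of what has already been read. The underlying expression is therefore evaluated at most once, and only as far as any reader needs.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical sketch of a lazily evaluated value with a shared "reservoir".
class MemoClosure<T> {
    private final Iterator<T> underlying;            // evaluation of the expression
    private final List<T> reservoir = new ArrayList<>(); // items read so far

    MemoClosure(Iterator<T> underlying) { this.underlying = underlying; }

    // Each call returns an independent reader over the same value
    Iterator<T> iterate() {
        return new Iterator<T>() {
            private int position = 0;

            public boolean hasNext() {
                return position < reservoir.size() || underlying.hasNext();
            }

            public T next() {
                if (position == reservoir.size()) {
                    if (!underlying.hasNext()) {
                        throw new NoSuchElementException();
                    }
                    reservoir.add(underlying.next()); // read one more item and keep it
                }
                return reservoir.get(position++);
            }
        };
    }
}
```

In the if (exists($x)) then $x else "nothing" example, the exists() test corresponds to a reader that takes only the first item. The second reference then replays that item and continues the underlying iteration.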

Certain instructions, specifically those that are used for XQuery as well as XSLT processing, now act as expressions as well as instructions. There are two modes of evaluating these instructions: the process() method causes the instruction to write its output to the current Receiver, while the evaluateItem() and iterate() methods return the results in the same way as for any other expression.

To support this mechanism, the process() method now takes an XPathContext as its argument, instead of the Controller. This is because in XQuery, the XPath context needs to be passed unchanged by an element constructor to its child expressions.

Extension Functions

Thanks to Gunther Schadow for these changes.

If an extension function returns null, this is now mapped to a zero-length sequence rather than to an external object that wraps null. This prevents some run-time type failures.

Exceptions thrown by an extension function are now wrapped in the XPathException thrown by the calling XPath expression, and hence in the TransformerException thrown by the transformation as a whole.

Public fields in Java classes are now accessible as zero-argument functions, for example the field Double.MIN_VALUE is accessible as Double:MIN_VALUE() with the namespace prefix bound as xmlns:Double="java:java.lang.Double". Non-static fields can be accessed by including the object instance as the first argument. It is not possible (and rarely necessary or desirable!) to modify public fields without use of a setter method. {saxon74}
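The following minimal sketch shows how a call such as Double:MIN_VALUE() can be resolved to a public field using standard Java reflection. The helper name is hypothetical, and Saxon's own binding logic is more elaborate. The sketch reads a static field with no arguments. For an instance field, the object supplied as the first argument is the instance to read from.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;

// Hypothetical sketch: read a public Java field as a zero-argument extension function.
class FieldAccess {
    static Object readField(String className, String fieldName, Object... args) {
        try {
            Field field = Class.forName(className).getField(fieldName); // public fields only
            if (Modifier.isStatic(field.getModifiers())) {
                if (args.length != 0) {
                    throw new IllegalArgumentException("Static field takes no arguments");
                }
                return field.get(null);
            }
            // Non-static field: the object instance is supplied as the first argument
            if (args.length != 1) {
                throw new IllegalArgumentException("Instance field needs the object as its only argument");
            }
            return field.get(args[0]);
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException("Cannot access " + className + "." + fieldName, e);
        }
    }
}
```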

The extension function saxon:is-null (which was incorrectly documented as saxon:if-null) is now redundant, and is dropped.

The undocumented saxon:trace function is dropped: use fn:trace instead.

Changes in version 7.5.1 (2003-05-21)

Defects Cleared

A couple of bugs in tail recursion have been fixed:

736802 Tail recursive calls are sometimes not executed

A couple of cases have been reported where stylesheet errors cause a crash, these have been fixed.

I have recently installed JDK 1.4.1, and this has revealed a couple of problems and inconsistencies. The new JDK release appears to fix some problems with regular expression handling and with use of collations. This means that some of my tests produce subtly different results. In one or two cases the new results are clearly wrong, which means that my code was relying on the incorrect JDK 1.4.0 behavior. I have made adjustments where it seems appropriate, but particularly with collations, it is not always obvious what the right answer is.

New functionality

It is now possible to have two or more stylesheet functions with the same name, provided they have different arities (numbers of parameters). An error is reported if two functions have the same name, arity, and import precedence. {func27, func901err}
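The following minimal sketch shows a function table keyed by name and arity, so that f#1 and f#2 can coexist. The class FunctionLibrary is hypothetical, and a string stands in for the compiled function. Each name-and-arity key holds one entry per import precedence. Declaring a second function with the same name, arity, and precedence is an error, and a lookup returns the declaration with the highest precedence.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of stylesheet function lookup by (name, arity).
class FunctionLibrary {
    // "name#arity" -> (import precedence -> function body)
    private final Map<String, TreeMap<Integer, String>> functions = new HashMap<>();

    void declare(String name, int arity, int precedence, String body) {
        TreeMap<Integer, String> versions =
                functions.computeIfAbsent(name + "#" + arity, k -> new TreeMap<>());
        if (versions.putIfAbsent(precedence, body) != null) {
            throw new IllegalStateException("Two functions named " + name + " with arity "
                    + arity + " have the same import precedence");
        }
    }

    // The declaration with the highest import precedence wins
    String lookup(String name, int arity) {
        TreeMap<Integer, String> versions = functions.get(name + "#" + arity);
        return versions == null ? null : versions.lastEntry().getValue();
    }
}
```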

Added support for xsl:template mode="#all". {modes30-34, inimode002}

Basic support for the functions format-date, format-time, and format-dateTime is provided. The xsl:date-format declaration is not implemented, and the third argument of the function is ignored. Not all formatting options are supported, and timezones are not handled properly yet. {date67}

Documentation

I have done a lot of work on the JavaDoc documentation of key interfaces and classes, and some general code cleaning up (for example, removal of redundant methods).

Internal Changes

The classes representing character encodings (in package net.sf.saxon.charcode) are now singleton classes; they only ever have one instance.

Changes in version 7.5 (2003-05-02)

Defects Cleared

The following bugs are fixed in this release:

687946 An internal error may occur when the key function is used on the right-hand-side of the / operator in a path expression.

689934 A ClassCastException occurs when comparing two values that are not comparable according to XPath 2.0 rules, for example string and integer.

690736 An IllegalStateException may occur when a sequence-valued variable is promoted by the optimizer to move it outside a predicate. The specific message is: java.lang.IllegalStateException: evaluateItem called on non-singleton variable reference at net.sf.saxon.expr.VariableReference.evaluateItem(VariableReference.java:202)

700837 An expression that uses a range variable (for example, a for or some expression) cannot be used within a predicate in an XSLT pattern.

706935 A NullPointerException occurs when processing an expression of the form //x[$var] or .//x[$var].

708789 When two or more items have sort keys that evaluate to the empty sequence, the resulting sort order is incorrect.

708998 Incorrect recovery action when xsl:copy-of generates a non-text node in the value of an attribute, comment, or processing instruction.

709347 An empty sequence is not being converted to an empty string when calling a function in backwards compatibility mode.

710093 The value of position() is calculated wrongly when navigating an axis on a JDOM tree.

721687 Crash in XPathEvaluator (Java XPath API) due to incorrect generation of type-checking code.

722537 Saxon crashes if an attribute is marked as an ID but is not a valid ID, which can happen when using a non-validating parser.

Withdrawn Facilities

I have finally dropped support for the old Java-only event-driven API. This was starting to interfere with the ability to optimize XSLT processing. The XPath interfaces remain available. Indeed, all the internal APIs remain available, but I am no longer trying to keep them as simple or as stable as is necessary for a supported external API. There were serious bugs in the ShowBooks.java sample application in Saxon 7.4 that somehow didn't show up in testing; this sample application has now been dropped.

I have also finally been forced to drop preview mode. It no longer works because the optimizer is becoming too clever. The optimizer uses lazy evaluation of expressions, which relies on the fact that the source document is immutable; preview mode violates this assumption. The correct way to handle this requirement is to write a document splitter as a SAX filter, breaking the source document into small pieces and invoking one transformation for each piece.

The deprecated extension functions get-user-data() and set-user-data() are no longer documented, though they have not yet been deleted from the product.

XSLT Changes

The required="yes|no" attribute on xsl:param is implemented. Currently, failure to supply a required parameter is a dynamic error; it is never detected statically. {ntmp01, ntmp901err}

The xsl:next-match instruction is implemented. {cnfr24-27}

The error that occurs when the name attribute of xsl:element or xsl:attribute contains an undeclared prefix (in the absence of the namespace attribute) is now recoverable. This brings it into line with the handling of other errors in the value of this attribute. Note, however, that if the name is known statically, the error is reported at compile time and is fatal.

The error that occurs when a namespace or attribute node is written using xsl:copy-of, when there is no open element start tag, is now recoverable. This brings it into line with other instructions such as xsl:attribute and xsl:copy. {copy62}

Sequence Construction in XSLT

Implemented the new facility to allow construction of sequences in XSLT, when a variable binding element has content and an as attribute.

Added the as attribute to xsl:function. {func20}

Implemented the xsl:sequence instruction, including the as attribute which checks the type of the returned sequence and performs any necessary (and permitted) conversions. {seqNNN}

The xsl:result element is withdrawn. It can always be replaced by xsl:sequence. The as attribute, which denotes the return type of the function, should preferably be moved to the xsl:function element.

Parentless attribute, text, comment, processing-instruction, and namespace nodes are implemented. They are probably a little fragile: some operations on such nodes (e.g. xsl:number, xsl:apply-templates) have not been tested. The new rules for match patterns with parentless nodes have not been implemented, so it's probably best to avoid using apply-templates on such nodes for the moment. {seqNNN}

Some instructions, e.g. xsl:value-of, incorrectly generate multiple text nodes, while some other instructions may pre-merge the text nodes.

Handling of document nodes within the constructed sequence is probably not yet correct.

The separator attribute of xsl:copy-of is withdrawn.

Revised syntax for validating result trees

The 2 May 2003 WD changes the syntax for attaching type annotations to nodes in a result tree. These facilities are only partially implemented in Saxon, and no new functionality is provided in this release, but the existing functionality has been converted to use the new syntax. Specifically: