This file describes changes for versions 7.0 and later. For changes prior to version 7.0, see http://saxon.sf.net/saxon6.5.3/changes.html.
912721 tokenize() crashes with a NullPointerException if the regular expression is not known statically.
913392 Superfluous spaces are inserted between the text nodes used to construct the content of a comment or processing instruction.
913504 In format-time(), when the value of the minutes or seconds field is zero, it is output as "0" rather than "00".
914497 Calling saxon:evaluate() can corrupt the values of local variables on the current stack frame
915656 Invalid namespace prefixes in the exclude-result-prefixes attribute of an xsl:stylesheet element in an imported module go unreported.
915661 An attempt to exclude a system namespace (e.g. xs, xsi, xslt) using exclude-result-prefixes succeeds even if the namespace is used in the result tree, causing non-well-formed output.
918146 In XQuery, FLWOR expressions are sometimes optimized incorrectly.
919394 A crash occurs when running a query with JDOM input and a SAXResult output destination.
919437 Using the "as" attribute on xsl:template or xsl:sequence may cause the result tree to be prematurely condensed, causing the possibility of a failure when subsequent attributes are added.
When an xsl:variable element is traced (using the -T option), the value of the variable is no longer output. This was causing the variable to be evaluated twice, which potentially causes additional instructions to be executed, which were themselves traced, thus making the resulting output very difficult to interpret. You can trace the values of variables using the trace() function.
Casting from any atomic type to xdt:untypedAtomic is now permitted.
Negative (BC) dates are allowed.
Overflow of durations is detected (each component of a duration must fit in an integer, and the total number of seconds must fit in a long).
The xsl:number instruction, given format="01", now outputs the value zero as "00" rather than "0". This is contrary to the spec, but the change is made so that time values containing a zero value of hours or minutes are properly formatted by format-time(). A request has been raised to change the spec.
The use of «chevron» characters in error messages has been discontinued, because Microsoft's MSDOS command-line is unable to display non-ASCII characters correctly.
841466 The default for a parameter with an as attribute and no select attribute should be (), not "".
843948 The XQuery parser does not allow spaces after the "=" sign in attributes within a direct element constructor.
843954 Pattern separators in the format-number() picture argument are rejected.
844436 Tail call optimization in XSLT templates fails when type checking of the template's result is required (by virtue of an as attribute).
851636 Conversion of a string consisting of one or more whitespace characters to a number causes a StringIndexOutOfBoundsException.
852344 format-number() output is wrong for numbers greater than 10e19
852622 Local variables don't work in the standalone XPath API.
854939 Two problems with character maps. Namespaces in the name of the character map are handled wrongly, and character maps don't work with method="text".
855943 In XQuery order by, the second and subsequent sort keys cannot be written as function calls.
856136 The QueryStaticContext is not serially reusable.
859629 The parameters passed to saxon:evaluate() must be singleton items, not sequences.
859632 The XQuery parser does not allow the where expression to start with a left parenthesis.
874935 When running against a DOM or JDOM source, calling generate-id() causes infinite recursion and a stack overflow.
884847 A NullPointerException can occur when using a DOMSource with a default namespace declaration.
891839 When XQuery is run from the command line, without the -wrap option, the output encoding is labelled as UTF-8 but is in whatever encoding the Java VM chooses.
895201 Infinite recursion can occur when a DOMSource is used, especially a DOMSource starting at a non-root node.
896533 In an XQuery typeswitch expression, Saxon reports a spurious compile-time error if it can establish at compile time that one of the branches will never be executed.
896547 In an XQuery FLWOR expression, if there is an order by clause then the return expression can only deliver a single value for each item in the input sequence.
896947 Comparisons of two values where one is NaN can give true: they should always be false.
897989 Type checking: when a singleton is required Saxon is sometimes not checking for multiple items, but simply returning the first.
901695 Local variables used within an expression evaluated using saxon:evaluate() don't work correctly.
904221 XQuery parsing errors can occur as a result of incorrect handling of lexical states. The exact circumstances are rare but highly variable.
It is now possible to specify an initial template to be invoked when the transformation starts. From the command line this is done using the -it option, followed by the template name. When this option is used, no source file should be provided. In the API, the initial template is set using setInitialTemplate on the Controller class. The transformation is then started using the transform method in the normal way, except that the source argument may be null. (It does not have to be null: if a source is supplied, it will still cause the context node to be initialized, but will no longer cause execution to start by applying templates to this node.)
An XPath expression within an attribute value may now contain a comment {avt15}
A newline is now written before a DOCTYPE declaration only if the DOCTYPE declaration is preceded by an XML declaration. This means that the newline is always omitted when serializing as HTML. There have been unconfirmed reports of browsers that require the DOCTYPE declaration to be right at the start of the file. An unintended but hopefully harmless side-effect of this change is that where the result tree is empty, previous releases serialized the result as a file containing only an XML declaration; the current release serializes it as an empty file containing no XML declaration.
The href attribute of xsl:result-document can now be omitted. Generally this is done when writing the principal result tree, if you want to set the validation or type attributes. Don't do this if the stylesheet also writes output nodes before calling xsl:result-document (it's an error to write two result trees to the same URI).
To achieve this, opening of the output file is delayed until the first attempt to write to it. A side-effect of this change is that when the result tree is empty, or when a failure occurs before writing any output, the output file is no longer written.
In xsl:key, the as attribute is no longer available. The data type of the key is now implicit. To implement this, one index is built for each primitive type encountered in the supplied argument of the key() function: for example, if a key is used both as a string and as an integer, in different calls, then both a string and an integer index will be built. NaN is no longer considered to match NaN for indexing purposes.
The select attribute of xsl:message is implemented. {ver18}
The xsl:value-of instruction can now contain a sequence constructor as an alternative to the select attribute. {atrs52}
The xsl:attribute instruction can now take a separator attribute. {atrs51, 52}
When a sequence constructor is used to construct a simple-valued node such as an attribute or comment, any non-text nodes in the sequence are now atomized. Previously (as in XSLT 1.0) they were ignored with a warning message.
The new xsl:document instruction is implemented. This instruction has been added since the 12 Nov 2003 working draft; it creates a document node, and optionally validates it using validation and type attributes. Unlike xsl:result-document, the new document is not serialized. See further details.
This version of Saxon aims to conform to the rules for a basic XSLT processor (that is, a non-schema-aware processor). A future release will provide a schema-aware processor. To align with the two conformance levels, this release does not support the validation or type attributes on instructions that create element, attribute, or document nodes.
Within a direct element constructor, a namespace prefix may now be used in an attribute name, or within an attribute value constructor, that precedes the declaration of that namespace prefix on the same element.
There have been some minor changes to the XQuery API. When constructing a StaticQueryContext, you must now supply a Configuration object; and when constructing a QueryProcessor, you must supply a StaticQueryContext (which will already contain a Configuration). These changes simplify the interface and help to ensure that the various objects are consistent with each other.
A processing instruction constructor now uses the keyword processing-instruction rather than pi.
In the construct processing-instruction('xxx'), I have removed the check that xxx is a valid NCName. The specification does not require this in the case where the name is included in quotes.
Internal changes have been made to prepare the way for Saxon to become a schema-aware processor. The changes are by way of calls to classes and methods that in this (non-schema-aware) version of Saxon have a dummy implementation. These dummy implementations are generally included in the Configuration object, which will be subclassed in the schema-aware product.
Casting to xs:double or xs:float now fails if the value is non-numeric (other than the special values INF, -INF, and NaN); previously it returned NaN. This does not affect the number() function, which still returns NaN. A consequence of this change is that "ABC" castable as xs:float now returns false.
The types xs:hexBinary and xs:base64Binary are implemented. (I have not attempted to reduce the set of supported types to the basic set required in XSLT, partly because XSLT 2.0 and XQuery 1.0 differ in this regard.)
The functions contains(), starts-with(), ends-with(), substring-before(), substring-after(), string-value(), substring(), and replace() now treat an empty sequence (supplied in either argument) as if it were a zero-length string. The result can no longer be an empty sequence.
The translate() function is changed so that an empty sequence is treated as a zero-length string in the first argument, and is not allowed in the second and third arguments; the result can no longer be empty.
Reversed the order of the arguments to get-namespace-uri-for-prefix. The prefix is now the first argument, the element is the second.
Changed get-timezone-from-date() to return the correct timezone, and to return it as a dayTimeDuration; changed get-timezone-from-time() to return the timezone as a dayTimeDuration.
Addition or subtraction of a date|time|dateTime to a duration is supported, as defined in the specification. {date080-082}
Subtraction of two date|time|dateTime values to return an xdt:dayTimeDuration is supported. {date083}
The idref() function is implemented. {idky201, idky202, qxmp120}
All the functions whose names begin with get- support the new name (without the get- prefix) as a synonym. The old name will be dropped in a later release. (This change to the specification was made after the 12 November 2003 working drafts.)
The regular expression syntax in the functions matches, replace, and tokenize, and also in xsl:analyze-string, is now implemented strictly as defined in the language specifications. This has been achieved by integrating James Clark's code for translating XML Schema regular expressions to Java JDK 1.4 regular expressions, with modifications to handle the additional capabilities allowed in XPath and XSLT (such as reluctant quantifiers and back-references).
The four flags m, s, i, x in regular expression functions are now implemented as described in the specification.
The XSLT function function-available now accepts a second argument, the required arity. (This new feature has been added during the Last Call.)
The types of external objects returned by extension functions or extension instructions can now be represented using the actual Java class name. For example, a variable holding the value returned by the instruction sql:connect can be described with the attribute as="java:java.sql.Connection". The system will check that the actual value conforms to this type. The java prefix here must be bound to the namespace URI http://saxon.sf.net/java-type.
In mapping a Java class name to a QName with this namespace URI, any '$' signs in the class name are now mapped to '-' characters, to ensure that the resulting name is a legal QName.
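The '$' mapping can be pictured with a small sketch. The class JavaTypeNames and its method localNameFor are hypothetical names used only for illustration; they are not part of Saxon, and the real implementation may differ in detail.

```java
// Illustrative only: JavaTypeNames is a hypothetical helper, not a Saxon class.
final class JavaTypeNames {
    // Namespace URI quoted in the notes above.
    static final String JAVA_TYPE_NS = "http://saxon.sf.net/java-type";

    // Replace each '$' (used in the binary names of nested classes) with '-',
    // because '$' is not permitted in an XML name.
    static String localNameFor(String className) {
        return className.replace('$', '-');
    }

    public static void main(String[] args) {
        // java.util.Map$Entry is the binary name of the nested interface Map.Entry.
        System.out.println(localNameFor("java.util.Map$Entry")); // java.util.Map-Entry
    }
}
```

Under this assumption, a nested class such as java.util.Map$Entry would be written as java:java.util.Map-Entry in an as attribute.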
A singleton atomic value can now be passed to an extension function that expects a List, a SequenceIterator, or a SequenceValue. (Saxon previously allowed sequences to be passed to such functions, but this failed in the special case where the sequence was a singleton).
I have added an extension function saxon:typeAnnotation for diagnostic use. It takes a node as argument, and returns a string representation of its type. In the case of an anonymous type, this will be a path identifying the type within the schema.
Saxon 7.9 is tested with JDOM beta 10. The only change needed to move forward from beta 9 was to ensure that DOCTYPE nodes are skipped when processing the children of a Document node.
There have been some changes to key internal interfaces which affect a great many classes throughout the product, and which also occasionally surface in APIs.
The SequenceIterator interface, which is widely used throughout the Saxon code, has been changed so that it no longer has a hasNext() method. Instead, the caller should invoke next() repeatedly, and the end of the sequence is indicated by returning null. The purpose of this change is to reduce the number of method calls, but more importantly, to reduce the amount of state information that iterators have to hold, and to reduce the effect whereby each iterator in a pipeline looks ahead by one item, causing an unnecessary amount of wasted effort if the pipeline is aborted, which happens for example when finding the effective boolean value of a sequence.
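The calling pattern might look roughly like the sketch below. ItemIterator and ListItemIterator are hypothetical stand-ins written for this illustration; they are not Saxon's SequenceIterator, which has further methods. The sketch assumes the sequence never contains null items, since null is reserved as the end marker.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for the iteration contract described above;
// it is not Saxon's actual SequenceIterator interface.
interface ItemIterator<T> {
    // Returns the next item, or null when the sequence is exhausted.
    T next();
}

final class ListItemIterator<T> implements ItemIterator<T> {
    private final List<T> items;
    private int position = 0;

    ListItemIterator(List<T> items) {
        this.items = items;
    }

    @Override
    public T next() {
        // No look-ahead state is kept, only the current position.
        return position < items.size() ? items.get(position++) : null;
    }
}

final class IteratorDemo {
    // Emptiness test: reads at most one item, so nothing further
    // in the pipeline is evaluated.
    static <T> boolean isNonEmpty(ItemIterator<T> it) {
        return it.next() != null;
    }

    // Full traversal: call next() until it returns null.
    static <T> int count(ItemIterator<T> it) {
        int n = 0;
        while (it.next() != null) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        List<String> data = Arrays.asList("a", "b", "c");
        System.out.println(count(new ListItemIterator<>(data)));      // 3
        System.out.println(isNonEmpty(new ListItemIterator<>(data))); // true
    }
}
```

With a hasNext()/next() pair, each iterator in a chain typically has to fetch and buffer one item ahead in order to answer hasNext(); with the single-method form the consumer simply stops calling next().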
The internal representation of type information has changed, because of the need to accommodate user-defined types. A new class (actually an interface) ItemType has been introduced; this and the occurrence indicator form the two parts of a SequenceType. The method getItemType on an expression now returns an object that implements this interface. For atomic values, this is an AtomicType object, which is also used in the hierarchy of schema types. In the case of user-defined atomic types, this object contains a reference to the SimpleType object held in the schema data model (which will be available only in the schema-aware version of the product).
For nodes, the ItemType interface is implemented by a NodeTest, which is also used to represent conditions in an AxisStep of a path expression, and which is a subclass of Pattern. In the case of node types that specify the required content type, for example attribute(*,xs:date), a ContentTypeTest is used.
A number of the implementations of the tree model create transient wrapper nodes whenever a path expression is used to select a set of nodes. A new optimization has been introduced so that in the case where the nodes are immediately atomized, the tree model is allowed to return the typed value of a node instead of returning the node. This firstly avoids the cost of creating the wrapper node, and secondly avoids the cost of creating another iterator to process the typed value, in the case where the typed value is a singleton. This is currently done only in the common case where the typed value is actually untypedAtomic. Any user-defined implementation of the tree model that implements the interface AxisIterator will need to support the additional method setIsAtomizing(); however, an implementation that does nothing is acceptable.
The method getAttributeValue(uri, localName) has been removed from the NodeInfo interface, so there is one less thing that suppliers of this interface have to provide. It is replaced by a helper method in the Navigator object.
The typeCode passed down the Receiver pipeline is now the name pool fingerprint of the actual type name. This is also the value that is stored as a type annotation in the data model. Currently this is supported only in the TinyTree. In the non-schema-aware product, the typeCode will always be -1, indicating that the node is untyped.
The way that standard names are handled in known namespaces such as XSLT, Saxon, and XML Schema has changed. The fingerprints for these names are now compile-time constants. The NamePool code has been adapted so that these namespaces are specially recognized, and the standard constants are returned. This saves time and space when building the NamePool. It also makes it possible to have a standard schema defined as a static Java object for the built-in types.
In response to suggestions from Karsten Rucker, I have made some changes designed to conserve memory in both the standard tree and tiny tree implementations of the data model. In the standard tree, the document node no longer contains a reference to the factory used to build it: this was preventing the XML parser and its buffers being garbage-collected. In the tiny tree, the condense() operation is now called after building trees from source documents (it was previously called only for temporary trees). It also now condenses the buffer used for character data.
818677 There is no type-checking performed on parameters passed by xsl:call-template or xsl:apply-templates.
822257 No error is reported if the stylesheet contains two named templates with the same name and the same import precedence.
826540 Null pointer exception caused by incorrect optimization (moving a variable reference out of a branch of a conditional expression)
827369 An exception (IndexOutOfBounds) occurs if the body of an XSLT stylesheet function is empty.
830841 If there are character maps defined in the stylesheet, then no escaping of special characters in attribute values takes place.
832548 Incorrect optimization occurs for the variable declared as at $x in an XQuery FLWOR expression. The variable typically retains a value of zero.
Implemented the new xsl:perform-sort instruction. The select attribute is currently mandatory; it is not possible to use the instruction with a contained sequence constructor. {sortNNN}
Internally, the code for handling sorting has been unified between xsl:for-each, xsl:apply-templates, and xsl:perform-sort.
The xsl:sort-key declaration and the sort() function are withdrawn.
The instructions xsl:attribute, xsl:comment, xsl:processing-instruction, and xsl:namespace now allow a select attribute; if this attribute is present the content of the instruction must be empty. The value of the select expression may be a sequence; all of the items in the sequence are included in the result, by converting them to strings and separating them with a single space. {arts50, posn80, node51-54}
An xsl:analyze-string element is now required to have at least one xsl:matching-substring or xsl:non-matching-substring child.
Tunnel parameters have been implemented. {var20-22, 905err, 906err}
The xsl:with-param element now accepts an as attribute.
Patterns of the form *:local-name are now accepted. {match51}
No code was being generated to perform run-time type checking or conversion of template parameters (bug 818677). This has been corrected. {var16, var904err}
It is now a compile-time error if a parameter is supplied in xsl:call-template that doesn't match any parameter declared in the template being called, or if it has the wrong type when compared with the declared type of the parameter, or if the called template has a required parameter that isn't supplied in the call.
On the command line, it is now possible to specify a parameter in the form +param=filename. The filename will be parsed as an XML document, and the document node will be passed to the stylesheet as the value of the stylesheet parameter param. If the filename is a directory, then all the files contained immediately within the directory will be parsed and the result will be passed as a sequence of document nodes.
Parameters supplied on the command line are now treated as untyped atomic values rather than strings, which means they can be supplied where the expected type is (say) integer or date; the string supplied as the value of the parameter will automatically be converted to the required type.
Tracing (with the -T option) has been improved. Some instructions such as xsl:analyze-string were not being traced. The trace output now includes the names and values of variables (the value is truncated to the first four items in a sequence, and the first 20 characters of each item; nodes are shown by their generate-id() value). More information is available to user-written trace listeners, in particular, the Controller is now available (as a property of the InstructionInfo object). The InstructionInfo interface now has a general-purpose getProperty() method, allowing additional information to be made available without changing the interface for existing TraceListeners.
The rule is now enforced that the namespace URI of a function name, variable name, mode name (etc.) cannot be a reserved namespace URI (such as the XML namespace, the XSLT namespace, or the XML Schema namespace).
In xsl:number, the format tokens one, first, and 1st are no longer available; they have been replaced by the format tokens W, w, and Ww (for upper case, lower case, and title case words), together with the optional attribute ordinal="yes". These sequences are currently implemented for English and partially (see below) for German. {numb14, 24, 25}
The functions format-date(), format-time(), and format-dateTime() have been updated to match the latest specs. Specifically, they now take either two or five arguments (though Saxon currently ignores the last two); names of months or days of the week are requested using presentation modifiers N (upper-case), n (lower-case) and Nn (title case), and all other modifiers are interpreted in the same way as xsl:number. {date067, date068, date073}
I have extended the support for localizing xsl:number and format-date in German. It's mainly a proof of concept, to show that it's possible. The code is in module net.sf.saxon.number.Numberer_de, and similar modules can be written for other languages: just change the last two letters of the class name to the language code used. If you do write implementations for other languages, I will be happy to include them in future Saxon distributions. {numb28, numb29}
The attribute disable-output-escaping is no longer supported on xsl:attribute. In theory, you should be able to use character maps instead.
Casting a string to an xs:QName is now supported: but only in XSLT (not in XPath or XQuery), and only when an explicit cast or constructor function is invoked (not, for example, when passing an untyped atomic value to a function that expects an xs:QName).
Literal result elements now compile internally into xsl:element and xsl:attribute instructions. This results in changes to trace output: each attribute is now traced as a separate instruction.
Importing library modules is now supported, using the import module syntax. At present, all the modules in a query are compiled at the same time. The location of each module (the at clause in import module) must be included the first time a particular module is imported, but it may be omitted on subsequent occasions (modules are processed recursively, depth first). If a module for a particular namespace is already loaded, then the at clause is ignored. Optionally, applications can precompile library modules and register them in the Configuration object, and they will then be found when another module attempts to import them by namespace alone.
The query parser now attempts recovery after a syntax error, resuming parsing at the next semicolon.
The predeclared namespace prefix local is available for use when defining local user-defined functions.
Creating two attributes with the same name for the same element is now reported as an error. Previously (as in XSLT) the second simply overwrote the first.
Namespace declarations in direct element constructors are now correctly scoped; namespace prefixes in element and attribute names in a direct element constructor are correctly validated. (This is a bug fix). The rules concerning the distinction between active and passive namespaces are properly applied (active namespaces are copied to the result, passive namespaces are not).
The order by options empty least and empty greatest have been implemented. At the same time, code has been added to check that the sort key is not a sequence of length greater than one.
It is now possible to specify a parameter on the command line in the form +param=filename. The filename will be parsed as an XML document, and the document node will be passed to the stylesheet as the value of the external variable param. If the filename is a directory, then all the files contained immediately within the directory will be parsed and the result will be passed as a sequence of document nodes.
A query can now be included directly in the command line rather than reading it from a file. It is written enclosed in curly braces. For example, java net.sf.saxon.Query {doc('a.xml')//p[1]} selects p elements within the file a.xml in the current directory.
Any filename passed using the -s option is no longer accessible via the input() function, but is still accessible as the context node.
The namespaces for the fn: and xdt: prefixes, and the URI for the Unicode codepoint collation, have been updated to match the latest specs (replace 2003/05 with 2003/11).
The new rules for double-to-string conversion are implemented. Values outside the range 1e-6 to 1e+6 are expressed in exponential notation, values within this range are formatted as decimals or integers. {expr111, 112}
In the XPath free-standing API, it is now possible to use the fn: namespace when calling functions in the core library. (It is also still possible to use these functions without any namespace.)
A rewrite optimization has been added for expressions of the form E = M to N when E, M, and N are known statically to be integers. The test reduces to E ge M and E le N, which avoids comparing E with every integer in the range. When the test is of the form SEQ[position() = M to N] and M and N are constant integers, an even more powerful optimization is used that stops the iteration over SEQ as soon as item N is reached. {opt015, opt016}
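The effect of both rewrites can be sketched in plain Java. The class RangeComparison and its method names are hypothetical and exist only for this illustration; the sketch models the semantics of the rewrite, not Saxon's internal expression tree.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Illustrative model of the rewrite; not Saxon code.
final class RangeComparison {
    // Unoptimized "E = M to N": compare e with every integer in the range.
    static boolean naive(int e, int m, int n) {
        for (long i = m; i <= n; i++) { // long counter avoids overflow at Integer.MAX_VALUE
            if (e == i) {
                return true;
            }
        }
        return false;
    }

    // Rewritten form: "E ge M and E le N".
    static boolean rewritten(int e, int m, int n) {
        return e >= m && e <= n;
    }

    // SEQ[position() = M to N]: stop pulling items once position n is reached.
    static <T> List<T> positionsInRange(Iterator<T> seq, int m, int n) {
        List<T> selected = new ArrayList<>();
        int position = 0;
        while (position < n && seq.hasNext()) {
            T item = seq.next();
            position++;
            if (position >= m) {
                selected.add(item);
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        System.out.println(naive(5, 1, 10) == rewritten(5, 1, 10)); // true
        List<String> seq = List.of("a", "b", "c", "d", "e");
        System.out.println(positionsInRange(seq.iterator(), 2, 3)); // [b, c]
    }
}
```

Both forms give false for an empty range (M greater than N), so the rewrite preserves the result in that case as well.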
The "minimax" optimization has been reinstated for comparing numeric sequences. This translates an expression of the form N1 < N2 into the form min(N1) < max(N2). It was used prior to version 7.4, at which point it was found to be unsafe if both sequences contain untyped atomic values. It is now used only if at least one of the sequences is statically known to contain numeric values only.
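To see why the translation is valid for purely numeric sequences, recall that a general comparison N1 < N2 is true if any pair of items, one from each sequence, satisfies the comparison. The sketch below compares the pairwise form with the min/max form. The class MiniMax is hypothetical, and the sketch assumes that neither array contains NaN.

```java
// Illustrative model of the "minimax" rewrite; not Saxon code.
// Assumption: the inputs contain no NaN values.
final class MiniMax {
    // Pairwise (existential) semantics of the general comparison N1 < N2.
    static boolean existentialLessThan(double[] n1, double[] n2) {
        for (double a : n1) {
            for (double b : n2) {
                if (a < b) {
                    return true;
                }
            }
        }
        return false;
    }

    // Rewritten form: min(N1) < max(N2). An empty operand makes the result false.
    static boolean minimaxLessThan(double[] n1, double[] n2) {
        if (n1.length == 0 || n2.length == 0) {
            return false;
        }
        double min = n1[0];
        for (double a : n1) {
            min = Math.min(min, a);
        }
        double max = n2[0];
        for (double b : n2) {
            max = Math.max(max, b);
        }
        return min < max;
    }

    public static void main(String[] args) {
        double[] left = {3, 7, 9};
        double[] right = {1, 4};
        System.out.println(existentialLessThan(left, right)); // true (3 < 4)
        System.out.println(minimaxLessThan(left, right));     // true
    }
}
```

The pairwise form costs a comparison for every pair of items, while the rewritten form needs one pass over each sequence.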
The data types gYear, gYearMonth, gMonth, gMonthDay, and gDay are implemented. {date076-079}
The functions get-current-date/time/dateTime() now return a date/time in the implicit timezone (as determined from the current Java Locale). {date001}
The function implicit-timezone() has been implemented. {date069}
The functions get-timezone-from-date/time/dateTime() now return the timezone as a value of type xdt:dayTimeDuration, not as a string. This means the result is displayed as (for example) PT3H rather than +03:00. {date048-052}
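The relationship between the two notations can be checked with the standard java.time classes, which postdate this release and are used here purely for illustration. The class TimezoneAsDuration is hypothetical and is not how Saxon computes the value.

```java
import java.time.Duration;
import java.time.ZoneOffset;

// Illustrative only: converts an offset such as +03:00 into a duration.
final class TimezoneAsDuration {
    static Duration toDuration(String offset) {
        // ZoneOffset parses the +hh:mm form; the duration is its total length in seconds.
        return Duration.ofSeconds(ZoneOffset.of(offset).getTotalSeconds());
    }

    public static void main(String[] args) {
        System.out.println(toDuration("+03:00")); // PT3H
        System.out.println(toDuration("+05:30")); // PT5H30M
    }
}
```

For negative offsets, java.time.Duration prints the sign inside the value (for example PT-5H), which differs from the XML Schema lexical form -PT5H, so this sketch matches the dayTimeDuration notation only for non-negative offsets.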
Casting from xs:dateTime to xs:date now retains the timezone, if there is one.
The functions adjust-date-to-timezone(), adjust-dateTime-to-timezone(), and adjust-time-to-timezone() are implemented.
Note that the function get-hours-from-date/dateTime() returns the localized component value. It has always done this in the Saxon implementation, and the latest working draft now makes this the correct behavior.
The function sum() now returns an integer 0 rather than a double 0.0 when the input sequence is empty. (An integer can be used anywhere a double can be used, but the converse is not true.) Note: this change was agreed by the Working Groups, but was inadvertently left out of the published drafts.
The function root() now accepts an empty sequence as its argument (and returns an empty sequence).
The function distinct-values() now returns only one NaN if there are multiple NaN values in the sequence. {group026}
The function escape-uri() no longer escapes square brackets when the escape-reserved argument is false.
The function function-available() now recognizes function names that use the fn: namespace. {func32}
The functions contains, ends-with, starts-with, substring-after, and substring-before now use the Unicode codepoint collation if no explicit collation is supplied; they do not use the default collation.
The functions min and max now convert untyped atomic values to double rather than string, and return NaN if the sequence contains a NaN value. {expr57}
The function get-in-scope-namespaces is renamed get-in-scope-prefixes. {nspc45}
The function insert-before now allows the insert position to be beyond the end of the sequence (which causes the new sequence to be appended to the original). {expr71}
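The insert-before behavior can be modelled on Java lists as follows. The class InsertBefore is hypothetical. Treating positions below 1 as 1 is an assumption taken from the XPath function definition rather than from these notes.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative model of insert-before($target, $position, $inserts); not Saxon code.
final class InsertBefore {
    static <T> List<T> insertBefore(List<T> target, int position, List<T> inserts) {
        // Positions are 1-based. Clamp to [1, size + 1] so that a position
        // beyond the end appends (positions below 1 are assumed to mean 1).
        int index = Math.max(1, Math.min(position, target.size() + 1)) - 1;
        List<T> result = new ArrayList<>(target);
        result.addAll(index, inserts);
        return result;
    }

    public static void main(String[] args) {
        List<String> seq = List.of("a", "b", "c");
        System.out.println(insertBefore(seq, 2, List.of("x")));  // [a, x, b, c]
        System.out.println(insertBefore(seq, 10, List.of("x"))); // [a, b, c, x]
    }
}
```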
The function reverse has been implemented. {expr91}
In a range expression, $x to $y, it is no longer possible to produce a descending sequence of integers. Instead, if the start point is greater than the end point, an empty sequence is returned. This change is to allow the construct for $i in 1 to count($seq) ... to work as expected. To get a reverse sequence of integers, use reverse(1 to 5).
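A small model of these semantics, using Java lists in place of XPath sequences, is sketched below. The class IntegerRange and its methods are hypothetical names for this illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Illustrative model of "$x to $y" and reverse(); not Saxon code.
final class IntegerRange {
    // Ascending integers from start to end, or an empty list when start exceeds end.
    static List<Integer> to(int start, int end) {
        List<Integer> result = new ArrayList<>();
        for (long i = start; i <= end; i++) { // long counter avoids overflow
            result.add((int) i);
        }
        return result;
    }

    // Returns a reversed copy, leaving the input untouched.
    static <T> List<T> reversed(List<T> seq) {
        List<T> copy = new ArrayList<>(seq);
        Collections.reverse(copy);
        return copy;
    }

    public static void main(String[] args) {
        System.out.println(to(1, 5));           // [1, 2, 3, 4, 5]
        System.out.println(to(1, 0));           // []
        System.out.println(reversed(to(1, 5))); // [5, 4, 3, 2, 1]
    }
}
```

The to(1, 0) case corresponds to 1 to count($seq) over an empty $seq: the loop body is never executed, rather than running for the positions 1 and 0.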
The function call remove($seq, 1) is now treated specially: it is optimized in the same way as $seq[position()!=1], due to the common use of this expression in head-tail recursion. {expr92}
The functions context-item, distinct-nodes, input, item-at, node-kind, sequence-node-identical, and string-pad have been dropped.
The function concat must now have at least two arguments. (This reverts to the XPath 1.0 specification.)
In regular expression matching (e.g. in replace() and in the XSLT regex-group() function), group 0 now refers to the entire substring that matched the regex. {regex08}
The isnot operator has been dropped.
A new top-level declaration is introduced: saxon:import-xquery allows the functions defined in an XQuery library module to be called from XPath expressions in the stylesheet. See extensions.html for details.
The extension functions saxon:before, saxon:get-user-data, and saxon:set-user-data, which have not been documented for some while, have finally been removed.
The rules in the spec on extension attributes have been clarified, in a way that makes it clear that the saxon:allow-avt attribute on xsl:call-template is not conformant. This attribute has therefore been removed, and dynamic calls on templates are enabled instead using a new extension element, saxon:call-template. In changing your stylesheet to use the new instruction, remember to set extension-element-prefixes="saxon". {saxon21}
The sequence type as="java:java.lang.Object" can now be used to refer to the type of a wrapped Java object returned by an extension function. The namespace prefix "java" binds to the namespace URI http://saxon.sf.net/java-type. (My thinking is that eventually it will be possible to use any Java class name to define the actual Java class of the external object. At present, however, only java.lang.Object is recognized.) {saxon51}
I have changed the design of the SQL extension so that the database connection is now stored explicitly in a variable, and the value of this variable is supplied on instructions such as sql:insert and sql:query. See the books-sql.xsl sample application to see how this works. {saxon51}
When calling Java methods, any XPath value can now be passed to a method that expects a DOM NodeList; a run-time ClassCastException occurs if the value contains an item that is not a node, or a node that is not represented by a DOM org.w3c.dom.Node (e.g. if it is a JDOM node). Not tested.
In general these changes affect both XSLT and XQuery.
It is no longer always true that namespaces present on a parent element are automatically inherited by the children. XML Namespaces 1.1 allows namespaces to be undeclared using a construct such as xmlns:x="", and this capability is reflected in the model. Namespace undeclarations will be output by the serializer if the serialization property undeclare-namespaces="yes" is set. In XSLT, this can be defined on xsl:output. In XQuery, it can be controlled from the API (using the constant SaxonOutputKeys.UNDECLARE_NAMESPACES or the command line option !{http://saxon.sf.net}undeclare-namespaces=yes). If you don't set this option, you probably won't notice any difference. However, the change does mean that you sometimes get an unexpected xmlns="" undeclaration.
The rules for both XSLT and XQuery say that namespace nodes are never inherited when you add an element to a new parent element. Saxon isn't quite implementing this yet. Given the way Saxon represents namespaces internally, namespaces get inherited unless the code goes out of its way to prevent it, and at the moment this is only happening when an element is explicitly copied using xsl:copy or the equivalent in XQuery.
Support for the namespace axis has been reinstated in JDOM. This underpins functions such as get-in-scope-prefixes, and ensures that namespaces are properly copied by xsl:copy-of. {axes-jdom049, 055, 129 etc}
For ease of testing, a new command line interface net.sf.saxon.jdom.JDOMTransform has been added. The arguments are exactly the same as the normal net.sf.saxon.Transform command.
The constructor for class NodeWrapper has been made protected. A new wrap() method has been supplied on the DocumentWrapper class, allowing any node in the document to be wrapped, provided that the document node has been wrapped.
Whitespace stripping should now work for JDOM input in the same way as for other tree models (see below).
A set of DOM wrapper classes has been written, analogous to the JDOM wrapper classes. {axes-dom[001-nnn]}
The DOM wrapper has been tested with the Crimson DOM provided in JDK 1.4 and with Xerces 2.5.0. Different DOM implementations are known to vary widely. Saxon's DOM interface does not attempt to deal with entity reference nodes, which appears to be OK with the default configuration of these two parsers. CDATA sections are treated as text nodes; no attempt is made to merge them with adjacent text nodes.
The DOM interface is very inefficient: for example, it has to resolve namespace prefixes by searching the namespace declarations every time a node is referenced. Don't use it if performance matters to you.
I have changed the code for writing output to a DOMResult so that it now uses DOM level 2 interfaces Document#createElementNS() and Element#setAttributeNS(). As a result, it should now be possible to use methods such as getPrefix and getNamespaceURI on these nodes.
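The difference the DOM level 2 methods make can be seen with the JDK's own DOM implementation. The following is a minimal sketch, not Saxon's DOMResult code; the class name, namespace URIs, and prefixes are invented for illustration.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class DomLevel2Demo {
    // Builds a one-element document using the namespace-aware (DOM level 2) factory methods.
    static Element buildElement() {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder().newDocument();
            // Hypothetical namespace URIs and prefixes, chosen only for the example.
            Element item = doc.createElementNS("http://example.com/ns", "ex:item");
            item.setAttributeNS("http://example.com/attr", "at:code", "42");
            doc.appendChild(item);
            return item;
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException("No DOM implementation available", e);
        }
    }

    public static void main(String[] args) {
        Element item = buildElement();
        System.out.println(item.getPrefix());       // ex
        System.out.println(item.getNamespaceURI()); // http://example.com/ns
        System.out.println(item.getLocalName());    // item
    }
}
```

Nodes created with the level 1 methods createElement() and setAttribute() return null from getPrefix, getNamespaceURI, and getLocalName, which is why the switch matters to code that inspects the result tree.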
If you supply any kind of pre-built tree as input to the transformation (that is, if the Source object is a DOMSource or a NodeInfo), then Saxon no longer strips down the tree and rebuilds it to implement whitespace stripping. Instead, if whitespace stripping has been requested, it wraps the supplied tree in a whitespace-stripping envelope, which hides whitespace text nodes that the stylesheet has asked to be stripped, on the fly. Because this is done in a separate layer above the data model, it works for all data model implementations (JDOM, DOM, and native Saxon), and it imposes no overhead when it is not needed - that is, when the stylesheet doesn't request whitespace stripping, and when nodes are stripped during tree construction, which is still done if you supply a SAXSource or StreamSource as the input). {axes-dom154, 155}
Note that this whitespace stripping layer strips whitespace as requested by the stylesheet, regardless of whether any xml:space attribute is present in the tree to override this.
In the XQuery interface, whitespace stripping can be requested from the command line or from the API, but not from the query itself.
Whitespace stripping applies equally to documents loaded using the document() or doc() function as to the initial source document; but it is not applied to documents supplied as stylesheet (or query) parameters.
This model now allows the XQuery and XPath APIs to operate directly on a DOM or JDOM structure, with the results of the expression being references to the actual DOM or JDOM nodes, not to copies.
The getLocalName method of NodeInfo has been renamed getLocalPart to avoid conflict with the DOM method of the same name. The getLocalPart method returns "" rather than null for a node with no name, and it returns a value whether or not the node is in a namespace.
761894: distinct-nodes() crashes when used in an XQuery user-defined function.
770785: Tail-call optimization bug in xsl:apply-templates when the select expression has context dependencies.
783382: Saxon infers the type document-node() rather than string for an empty variable.
788731: Infinite recursion in compiler while optimizing current()
788748: Incorrect static type inference for (integer div integer).
799095: Null Pointer Exception while compiling an XQuery using a variable reference with a direct attribute constructor.
805148: An attribute or namespace node may be deemed identical to an element node if they have the same offset in their respective arrays.
805149: The result of generate-id() for an attribute or namespace node may contain non-alphanumeric characters.
810626: The xsl:number instruction crashes when given a format picture containing a single punctuation token, for example format="*".
810644: The operands of the "is" operator are incorrectly compiled.
811914: The unordered() function fails if the argument is a sequence whose items are all compile-time constants.
The xsl:character-map and xsl:output-character elements have been implemented. See further details. {charmapNNN}
The rules attribute on saxon:collation is now implemented, allowing a fully-customized collation to be created using the syntax for the Java RuleBasedCollator. {sort25}
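For readers unfamiliar with the rule syntax, here is a small sketch of java.text.RuleBasedCollator used directly from Java. It does not show how Saxon passes the rules attribute through; the class name and the deliberately odd rule string (which sorts c before b before a) are assumptions made for the example.

```java
import java.text.ParseException;
import java.text.RuleBasedCollator;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class RuleCollationDemo {
    // Sorts a copy of the words using a collator built from the supplied rule string.
    static List<String> sortWithRules(String rules, List<String> words) {
        try {
            RuleBasedCollator collator = new RuleBasedCollator(rules);
            List<String> sorted = new ArrayList<String>(words);
            sorted.sort(collator);
            return sorted;
        } catch (ParseException e) {
            throw new IllegalArgumentException("Invalid collation rules: " + rules, e);
        }
    }

    public static void main(String[] args) {
        // "< c < b < a" declares c as the smallest letter, then b, then a.
        System.out.println(sortWithRules("< c < b < a", Arrays.asList("abc", "cab", "bac")));
        // [cab, bac, abc]
    }
}
```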
I have added a compile-time warning message if a variable declaration has no following sibling instructions. This is permitted, but has no useful effect and probably means the user has made a mistake.
The xsl:number element now takes a select attribute to select the node to be numbered. (This anticipates a change in the next working draft). {numb23}
The xsl:sequence instruction can now have either a select attribute or child instructions, but not both. (This anticipates a change in the next working draft).
Stylesheet attributes whose value is a name, or a number, or an enumeration such as yes|no, now allow leading and trailing spaces in the value. This feature has not been tested very thoroughly.
When running with version="2.0", the xsl:value-of instruction now defaults the separator attribute to a single space. (With version="1.0", in the absence of a separator attribute, it continues to discard all but the first item in a sequence.) {seq019, seq020}
The constructs element(), element(*, T), attribute(), and attribute(*, T) are now allowed as NodeTests within a pattern. If a type is specified, the default priority is 0. If no type is specified, it is -0.5. {schema019}
The construct [xsl:]exclude-result-prefixes="#all" is now implemented. {nspc48}
Where a namespace is on the list of excluded namespaces for a literal result element, but is used in the name of the element or one of its attributes, Saxon was ignoring the request to exclude the namespace. The effect was that if more than one prefix was assigned to the namespace, unnecessary namespace declarations were output. I have changed the code so it no longer ignores the request to exclude the namespace; instead, the namespace declaration that is actually needed will be reinstated by the namespace fixup process. {nspc49}
Enhancements have been made to the number formatting in format-date(), format-dateTime(), and format-time(). The format modifier "o" can be used to request ordinal numbers (1st, 2nd, 3rd). The format modifier "a" (or "A" for uppercase) gives numbering in words (one, two, three). They can also be combined ("ao" or "Ao") to give ordinal words (first, second, third). These formats currently always produce output in English. {date068}
Ordinal numbering is also available in xsl:number. (This was implemented to underpin date formatting, but is available generally.) Use the format token 1st for the sequence 1st, 2nd, 3rd, 4th..., the format token first or FIRST for the sequence first, second, third, fourth... (in either upper or lower case). The format tokens one and ONE have always been available, though not well documented. The full list of supported formats is documented in xsl-elements.html. Note that these are available only for English; sequences for other languages can be implemented by writing a user-defined Numberer, as described in Implementing a numbering sequence. {numb24, numb25}
Sorting a set of numeric values, without specifying data-type="number", now handles NaNs correctly.
The result of dividing two integers using the div operator is now a decimal, rather than a double. A consequence of this is that dividing by zero gives a run-time error, rather than Infinity. It may also affect the precision of results, but the effect is likely to be minor in most practical cases. (Actually, the computation is currently carried out with double precision, and then converted to a decimal, so the results are not as accurate as they could be.) This change exposed a bug in the handling of the mod operator for decimal arguments, which has been fixed.
A consequence of this change is that the avg() function when applied to a sequence of integers now returns an xs:decimal.
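The difference between the two kinds of division can be reproduced in plain Java, with BigDecimal standing in for xs:decimal and double for xs:double. This is an analogy only, not Saxon's implementation (which, as noted above, currently computes in double precision and then converts); the class name and the choice of MathContext.DECIMAL64 as the precision are assumptions.

```java
import java.math.BigDecimal;
import java.math.MathContext;

class IntegerDivDemo {
    // Decimal division: a zero divisor raises ArithmeticException instead of producing Infinity.
    static BigDecimal decimalDiv(long a, long b) {
        return new BigDecimal(a).divide(new BigDecimal(b), MathContext.DECIMAL64);
    }

    // Double division: a zero divisor quietly yields Infinity (or NaN for 0/0).
    static double doubleDiv(long a, long b) {
        return (double) a / (double) b;
    }

    public static void main(String[] args) {
        System.out.println(decimalDiv(1, 4)); // 0.25
        System.out.println(doubleDiv(1, 0));  // Infinity
        try {
            decimalDiv(1, 0);
        } catch (ArithmeticException e) {
            System.out.println("decimal division by zero is an error");
        }
    }
}
```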
Numeric promotion is implemented in the type-checking rules: that is, an integer or decimal value can be supplied where a float or double is expected, and a float can be supplied where a double is expected. (The fact that this was not implemented before was something I had overlooked.)
The expression / (or a path expression starting with /) now returns an error if the context item is in a tree whose root is not a document node. The root() function, however, succeeds in this case. {seq018, seq908err, seq909err}
The constructs element(), element(*, T), attribute(), and attribute(*, T) are now allowed as NodeTests within a path expression. {type042, schema016-018}
In the processing-instruction() NodeTest, the quotes around the processing instruction name are now optional. The name must now be a valid NCName, whether or not it is enclosed in quotes. {copy63, node03}
The changes affecting XPath also affect XQuery.
Improved documentation for the XQuery API is available on the Using XQuery page, and also in the JavaDoc for the relevant classes.
I have implemented the changes to the Query Prolog syntax introduced in the August Working Draft. These include the addition of semicolons as separators between declarations in the prolog; replacement of the keyword "define" by "declare" throughout; removal of the "=" symbol except in "declare namespace"; and relaxing the rules on ordering of declarations.
The declare base-uri declaration in the prolog is now supported.
The default element namespace now works as specified. This is somewhat different from the XSLT specification: in XQuery, the default element namespace affects unprefixed element names whether they appear in element constructors or in path expressions. It isn't possible (as in XSLT 2.0) to specify different defaults for names in the input document and names in the output document. {ns/addq2}
The typeswitch expression is now implemented. {xmp/addq10}
The collation option of the order by clause in a FLWOR expression is now implemented. See Collation URIs for details of the URIs that can be specified. {r/addq1}
Sorting a set of numeric values now handles NaNs correctly (NaNs are considered equal to each other and less than any other value).
Computed comment constructors and processing-instruction constructors are supported. {xmp/addq4}
Entity references and character references are supported in string literals (previously they were supported only in element and attribute content).
It should now be possible to use any output encoding that is supported by the Java VM, without defining a custom CharacterSet class. In JDK 1.4, Java allows the application to determine whether particular characters are encodable using a given character set, and this information is now used to decide whether to replace the character with a numeric character reference. Because I don't know how efficient this mechanism is, I still use the old mechanism for character sets that were previously supported in Saxon, and the mechanism for defining user-defined character sets is still available for the time being. It has been restricted, however, so that Saxon will only attempt to load a PluggableCharacterSet for encoding XXX if the output property encoding.XXX="class-name" is present.
The code now allows for the possibility that character encodings other than UTF-8 and UTF-16 may be capable of encoding supplemental characters (characters whose Unicode codepoints are above 65535). Previously such characters were always output as numeric character references, except when using UTF-8 and UTF-16. A consequence of this is that user-written PluggableCharacterSet implementations must be prepared to categorize such characters.
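The JDK facility in question is java.nio.charset.CharsetEncoder, which can report whether a character is encodable. The following sketch shows the general idea of falling back to numeric character references, including for supplemental characters; it is an illustration under my own assumptions, not Saxon's serializer code, and the class and method names are invented.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;

class EncodableCharsDemo {
    // Replaces every character the target encoding cannot represent with a
    // decimal numeric character reference, working one Unicode code point at a time.
    static String escapeUnencodable(String text, String encodingName) {
        CharsetEncoder encoder = Charset.forName(encodingName).newEncoder();
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < text.length()) {
            int codePoint = text.codePointAt(i);
            int length = Character.charCount(codePoint); // 2 for a surrogate pair
            String piece = text.substring(i, i + length);
            if (encoder.canEncode(piece)) {
                out.append(piece);
            } else {
                out.append("&#").append(codePoint).append(';');
            }
            i += length;
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(escapeUnencodable("caf\u00e9 \u20ac", "US-ASCII")); // caf&#233; &#8364;
        // U+1D538, a character above 65535, written as a surrogate pair:
        System.out.println(escapeUnencodable("\uD835\uDD38", "ISO-8859-1"));   // &#120120;
    }
}
```

With UTF-8 as the encoding name, both sample strings pass through unchanged, since every Unicode character is encodable there.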
There are some differences between the character encodings supported by the old java.io package and the new java.nio package. If the requested encoding is not supported by the java.nio package, then all non-ASCII characters will be represented using numeric character references. If the encoding is not supported by the java.io package, then Saxon will revert to using UTF-8 as the actual output encoding. A list of the character encodings supported in the java.nio package can be obtained by using the command java net.sf.saxon.charcode.CharacterSetFactory, with no parameters.
The HTML serialization method should now handle INS and DEL elements correctly.
User-written emitters were not working; the code has been fixed but not tested.
Collations can now be specified directly using a URI, without requiring a saxon:collation declaration. This makes them available in XQuery and XPath applications as well as XSLT. The URI takes a form such as http://saxon.sf.net/collation?lang=de;strength=primary and is specified fully in Collation URIs. {r/addq1, sort26}
The collection function is implemented. The Saxon implementation interprets the URI of the collection as a reference to an XML document that acts as a catalogue listing the documents in the collection. An example of a catalogue document is:
<collection>
  <doc href="doc1.xml"/>
  <doc href="doc2.xml"/>
  <doc href="doc3.xml"/>
</collection>
In effect, collection("a.xml") is merely a shorthand for document(document("a.xml")/collection/doc/@href). My thinking is to extend the catalogue structure in future to allow options to be specified for how errors are handled, how the documents are parsed (e.g. validation, space stripping), and whether the documents should be locked in memory. {mdocs19}
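To make the catalogue format concrete, the sketch below reads such a document with the JDK's DOM parser and lists the href values, mirroring the path /collection/doc/@href. It is not Saxon's implementation: it only lists the references and does not resolve or load the documents, and the class and method names are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class CollectionCatalogueDemo {
    // Returns the href of every doc element that is a child of the collection root element.
    static List<String> listDocuments(String catalogueXml) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(catalogueXml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Catalogue could not be parsed", e);
        }
        Element root = doc.getDocumentElement();
        if (!"collection".equals(root.getTagName())) {
            throw new IllegalArgumentException("Root element must be <collection>");
        }
        List<String> hrefs = new ArrayList<String>();
        for (Node child = root.getFirstChild(); child != null; child = child.getNextSibling()) {
            if (child instanceof Element && "doc".equals(((Element) child).getTagName())) {
                hrefs.add(((Element) child).getAttribute("href"));
            }
        }
        return hrefs;
    }

    public static void main(String[] args) {
        String catalogue = "<collection><doc href=\"doc1.xml\"/><doc href=\"doc2.xml\"/>"
                + "<doc href=\"doc3.xml\"/></collection>";
        System.out.println(listDocuments(catalogue)); // [doc1.xml, doc2.xml, doc3.xml]
    }
}
```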
The tokenize() function now supports the facility to split a string into its individual characters if the regex matches a zero-length string. For example, tokenize('alphabet', '') returns the sequence ('a', 'l', 'p', 'h', 'a', 'b', 'e', 't'). Note: there has been some discussion on this topic in the public-qt-comments list, and the specification could change as a result. {regex19}
In XPath expressions in XSLT stylesheets, core functions can now appear in the fn: namespace (currently http://www.w3.org/2003/05/xpath-functions). Of course, they can also be unprefixed. {coreFunction101}
The result of dividing two integers is now a decimal. {math-two17}
Values of type xs:language are now properly validated. {type008}
An identity transformation is now able to extract a subtree of a DOMSource starting at any element. This clears a long-standing bug 548228. A new test exampleDOMsubtree has been added to TraxExamples to demonstrate the capability.
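In JAXP terms, the scenario looks roughly like the sketch below: an identity Transformer is given a DOMSource rooted at an element rather than at the document node. This is not the exampleDOMsubtree test itself, which I have not reproduced; the class name and sample document are invented, and the code runs against whichever TransformerFactory is configured on the classpath.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class DomSubtreeDemo {
    // Serializes only the subtree rooted at the first element with the given name.
    static String serializeFirst(String xml, String elementName) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
            Node subtree = doc.getElementsByTagName(elementName).item(0);
            // newTransformer() with no stylesheet gives an identity transformation.
            Transformer identity = TransformerFactory.newInstance().newTransformer();
            identity.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            identity.transform(new DOMSource(subtree), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Identity transformation failed", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<book><chapter id=\"c1\"><title>One</title></chapter>"
                + "<chapter id=\"c2\"><title>Two</title></chapter></book>";
        System.out.println(serializeFirst(xml, "title"));
    }
}
```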
The various bit-valued static properties of an XPath expression (dependencies, cardinality, and other special properties) have now been brought together into a single word, whose value is computed once and stored on each node in the expression tree rather than being calculated on demand. (There were some cases where this calculation was still being done at run-time).
Some changes have been made to the design of tail-call optimization. This is mainly to fix a bug arising when apply-templates uses a select expression with context dependencies. The decision that a call is a tail call is now made statically rather than dynamically, to avoid the costs of creating a closure for the select expression when this is not needed.
In some cases XSLT stylesheet functions are now compiled to the UserFunction object originally introduced to support XQuery. This is done where the body of the function is sufficiently simple: this basically means that it must consist of a sequence of xsl:param elements, then xsl:variable elements, and finally an xsl:sequence element with a select attribute to define the result of the function. The effect is that recursive calls in such functions now benefit from tail call optimization, allowing deeply-nested recursive functions to execute without blowing the stack.
A small but useful speed-up has been achieved for the common operation of navigating the child axis, by optimizing for the case where all nodes on the axis are retrieved.
The XMLChar module from Xerces has been incorporated into Saxon, and is now used in most places where XML names and XML characters are tested for validity. This performs a considerably more accurate check than Saxon was previously performing, especially for characters that are valid within names.
The sql:insert extension instruction now tidies up properly by closing the prepared statement, which prevents Oracle running out of cursors.
Extension functions may now return an array; this is treated in the same way as if they return a list. Thanks to Aleksei Valikov for this enhancement.
A number of defects in XQuery parsing have been fixed silently, without being registered as bugs. For example, constructors for comments and processing instructions were not working at all.
761891 Saxon crashes with a NullPointerException if xsl:include or xsl:import is handled by a user-written URIResolver which returns a Source with no system ID set.
761894 When called from XQuery, the distinct-nodes() function crashes with an internal error.
763792 The extension function saxon:tokenize fails with a NullPointerException when the supplied argument is an empty sequence.
764172 The XQuery parser reports a spurious syntax error if a function declaration includes no return type.
768422 The XPath (and XQuery) parser fails on a construct such as element and X where the first token is element or attribute used as a QName, and the second token is an operator.
768423 The XQuery parser reports a spurious syntax error if, in the construct let $x := EXP, no space appears between the variable name and the := operator.
The command line interface by default no longer wraps the result sequence into an XML document with a result:sequence as its outermost element. This wrapping can still be achieved using the -wrap option on the command line. The default output format is now to output each item in the result sequence independently. If the item is a document node or an element node then it is serialized as XML. If it is anything else, then its string value is output on a new line. This format is much more useful when the query is designed to produce a single XML or HTML document. Note that you can specify !method=html on the command line to invoke HTML rather than XML serialization.
A number of other changes and extensions have been made to the command line options for the net.sf.saxon.Query command. See Using XQuery for details.
Query execution can now be traced using the -T option. (Though the actual output is still rather XSLT-oriented.) Note that line numbers for some expressions currently indicate where the expression ends (though for the more complex expressions such as element constructors and FLWOR expressions, it shows where it starts).
I have added support for the document{} constructor. (This is implemented using an internal instruction representing a conceptual xsl:document instruction, which is also now used for producing temporary trees in XSLT.)
Computed element and attribute constructors now accept the name as a value of type xs:QName as an alternative to xs:string.
Added extra validation of numeric character references, and support for characters above Unicode 65535 (surrogate pairs).
Improvements have been made in error messages, particularly for run-time errors. Many of these changes should also benefit XSLT users.
XHTML output is now selected automatically if the first element in the result tree has local name "html" and namespace URI "http://www.w3.org/1999/xhtml".
Grouping facilities (xsl:for-each-group) have been rewritten. Grouping keys may now be of any data type that supports equality comparison. Previous Saxon releases converted the values to strings before comparing; they are now compared according to the rules for their native data type. (There are actually very few cases where this gives a different answer: one such case is when comparing dates in different timezones). The function current-grouping-key() is supported. When using the group-by attribute, an item in the population may now be assigned to zero or more groups. The collation attribute is now available to define how string-valued keys should be compared; the default is Unicode codepoint collation. The new implementation offers better pipelining in cases where no sorting is needed (though it is not completely serial: for example with group-adjacent, the contents of one group of adjacent elements will be held in memory at any one time). {group020-025}
The specification says that it is an error when the set of grouping keys is heterogeneous (for example, a mixture of strings and numbers). Saxon currently detects this error in the case of group-adjacent, but does not detect it for group-by: non-comparable values are simply treated as being not equal.
Added support in the system-property function for the new properties xsl:is-schema-aware, xsl:supports-serialization, and xsl:supports-backwards-compatibility. The xsl:version property now returns 2.0, reflecting the fact that Saxon is now close to achieving full conformance with the draft 2.0 specification.
Added support for the abs function. (This anticipates a future draft of the specification.) {math-two15}
Conversion of decimal, float, and double to integer now works as specified, by truncating towards zero. {math-two16}
Conversion of a non-numeric string to a number, when invoked using a cast, or the xs:double() constructor function, or implicit conversion of an untyped node, now raises a dynamic error rather than returning NaN. The number() function, however, continues to return NaN.
The doc() function, when it fails to find a document (or to parse it as XML) now raises a fatal run-time error. This doesn't affect the behaviour of the document() function.
A value of type xs:anyURI can now be passed to an external Java method whose expected type is java.net.URI.
Comparison of two integer values was converting both to doubles; this has been fixed.
Handling of "head-tail recursion" has been made more efficient. Constructs that select all items in a sequence after the first (for example $x[position()!=1]) are now recognized specially when doing deferred evaluation; in effect they return a view of the underlying sequence. (Previous Saxon versions handled this in some cases, in a different way).
In XQuery, tail calls within user-defined functions are now optimized. This enables deeply recursive algorithms to execute without running out of stack space.
All the decisions about sorting a path expression into document order are now made at compile time.
Union, intersect, and except expressions of the form E[c1] | E[c2] are now rewritten as E[c1 or c2]. This avoids the need to evaluate E twice, and may avoid a sort.
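The equivalence behind the union case can be checked with the XPath 1.0 engine in the JDK. The sketch below covers only the union operator (intersect and except are not available in XPath 1.0) and says nothing about how Saxon performs the rewrite; the class name and sample data are invented.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class UnionRewriteDemo {
    // Counts the nodes selected by an XPath 1.0 expression over the given document text.
    static int count(String xml, String xpath) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(xpath, doc, XPathConstants.NODESET);
            return nodes.getLength();
        } catch (Exception e) {
            throw new IllegalStateException("XPath evaluation failed: " + xpath, e);
        }
    }

    public static void main(String[] args) {
        String xml = "<items><item price='5' stock='0'/><item price='50' stock='3'/>"
                + "<item price='7' stock='9'/><item price='80' stock='0'/></items>";
        // E = //item, c1 = @price < 10, c2 = @stock = 0
        System.out.println(count(xml, "//item[@price < 10] | //item[@stock = 0]")); // 3
        System.out.println(count(xml, "//item[@price < 10 or @stock = 0]"));        // 3
    }
}
```

The first item satisfies both conditions but is counted once in either form, because a union removes duplicate nodes.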
The coding of the cast, instance of, and castable operators has been cleaned up.
The implementation of substring now optimizes for the case where the second and third arguments are integers - it no longer does everything using double arithmetic.
The StringValue class can now encapsulate any CharSequence, not necessarily always a String. This avoids some unnecessary conversions, though for full effect the Item#getStringValue() method will also have to be changed to return a CharSequence.
The implementation of xsl:sequence has been improved. When a required type is given in the as attribute, the type of the value is now checked "on the fly" as items are written to the current output destination. The SequenceChecker is also capable of atomizing any nodes that are written (in fact, the nodes are atomized before they are even created). It also converts untyped atomic values to the required type. This is an inverse of the way checking is done by SequenceIterators during XPath evaluation; we now have a "pull" pipeline (the SequenceIterator) for iterating over the results of XPath expressions, and a "push" pipeline (the Receiver) for generating events on a result tree. Both are capable of on-the-fly type checking. The capability of xsl:sequence to do this is reused in xsl:template and xsl:function: an implicit xsl:sequence instruction is generated to wrap the contents of the template or function, and this inner instruction is responsible for any type checking and atomization.
Eventually I am hoping to move more and more to a "pull" model, where instructions are evaluated using iterators in the same way as expressions are evaluated today. The current model can be seen as transitional. Some instructions, in particular those used by XQuery such as Element and Attribute, are currently dual-mode - they work both as instructions and as expressions.
The expression generate-id(A) = generate-id(B), which is often used in XSLT 1.0 to compare node identities, is now rewritten internally as A is B. This requires some minor tweaks to ensure that the result is correct when either A or B or both is an empty sequence.
759502 The getOutputProperty() and getOutputProperties() methods of the Transformer object always return null
Added the as attribute to xsl:template. (This is not a very efficient implementation, as it breaks the pipeline) {seq017, seq906err, seq907err}
The [xsl:]default-xpath-namespace attribute is renamed [xsl:]xpath-default-namespace.
Two local variables in the same template or function can now have the same name. But parameters must still have unique names. (A side-effect of this change is a useful improvement in compilation speed for stylesheets with many global variables. The checking done in previous releases was implemented very inefficiently.) {var13, var901err}
I have added type-checking for global (stylesheet) parameters. It's not entirely clear here what the rules ought to be. Somewhat pragmatically, I have adopted the rule that if you supply a string (which will always be the case if parameters are provided on the command line), then the system attempts to convert it to the type specified in the xsl:param declaration; if you supply anything else, then it must (after Java to XPath conversion) be of exactly the required type, without any conversion.
Saxon now supports XQuery. Details of how to use XQuery are provided at using-xquery.html, and information about the conformance to current working drafts is in conformance.html.
The doc function is implemented.
The trace function has been changed so that it is never evaluated at compile time, even if both arguments are constants.
The optimization of count($x)=0 as empty($x) was working only when the stylesheet specifies version="2.0". It now works with version="1.0" also.
Text-only temporary trees are used in a wider range of circumstances than before. This data structure is used when it is known statically that a temporary tree will consist of a single text node; it is a lot more efficient than a general-purpose temporary tree. It is now used in cases where the content of the variable invokes xsl:text and xsl:value-of including calls that are within xsl:for-each, xsl:choose, xsl:if, xsl:sequence and xsl:analyze-string, and also where it uses xsl:call-template, provided that all the subordinate instructions generate text nodes. This has been done in a generalized way which will eventually lead to static type inferencing working at the XSLT level in the same way as it currently works at the XPath level. For xsl:template, the type of the results is inferred from the as attribute if present, or from the contents of the template otherwise. {not v. thoroughly tested!}
Range variables (that is, variables declared in an XPath expression using for, some, or every) are now stored on the local stack frame in the Bindery rather than directly in the XPathContext object. This simplifies the machinery for handling variables and allows instructions and expressions to be treated more interchangeably.
The Expression class has been refactored. The original class net.sf.saxon.expr.Expression is now an interface. The various expressions are now structured under ComputedExpression for "true" XPath expressions, net.sf.saxon.value.Value for constant values, and InstructionExpr for instructions (such as xsl:element) that act as expressions when used from XQuery. The utility methods, including the make factory method, are now in net.sf.saxon.expr.ExpressionTool.
When the dependencies of a ComputedExpression are determined, the information is now saved with the expression rather than being recalculated whenever it is needed. For complex expressions this calculation can be quite complex, and there are still some cases where it is being done at run-time.
Path expressions now use the standard type-checking machinery to check that both arguments of "/" are node-sets. This means that in some cases an error in this area will now be detected statically; and it means that if the expression is found statically to be safe, no run-time checking is done.
I have changed the way delayed evaluation is done: when an expression is evaluated lazily, a Closure object is created as a surrogate for the value. This now contains the expression itself together with all the context information that the expression needs. The separate SavedContext object is no longer used. The Closure is evaluated using the ordinary XPathContext object, which now holds a reference to the local stack frame. With delayed evaluation, this "stack frame" is not actually on the stack at all, it is in the heap, so it survives if the Closure is returned from a function call.
I have reverted to the principle used prior to Saxon 7.x, that lazy evaluation is used only for expressions that are expected to return a (non-singleton) sequence. However, the classification of such expressions is now much more accurate. The reason for this policy is that delaying evaluation of singleton expressions is usually not beneficial - it saves no memory, and incurs a cost for saving and restoring the context. Also, lazy evaluation is not used for expressions that have unusual context dependencies, for example those that depend on current(), position(), last(), or current-group(). This eliminates the problem of saving these values and ensuring that they are referenced correctly during the delayed evaluation.
The delayed evaluation code now evaluates the underlying expression at most once, thus ensuring that it never takes longer than direct evaluation. At the same time, if only the first item in the sequence is used then only the first item will be read. In a construct such as if (exists($x)) then $x else "nothing", the first reference to $x primes the iterator, and saves anything it reads in a buffer (called the "reservoir") within the Closure object. The second reference to $x starts by reading what it can from the reservoir, and if it needs more, it picks up iterating the underlying expression where the first evaluation left off. Once some user of the variable has accessed all the items in the underlying expression, the reservoir contains all the values needed and subsequent evaluations read the value from there.
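The reservoir idea can be pictured with a small Java sketch. This is illustrative only: the class name ReservoirClosure and the itemsPulled counter are invented for the example and are not Saxon's internal Closure class, which also holds the expression and its context.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical stand-in for a lazily evaluated value (not Saxon's Closure).
final class ReservoirClosure<T> {
    private final Iterator<T> source;                     // underlying expression, iterated at most once
    private final List<T> reservoir = new ArrayList<>();  // items read so far
    int itemsPulled = 0;                                  // demonstration counter only

    ReservoirClosure(Iterator<T> source) {
        this.source = source;
    }

    // Each call returns an independent reader. A reader drains the reservoir
    // first, then resumes the underlying iteration where the last reader stopped.
    Iterator<T> iterate() {
        return new Iterator<T>() {
            private int position = 0;

            @Override
            public boolean hasNext() {
                return position < reservoir.size() || source.hasNext();
            }

            @Override
            public T next() {
                if (position == reservoir.size()) {
                    if (!source.hasNext()) {
                        throw new NoSuchElementException();
                    }
                    reservoir.add(source.next());
                    itemsPulled++;
                }
                return reservoir.get(position++);
            }
        };
    }

    public static void main(String[] args) {
        ReservoirClosure<String> x = new ReservoirClosure<>(List.of("a", "b", "c").iterator());
        x.iterate().next();                    // first reference reads one item only
        System.out.println(x.itemsPulled);     // 1
        Iterator<String> second = x.iterate(); // second reference starts from the reservoir
        while (second.hasNext()) {
            second.next();
        }
        System.out.println(x.itemsPulled);     // 3: the source was never re-read
    }
}
```

A third reader created after this point would be served entirely from the reservoir.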
Certain instructions, specifically those that are used for XQuery as well as XSLT processing, now act as expressions as well as instructions. There are two modes of evaluating these instructions: the process() method causes the instruction to write its output to the current Receiver, while the evaluateItem() and iterate() methods return the results in the same way as for any other expression.
To support this mechanism, the process() method now takes an XPathContext as its argument, instead of the Controller. This is because in XQuery, the XPath context needs to be passed unchanged by an element constructor to its child expressions.
Thanks to Gunther Schadow for these changes.
If an extension function returns null, this is now mapped to a zero-length sequence rather than to an external object that wraps null. This prevents some run-time type failures.
Exceptions thrown by an extension function are now wrapped in the XPathException thrown by the calling XPath expression, and hence in the TransformerException thrown by the transformation as a whole.
Public fields in Java classes are now accessible as zero-argument functions, for example the field Double.MIN_VALUE is accessible as Double:MIN_VALUE() with the namespace prefix bound as xmlns:Double="java:java.lang.Double". Non-static fields can be accessed by including the object instance as the first argument. It is not possible (and rarely necessary or desirable!) to modify public fields without use of a setter method. {saxon74}
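For readers unfamiliar with how a field can be read "as a function", the following Java sketch shows the underlying reflection calls. It is an assumption about the general technique, not a copy of Saxon's binding code; the class PublicFieldAccess, the helper readField and the sample class Counter are invented for the example.

```java
import java.lang.reflect.Field;

final class PublicFieldAccess {
    // Hypothetical sample class with a public instance field.
    static final class Counter {
        public int count = 7;
    }

    // Reads a public field by name. Pass null as the instance for a static field;
    // for a non-static field the instance plays the role of the first argument.
    static Object readField(Class<?> type, String fieldName, Object instance) {
        try {
            Field field = type.getField(fieldName);
            return field.get(instance);
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException("Cannot read field " + fieldName, e);
        }
    }

    public static void main(String[] args) {
        // Static field, comparable to calling Double:MIN_VALUE()
        System.out.println(readField(Double.class, "MIN_VALUE", null));
        // Non-static field, comparable to passing the object as the first argument
        System.out.println(readField(Counter.class, "count", new Counter()));
    }
}
```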
The extension function saxon:is-null (which was incorrectly documented as saxon:if-null) is now redundant, and is dropped.
The undocumented saxon:trace function is dropped: use fn:trace instead.
A couple of bugs in tail recursion have been fixed:
736802 Tail recursive calls are sometimes not executed
A couple of cases have been reported where stylesheet errors cause a crash; these have been fixed.
I have recently installed JDK 1.4.1: this has revealed a couple of problems/inconsistencies. The new JDK release appears to fix some problems with regular expression handling and with use of collations. This means that some of my tests produce subtly different results. In one or two cases the new results are clearly wrong, which means that my code was relying on the incorrect JDK 1.4.0 behavior. I have made adjustments where it seems appropriate, but particularly with collations, it is not always obvious what the right answer is.
It is now possible to have two or more stylesheet functions with the same name, provided they have different arity (number of parameters). An error is reported if two functions have the same name, arity, and import precedence. {func27, func901err}
Added support for xsl:template mode="#all". {modes30-34, inimode002}
Basic support for the functions format-date, format-time, and format-dateTime is provided. The xsl:date-format declaration is not implemented, and the third argument of the function is ignored. Not all formatting options are supported, and timezones are not handled properly yet. {date67}
I have done a lot of work on the JavaDoc documentation of key interfaces and classes, and some general code cleaning up (for example, removal of redundant methods).
The classes representing character encodings (in package net.sf.saxon.charcode) are now singleton classes; they only ever have one instance.
The following bugs are fixed in this release:
687946 An internal error may occur when the key function is used on the right-hand-side of the / operator in a path expression.
689934 A ClassCastException occurs when comparing two values that are not comparable according to XPath 2.0 rules, for example string and integer.
690736 An IllegalStateException may occur when a sequence-valued variable is promoted by the optimizer to move it outside a predicate. The specific message is: java.lang.IllegalStateException: evaluateItem called on non-singleton variable reference at net.sf.saxon.expr.VariableReference.evaluateItem(VariableReference.java:202)
700837 An expression that uses a range variable (for example, a for or some expression) cannot be used within a predicate in an XSLT pattern.
706935 A NullPointerException occurs when processing an expression of the form //x[$var] or .//x[$var].
708789 When two or more items have sort keys that evaluate to the empty sequence, the resulting sort order is incorrect.
708998 Incorrect recovery action when xsl:copy-of generates a non-text node in the value of an attribute, comment, or processing instruction.
709347 An empty sequence is not being converted to an empty string when calling a function in backwards compatibility mode.
710093 The value of position() is calculated wrongly when navigating an axis on a JDOM tree.
721687 Crash in XPathEvaluator (Java XPath API) due to incorrect generation of type-checking code.
722537 Saxon crashes if an attribute is marked as an ID but is not a valid ID, which can happen when using a non-validating parser.
I have finally dropped support for the old Java-only event-driven API. This was starting to interfere with the ability to optimize XSLT processing. The XPath interfaces remain available. Indeed, all the internal APIs remain available, but I am no longer trying to keep them as simple or as stable as is necessary for a supported external API. There were serious bugs in the ShowBooks.java sample application in Saxon 7.4 that somehow didn't show up in testing; this sample application has now been dropped.
I have also finally been forced to drop preview mode. It no longer works because the optimizer is becoming too clever. The optimizer uses lazy evaluation of expressions, which relies on the fact that the source document is immutable; preview mode violates this assumption. The correct way to handle this requirement is to write a document splitter as a SAX filter, breaking the source document into small pieces and invoking one transformation for each piece.
The deprecated extension functions get-user-data() and set-user-data() are no longer documented, though they have not yet been deleted from the product.
The required="yes|no" attribute on xsl:param is implemented. Currently, failure to supply a required parameter is a dynamic error; it is never detected statically. {ntmp01, ntmp901err}
The xsl:next-match instruction is implemented. {cnfr24-27}
The error that occurs when the name attribute of xsl:element or xsl:attribute contains an undeclared prefix (in the absence of the namespace attribute) is now recoverable. This brings it into line with the handling of other errors in this value. Note however that if the name is known statically then the error is reported at compile time and is fatal.
The error that occurs when a namespace or attribute node is written using xsl:copy-of, when there is no open element start tag, is now recoverable. This brings it into line with other instructions such as xsl:attribute and xsl:copy. {copy62}
Implemented the new facility to allow construction of sequences in XSLT, when a variable binding element has content and an as attribute.
Added the as attribute to xsl:function. {func20}
Implemented the xsl:sequence instruction, including the as attribute which checks the type of the returned sequence and performs any necessary (and permitted) conversions. {seqNNN}
The xsl:result element is withdrawn. It can always be replaced by xsl:sequence. The as attribute, which denotes the return type of the function, should preferably be moved to the xsl:function element.
Parentless attribute, text, comment, processing-instruction, and namespace nodes are implemented. They are probably a little fragile - some operations on such nodes (e.g. xsl:number, xsl:apply-templates) have not been tested. The new rules for match patterns with parentless nodes have not been implemented: it's probably best to avoid using apply-templates on such nodes for the moment. {seqNNN}
Some instructions, e.g. xsl:value-of, incorrectly generate multiple text nodes; some other instructions may pre-merge the text nodes.
Handling of document nodes within the constructed sequence is probably not yet correct.
The separator attribute of xsl:copy-of is withdrawn.
The 2 May 2003 WD changes the syntax for attaching type annotations to nodes in a result tree. These facilities are only partially implemented in Saxon, and no new functionality is provided in this release, but the existing functionality has been converted to use the new syntax. Specifically:
The attribute copy-type-annotations="yes|no" on xsl:copy-of is replaced by validation="preserve|strip". The default is "preserve". (The other options, strict and lax, are not implemented.)
The [xsl:]type-annotation attribute of xsl:element, xsl:attribute, and literal result elements is renamed [xsl:]type; it is still restricted to be a built-in simple type such as xs:integer or xs:date.
The type-information attribute of xsl:variable and xsl:param is withdrawn.
In backwards compatibility mode, the November 2002 XPath 2.0 draft specification currently says (if you read it carefully) that for a function expecting an integer, the supplied value should be cast to a double. Saxon 7.4 does this and it fails. This is a bug in the spec, which is fixed in the May 2003 draft; it has been fixed in Saxon 7.5. {type036, type037}
The changes that started in 7.3 to replace the Emitter class with the Receiver interface have now been completed. The only remaining classes that are Emitters are those that actually write to a Writer or OutputStream - that is, the XMLEmitter, HTMLEmitter, etc. All other classes in the pipeline are now Receivers, many of them being defined as subclasses of ProxyReceiver which is essentially a filter in the pipeline. The output properties are now notified to each serialization filter as it is created, rather than being passed down the pipeline.
Identity transformations now copy the source to the result via the Receiver interface, not via SAX2 events as before. Two new classes DocumentSender and DOMSender have been introduced to generate Receiver events from a Saxon DocumentInfo or a DOM Document respectively.
The saxon:doctype extension has been completely rewritten. It no longer uses a custom serializer, but instead generates the DTD within the code of the instruction itself, outputting the result using the standard serializer with disable-output-escaping. There may be changes in edge cases, for example the handling of namespaces within the expansion text of entities.
The standard URIResolver has been changed to use the JDK 1.4 class java.net.URI in place of the old java.net.URL. This gives stricter adherence to the rules for URI handling and appears to handle a wider range of URI formats. One effect is that (for reasons that I do not fully understand) it is now possible to use Microsoft UNC filenames of the form \\server\share\dir\file.xml from the Saxon command line. Another effect of the change is that URIs are checked more rigorously: for example, a URI that uses backslashes instead of forward slashes is now rejected.
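The stricter checking can be observed directly with java.net.URI, independently of Saxon. The sketch below is only a demonstration of the JDK class; the helper name isValidUri is invented for the example and is not part of any Saxon API.

```java
import java.net.URI;
import java.net.URISyntaxException;

final class UriStrictness {
    // Returns true if java.net.URI accepts the text as a syntactically valid URI.
    static boolean isValidUri(String text) {
        try {
            new URI(text);
            return true;
        } catch (URISyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isValidUri("file:/c:/temp.xml"));      // forward slashes: accepted
        System.out.println(isValidUri("file:/c:\\temp.xml"));     // backslash: rejected
        System.out.println(isValidUri("file:/c:/my docs/a.xml")); // unescaped space: rejected
    }
}
```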
The OutputURIResolver is now used for resolving the principal output document (including the filename given on the command line) as well as secondary result documents. This eliminates a lot of duplicated code.
The standard OutputURIResolver has been enhanced to be capable of supporting URI schemes other than the file scheme; however, I haven't identified any schemes that support output using the standard JDK 1.4 libraries. The file URI scheme now maps URIs to filenames using standard methods provided in JDK 1.4, rather than through ad-hoc parsing of the URI: this means that invalid URIs such as file:/c:\temp.xml will be rejected at this release even though they were accepted by earlier releases.
These changes can cause difficulties dealing with filenames that contain spaces. For example, the JAXP classes StreamSource and StreamResult, when given a File as input, do not apply URI escaping to the filename. This means that the "System Id" they contain is not guaranteed to be a valid URI, for example it may contain spaces. Saxon may therefore report an Illegal URI when using the system identifier as the base for resolving other URIs. To prevent this problem, given a file java.io.File fname, use the constructor new StreamResult(fname.toURI().toString()) rather than new StreamResult(fname).
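The workaround can be written out as a short Java sketch. The class name EscapedSystemId, the helper resultFor and the file names are invented for the example. Note that the behaviour of the StreamResult(File) constructor may differ between JDK versions, so the sketch shows only the recommended form.

```java
import java.io.File;
import javax.xml.transform.stream.StreamResult;

final class EscapedSystemId {
    // Builds a StreamResult whose system identifier is a properly escaped URI.
    static StreamResult resultFor(File fname) {
        return new StreamResult(fname.toURI().toString());
    }

    public static void main(String[] args) {
        // Hypothetical file name containing spaces
        File fname = new File("out dir", "result file.xml");
        // File.toURI() escapes each space as %20
        System.out.println(resultFor(fname).getSystemId());
    }
}
```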
Comments of the form (: ... :) are now allowed in XPath expressions. Comments may be nested, and may appear anywhere that whitespace separators are allowed.
An expression of the form 1,2,3 can now be used as a top-level XPath expression; it does not need to be enclosed in parentheses.
The "keyword" expressions (those starting with for, some, every, or if) now have the lowest precedence; they cannot be nested within other expressions except by using parentheses. The only exception is the "," operator, which has lower precedence than these keywords.
The syntax for cast as and treat as has changed. These are now infix operators, for example ($x treat as xs:integer) or ('2003-02-20' cast as xs:date). Note also that there are changes in precedence for these operators and for instance of and castable as.
The union operator ('|') once again has higher precedence than unary minus, as it had at XPath 1.0. So (- $a | $b) parses as ( - ($a|$b) ).
The new SequenceType syntax element(*, T) replaces element of type T. Since only built-in atomic types are recognized, the first component must be "*" at present. This syntax is not yet available in path expressions or patterns. The syntax document-node(element(X,Y)) is not yet recognized.
The types node and item must now be written as node() and item().
The built-in types dayTimeDuration, yearMonthDuration, untypedAtomic and anyAtomicType are now in the namespace http://www.w3.org/2003/05/xpath-datatypes whose conventional prefix is xdt. They were previously placed in the xs namespace.
Casting string to untypedAtomic is now permitted {type038}.
The URI for the codepoint collation is changed to http://www.w3.org/2003/05/xpath-functions/collation/codepoint.
All functions that take an optional collation argument now take it as an instance of xs:string, not xs:anyURI as previously. The default-collation() function also returns an xs:string, although it is still specified as returning an xs:anyURI.
The namespace name supplied to the expanded-QName function is now an xs:string rather than an xs:anyURI, as is the namespace name returned from get-namespace-from-QName.
The base-uri function with no arguments is now supported: it returns the base URI from the static context (effectively, the base URI of the stylesheet). Both forms of the base-uri function now return xs:string rather than xs:anyURI. {reluri12-13}
The resolve-uri function is supported, with one or two arguments. Both arguments and the results are URIs provided in the form of strings. {reluri14-15, reluri901err}
The document-uri function is supported, though not strictly according to the spec. It is defined only for document nodes (as specified in the data model), and the URI returned is not guaranteed to be absolute, and is not guaranteed to be capable of retrieving the document using the document function (for example, a value is returned for a temporary tree).
The functions exactly-one, one-or-more, zero-or-one are implemented. Because Saxon doesn't do pessimistic static type checking, these functions are never actually needed, but they enable interoperability with systems that do such type checking. {type040, type901-904err}
The first argument of the root function may now be omitted; it defaults to the context node. {axes056}
The trace function is implemented. The Saxon implementation outputs the value of each item in a sequence as it is evaluated (except when the sequence is empty, in which case it outputs "empty sequence" at the start). Atomic values are output by converting them to a string, nodes by calling getPath() to generate a path expression to the node. With complex expressions the order of evaluation may be rather different from the expected order. The trace output is directed to System.err; this may be redirected by using 2>log.txt on the command line. {ver16, ver17}
The sequence-node-equal function is renamed sequence-node-identical. {expr80}
The functionality of sequence-deep-equal and deep-equal has been combined into a single function called deep-equal. {expr81, expr82}
The round-half-to-even function is implemented. Note that for doubles and floats, numbers ending in .5 do not always round as expected, because the true value may be slightly above or below the decimal equivalent. {math-two 10-14}
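The effect on doubles can be reproduced in plain Java with java.math.BigDecimal, which is used here purely as an illustration and is not claimed to be how Saxon implements the function. The class name HalfEvenDemo and the helper roundHalfEven are invented for the example.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

final class HalfEvenDemo {
    // Rounds to the given number of decimal places, ties going to the even neighbour.
    static BigDecimal roundHalfEven(BigDecimal value, int scale) {
        return value.setScale(scale, RoundingMode.HALF_EVEN);
    }

    public static void main(String[] args) {
        // Exact decimal ties round to the even neighbour
        System.out.println(roundHalfEven(new BigDecimal("2.5"), 0));   // 2
        System.out.println(roundHalfEven(new BigDecimal("3.5"), 0));   // 4
        // The decimal literal 2.675 is an exact tie at two places
        System.out.println(roundHalfEven(new BigDecimal("2.675"), 2)); // 2.68
        // The double 2.675 is stored slightly below 2.675, so it is not a tie
        System.out.println(roundHalfEven(new BigDecimal(2.675), 2));   // 2.67
    }
}
```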
The substring function no longer accepts an empty sequence for its second or third arguments. (This is technically incompatible with XPath 1.0, though the effect of supplying an empty node-set in XPath 1.0 was very obscure.)
The signature of the subsequence function now accepts a double for the second and third argument (which means that it also accepts an untypedAtomic value). However, if the value is not a whole number, the rounding does not necessarily follow the rules in the XPath specification. {expr72}
The function resolve-QName is implemented. {type041, type[905-908]err}
Added support for the "i" flag in functions using regular expressions. This flag requests case-insensitive matching. {regex16}
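As a rough picture of what the flag means, the sketch below maps an XPath-style flags string onto java.util.regex. This is an assumption about one possible mapping, not Saxon's code: the class RegexFlags and both helpers are invented, only the "i" flag is handled, and XPath regular expression syntax is not identical to Java's.

```java
import java.util.regex.Pattern;

final class RegexFlags {
    // Translates a flags string to java.util.regex flag bits; only "i" is handled.
    static int toJavaFlags(String flags) {
        int result = 0;
        for (char c : flags.toCharArray()) {
            if (c == 'i') {
                result |= Pattern.CASE_INSENSITIVE;
            } else {
                throw new IllegalArgumentException("Unsupported flag: " + c);
            }
        }
        return result;
    }

    // True if any substring of the input matches the regular expression.
    static boolean matches(String input, String regex, String flags) {
        return Pattern.compile(regex, toJavaFlags(flags)).matcher(input).find();
    }

    public static void main(String[] args) {
        System.out.println(matches("Hello World", "hello", ""));  // false
        System.out.println(matches("Hello World", "hello", "i")); // true
    }
}
```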
Added stricter checking of the contents of the replacement string in the replace function. {regex901err - regex904err}
The signatures of matches, replace, and tokenize are updated to match the latest specification. (Only the first argument is now allowed to be an empty sequence.)
The functions dayTimeDuration-from-seconds and yearMonthDuration-from-months are no longer defined in the core specification; they have been moved to the Saxon namespace and remain available as extension functions.
The unparsed-text function has changed; it now takes a single URI and returns a single string, rather than processing multiple URIs in a single call. This anticipates a change to the XSLT specification.
The format-number function has been rewritten according to the XSLT 2.0 specification. Since the old specification was based on JDK 1.1, which was underspecified, this causes some minor incompatibilities. If needed, the old version of the function is still available under the name format-number-1.0(). {numberformat001-nnn}
The insert function has been replaced by insert-before, which inserts the new sequence before the given position, not after it. {expr71}
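The "before, not after" semantics can be shown on ordinary Java lists. The class InsertBefore and its helper are invented for the example. The treatment of out-of-range positions (clamping to the start or the end) is an assumption taken from the later function specification, not from this release note.

```java
import java.util.ArrayList;
import java.util.List;

final class InsertBefore {
    // Inserts 'inserts' before the item at the 1-based 'position' in 'target'.
    // Positions below 1 insert at the start; positions past the end append.
    static <T> List<T> insertBefore(List<T> target, int position, List<T> inserts) {
        int index = Math.max(1, Math.min(position, target.size() + 1)) - 1;
        List<T> result = new ArrayList<>(target);
        result.addAll(index, inserts);
        return result;
    }

    public static void main(String[] args) {
        List<String> seq = List.of("a", "b", "c");
        System.out.println(insertBefore(seq, 2, List.of("x")));  // [a, x, b, c]
        System.out.println(insertBefore(seq, 10, List.of("x"))); // [a, b, c, x]
    }
}
```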
Extension functions that declare an argument type of java.lang.CharSequence can now be called in the same way as extension functions that declare the argument type as java.lang.String. Thanks to Gunther Schadow [gunther@aurora.regenstrief.org] for identifying this requirement and the changes needed.
The net.sf.saxon.Transform command line now allows "-" as either the source file name or the stylesheet file name, with the meaning that the source file or stylesheet is read from standard input. Note that this means it will have no system ID, and therefore no base URI, so relative URLs contained within the document cannot be resolved. Thanks to Gunther Schadow [gunther@aurora.regenstrief.org] for this enhancement.
It is now possible to specify output parameters on the command line. These are entered in the same way as stylesheet parameters, but with a "!" as the first character of the name. For example, !indent=yes switches indentation on, and !{http://saxon.sf.net/}indent-spaces=1 sets the indentation level to one. A value specified on the command line overrides any value specified in an xsl:output declaration in the stylesheet.
A new TraceListener is available, called TimedTraceListener. This can be activated using the -TP flag on the command line. A stylesheet is available for analysing the resulting execution trace. For details, see Performance Analysis.
The command line option -im modename can be used to specify the initial mode in which the transformation starts. The modename may use the "{uri}local-name" syntax if it is namespace-qualified. An initial mode can also be selected using the setInitialMode() method on the Controller.
I have tested the JDOM interface more thoroughly at this release than previously, and this has revealed some problems. I have fixed the most serious of these. Most of the failed tests relate to the use of the namespace axis, and since the namespace axis is not mandatory in XPath 2.0, I have withdrawn support for this axis with the JDOM interface.
Shortly before Saxon 7.5 was finished, JDOM beta 9 became available (the first new release for a year). I tested Saxon 7.5 with the new release without problems.
I have changed the mapping for the Java types char and Character. If these types are returned from an extension function, or supplied as a stylesheet parameter, they are now mapped to an XPath string. Previously they were returned as an integer.
Revised the code for type checking and conversion in GeneralComparisons, in particular, the rule that in backwards compatibility mode, an argument is converted to a double if the other argument is numeric. {bool73, bool74}
There are four functions whose result depends implicitly on the current document: key, id, unparsed-entity-uri, and unparsed-entity-public-id. These functions are now statically rewritten to call an equivalent internally-defined function that takes "/" as an additional explicit argument: for example id($x) is rewritten as id+($x,/). This has allowed the removal of the special code needed to handle the fact that these functions have an implicit dependency on the context. (This is designed as a step on the way to elimination of expression rewriting during reduction).
The changes made in 7.4 to extract constant sub-expressions out of a loop had an adverse side-effect in forcing path expressions to be sorted into document order unnecessarily. This has been fixed.
The decision whether a sort is necessary to deliver the results of a path expression in document order is now made at compile time, and is reported by saxon:explain. The phrase "naturally sorted" means that the path expression delivers its results in document order; the phrase "requiring sort" means that unless the containing expression asks for the results in arbitrary order, a sort will be performed to get them in document order.
Some additional cases have been identified where a sort into document order can be avoided. The main such case is path expressions starting with a variable reference, for example $x/a/b/c. These are increasingly common within for expressions, and within stylesheet functions. The system now maintains enough static information about the expression to which the variable is bound to eliminate the sort in many cases. Note that with expressions used inside a stylesheet function, where $x is a parameter, this works only when the parameter is given a type that disallows multiple nodes. Sorting is also avoided in a path expression that starts with the document() function. A performance bug that caused the results of the key() function to be sorted unnecessarily has been fixed.
The expression rewriting in 7.4 sometimes introduced an unnecessary redundant range variable (in the saxon:explain output this appears as let $zz:r1 := $zz:r0). In most cases this has been eliminated. Apart from a small intrinsic cost saving, this also enables some further expression optimization which was previously inhibited by the extra complexity of the expression.
The mechanism previously used for lazy evaluation has been changed. In previous releases, the reduce() method was called to create a copy of the expression, in which context-dependent subexpressions (such as variable references, or the current() function) were replaced by their values. This copying was expensive, in both time and memory. The strategy at this release is that a lazily-evaluated value is represented by a Closure, which consists of the original expression (unchanged), together with a SavedContext, which holds all the values of context variables.
The changes to lazy evaluation revealed some problems with saxon:preview. The problems are not actually new; they were exposed by the changes. I eventually decided there was no alternative to withdrawing this facility. I think the time is coming near when saxon:assign may also have to go: these "features" are starting to interfere too much with optimization.
The changes can also cause problems with extension functions that have side-effects, by changing the order of execution of different instructions. These problems can generally be fixed by writing saxon:assignable="yes" on each xsl:variable element where the order of execution is significant. However, it is best to avoid using extension functions with side effects if at all possible.
The ContentEmitter class now declares its methods with throws SAXException, making life easier for users who want to define a subclass.
Tail calls of named templates are now optimized not only for a directly recursive call, but for any other call. So mutually-recursive templates now also benefit from this optimization, which means that deep recursion is possible without running out of stack space. Tail calls using xsl:apply-templates are also now optimized. The benefit applies only to the last call when a node-set is being processed, but this is a typical use case with a call such as <xsl:apply-templates select="following-sibling::*[1]"/>.
However, tail calls of recursive stylesheet functions are not being optimized at the moment. The change in processing model (functions are now made up of instructions rather than expressions) made this too difficult. Reinstating this feature (introduced in 7.4) will require some kind of integration between the XSLT and XPath parts of the compiler.
The change in the way tail recursion is handled requires a change to the implementation of extension elements. A class that extends the Instruction class must now provide the method processLeavingTail() instead of process(). No change is required other than declaring the return type as TailCall, and actually returning null.
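The general idea of returning a pending call instead of making it can be sketched as a trampoline in Java. This is a simplified assumption about the technique: the TailCall interface below is a hypothetical one-method version with no arguments, whereas the real Saxon method takes context arguments, and the CountDown class is invented for the example.

```java
final class TailCallDemo {
    // Hypothetical, simplified interface (not Saxon's actual signature).
    interface TailCall {
        // Does one unit of work and returns the call still to be made, or null if none.
        TailCall processLeavingTail();
    }

    // Counts down by returning the next step rather than calling it recursively.
    static final class CountDown implements TailCall {
        private final int remaining;
        private final long[] steps; // shared counter, for demonstration only

        CountDown(int remaining, long[] steps) {
            this.remaining = remaining;
            this.steps = steps;
        }

        @Override
        public TailCall processLeavingTail() {
            steps[0]++;
            return remaining == 0 ? null : new CountDown(remaining - 1, steps);
        }
    }

    // The caller loops until no tail call is left, so the Java stack does not grow.
    static long countSteps(int depth) {
        long[] steps = {0};
        TailCall call = new CountDown(depth, steps);
        while (call != null) {
            call = call.processLeavingTail();
        }
        return steps[0];
    }

    public static void main(String[] args) {
        System.out.println(countSteps(1_000_000)); // 1000001
    }
}
```

An instruction that has no tail call to hand back simply does its work and returns null, which ends the loop immediately.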
Calls to xsl:function are now notified to the TraceListener.
The theme for this release is strong typing. Saxon does not yet support user-defined types, and has very limited facilities for supporting type annotations on nodes in the data model, but it does support most of the built-in atomic types in XML Schema, such as string, integer, decimal, double, date, dateTime, duration, QName, and anyURI. This release introduces the stronger rules for type checking of operations applied to values of these types. If you want strong type checking, set version="2.0" on the stylesheet; if you are more concerned with backwards compatibility, set version="1.0".
655725 - Java ClassCastException occurs when executing a positional filter expression of the form $x[position() > 1].
655948 - Orphaned namespace declarations can appear when serializing a result tree using method="text".
655950 - The -a option on the command line does not work.
656857 - The xsl:analyze-string instruction does not work correctly when no match is found.
659064 - A ClassCastException can occur when an extension function returns a value of class java.util.List. (The clearance for this introduces a new facility: an extension function can now return a Java List, which will be converted to an XPath sequence, each item in the list being converted from a Java object to an XPath value independently).
680755 - With the XML output method, indentation should be suppressed when the result tree contains xml:space="preserve".
682979 - An UnsupportedOperationException occurs when a stylesheet function is called from the return clause of a for expression.
684487 - Double values less than 0.001 are incorrectly converted to strings: a trailing zero is added (for example, "0.00030").
The following defect has not been fixed, and is carried over until a future release:
655730 - HTML elements in a non-null namespace are serialized as if they were in the null namespace; they should be serialized as XML.
The context item, position, and size are now unset on entry to a stylesheet function. This means that if the function depends on the context item (or the document containing the context item), then an explicit parameter must be defined. For example, if the function defines <xsl:result select="count(//*)"/>, this must be changed to <xsl:result select="count($x//*)"/>, with $x being supplied as a parameter in the function call. If the function attempts to reference the context item, position, or size, a dynamic error is reported. This change gives optimization benefits in all cases where functions are NOT dependent on the context.
The value attribute of <xsl:number> is now handled as described in the XSLT 2.0 specification: the value may be a sequence, and the items in the sequence are converted to integers using the casting rules. A recoverable error occurs if casting to an integer is not possible, or if the value is not positive. {numb07, numb20}
A side-effect of this change is that a user-written Numberer must be changed to handle a long argument rather than an int as previously.
The code for assignment of variables and parameters has been changed to use the argument conversion rules defined in XPath (without backwards compatibility mode). This is a weaker conversion than previously: essentially, the select expression must deliver a value of the correct type; it will not be converted to the required type except in the case where the supplied value is an untyped node. This means, for example, that if the required type is xs:integer, you can supply an attribute node, but you cannot supply a string or a double. If a cast to the required type is wanted, it must now be written explicitly in the select expression. The error will be reported statically if the expression could never yield a value of the right type, and will be reported dynamically otherwise.
In a pattern starting key('k', $x)/..., the variable $x can now be multi-valued; the pattern matches if it matches any of the values. {idky30}
The XSLT 1.0 rule that xsl:text elements may not contain child elements has been reinstated. This change anticipates changes to the XSLT 2.0 specification.
In limited circumstances, stylesheet functions (xsl:function) now optimise tail-recursion. The circumstances are that the select expression of the xsl:result instruction must contain a call on the same function in the then or else part of a conditional expression (which may be nested in further conditional expressions). It may require a little care to write functions to exploit this.
The attribute override=yes|no on xsl:function is implemented. This determines whether a user-written stylesheet function should override a vendor-supplied or user-written Java function. The default is "yes". {func19, func21}
This involved taking the function binding logic out of the ExpressionParser and putting it all in the static context. A simpler version of the routine is provided for StandaloneContext. The code has been structured so it can also test for arity (or even static type of arguments) as part of the binding algorithm, but this is not yet exploited.
A side-effect of this change is that it is now possible to call Saxon and EXSLT extension functions
when using free-standing XPath expressions, that is, expressions executed using the XPath API from Java,
or using saxon:evaluate(). It is also possible to call user-supplied Java extension functions
provided the URI maps implicitly to a Java class name (i.e., not relying on saxon:script).
{saxon36, 37}
Sorting using xsl:sort: I have changed the way data types are specified (in anticipation
of changes to the W3C specification). The data-type attribute may now take
values text or number only.
The values text and number convert the actual sort
key to xs:string or xs:double respectively before doing the comparison.
If the data-type attribute is omitted there
is no conversion. Values are then compared as supplied; if they are not comparable, a run-time error occurs.
If you want to force conversion to a type such as xs:dateTime, use a casting function within
the select expression, for example <xsl:sort select="xs:date(@birth-date)"/>.
There is a small risk of backwards incompatibility if your stylesheet computes a numeric sort key and doesn't specify
a data-type: previously it would have been sorted as a string; it will now be sorted as a number.
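The practical difference between the two data-type values is easiest to see on a small list of keys. The Java sketch below is an illustration only: the class and method names are hypothetical, and plain String ordering stands in for whatever collation xsl:sort actually applies to text keys.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class SortKeySketch {
    // Analogue of data-type="text": keys are compared as strings.
    static List<String> sortAsText(List<String> keys) {
        List<String> out = new ArrayList<>(keys);
        out.sort(Comparator.naturalOrder());
        return out;
    }

    // Analogue of data-type="number": keys are converted to doubles first.
    static List<String> sortAsNumber(List<String> keys) {
        List<String> out = new ArrayList<>(keys);
        out.sort(Comparator.comparingDouble(Double::parseDouble));
        return out;
    }

    public static void main(String[] args) {
        List<String> keys = new ArrayList<>();
        keys.add("10");
        keys.add("9");
        keys.add("100");
        System.out.println(sortAsText(keys));   // [10, 100, 9]
        System.out.println(sortAsNumber(keys)); // [9, 10, 100]
    }
}
```

This is the incompatibility noted above in miniature: a sort key that is already numeric now sorts in the second order unless data-type="text" is specified.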
I added a check (really a bug fix for 7.3) that when an element is annotated with a simple type
such as xs:integer, the element is not allowed to have attributes. The type annotation for an element
with attributes must always be a complex type.
The type conversion rules as defined in the XPath 2.0 working draft are now implemented. This means that the rules for passing arguments to functions and operators are now stricter: you can't simply assume that the supplied value will be cast to the required type.
The rules are stricter if the stylesheet specifies version="2.0" than if it specifies
version="1.0". The version attribute (or xsl:version on a literal
result element) determines whether XPath 1.0 backwards compatibility mode is used for XPath expressions within
its scope. With backwards compatibility mode on, certain conversions that were permitted in XPath 1.0 (for example,
conversion of anything to a string or number, or extraction of the first item of a sequence) are still permitted
under XPath 2.0. With this mode off, these conversions must be done explicitly in the function call. The only
conversion that happens automatically under 2.0 is extraction of the value from a node: if the node is untyped
(which with Saxon 7.4 will almost invariably be the case), the untyped value is then cast to the type required
by the function signature. This means you can supply an untyped attribute as an operand to the "+" operator or
the round() function, for example, but you cannot supply a string.
Division of two decimals (and also the mod operator) now produces a decimal. Previously it produced
a double. The scale of the resulting decimal (the number of digits after the decimal point) is equal to
the sum of the scales of the two decimals, plus 6. So, for example, 10.0 div 3 is 3.333333. There have
been some other refinements to the xs:decimal implementation. Insignificant trailing zeros are always
discarded. Conversion of a decimal to a string uses an integer representation if there is no fractional
part, for example the result of 10.0 div 5.0 is displayed as 2 (which doesn't
match the current XPath specification).
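The scale rule can be modelled with java.math.BigDecimal. The sketch below is an approximation under stated assumptions, not Saxon's code: it assumes that trailing zeros are discarded before the operand scales are added (which is what makes 10.0 div 3 come out at six decimal places), and it assumes HALF_EVEN rounding, since no rounding mode is named above. All names are hypothetical.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

final class DecimalDivSketch {
    // Drop insignificant trailing zeros, but never go below scale 0
    // (stripTrailingZeros() alone would turn 10.0 into 1E+1).
    static BigDecimal normalize(BigDecimal d) {
        BigDecimal s = d.stripTrailingZeros();
        return s.scale() < 0 ? s.setScale(0) : s;
    }

    // Result scale = scale(a) + scale(b) + 6, as described in the text.
    // ASSUMPTION: operands are normalized first; rounding mode is HALF_EVEN.
    static BigDecimal div(BigDecimal a, BigDecimal b) {
        BigDecimal x = normalize(a);
        BigDecimal y = normalize(b);
        int scale = x.scale() + y.scale() + 6;
        return normalize(x.divide(y, scale, RoundingMode.HALF_EVEN));
    }

    // A decimal with no fractional part prints as an integer.
    static String show(BigDecimal d) {
        return normalize(d).toPlainString();
    }

    public static void main(String[] args) {
        System.out.println(show(div(new BigDecimal("10.0"), new BigDecimal("3"))));   // 3.333333
        System.out.println(show(div(new BigDecimal("10.0"), new BigDecimal("5.0")))); // 2
    }
}
```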
Conversion of an empty sequence to a double now produces an empty sequence (not NaN as in XPath 1.0); conversion of any other non-numeric string to a double raises an error (rather than producing NaN). This follows the current XPath 2.0 specification.
I attempted to change the double-to-string conversion to fit the XPath 2.0 rules, but it hit numerous
compatibility problems so I have held back on this. But positive and negative infinity are now represented
as INF and -INF (previously they were Infinity and -Infinity).
The strings INF and -INF are also recognized when casting a string to a double or
float; XPath 1.0 had no way of constructing these values except by using a construct such as
(1 div 0). {math95, math96}
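Java's own parser and formatter use the spelling Infinity, so code that exchanges doubles with a stylesheet has to map between the two forms. The sketch below shows one way to do that; the names are hypothetical, and the fallback to Double.parseDouble and Double.toString is a simplification, because Java's lexical rules for ordinary numbers are not identical to XPath's.

```java
final class InfLexicalSketch {
    // Accept the XPath 2.0 spellings of the infinities; everything else is
    // delegated to Java's parser (ASSUMPTION: close enough for a sketch).
    static double parse(String s) {
        String t = s.trim();
        if (t.equals("INF")) {
            return Double.POSITIVE_INFINITY;
        }
        if (t.equals("-INF")) {
            return Double.NEGATIVE_INFINITY;
        }
        return Double.parseDouble(t);
    }

    // Render infinities as INF / -INF; other values use Java's formatting.
    static String show(double d) {
        if (d == Double.POSITIVE_INFINITY) {
            return "INF";
        }
        if (d == Double.NEGATIVE_INFINITY) {
            return "-INF";
        }
        return Double.toString(d);
    }

    public static void main(String[] args) {
        System.out.println(show(parse("INF")));
        System.out.println(show(1.0 / 0.0));
    }
}
```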
The standard functions have been enhanced, where the spec requires it, to return an empty sequence if one of their arguments is an empty sequence.
The max() and min() functions now handle any comparable type,
returning the correct data type (for example, if given a set of integers, they return
an integer). The default (for untyped nodes) is string comparison. A collation can be specified as the
second argument; if not specified, the default collation is used. {expr56, expr57, sort20, sort21, date065, error220}
The implementation of deep-equal(), sequence-deep-equal(), and sequence-node-equal() has been revised to
conform to the current specifications.
The distinct-values() function is now implemented, with an optional collation argument. {group017-019}
The functions floor(), round(), ceiling() now return a value of the same type as the supplied
argument. {math87-90}
The index-of() function now takes an optional collation argument. {expr87}
For functions that take a collation argument, such as compare(), the default if no
collation is specified in the call, and no default <saxon:collation> is supplied,
is to use code-point collation. This differs from previous releases, where the default was to use
a locale-dependent collation. For xsl:sort, the default is still locale-dependent.
This decision is likely to be reviewed in future.
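The two defaults can give opposite answers for the same pair of strings. The Java sketch below contrasts a comparison by Unicode code points with a locale-dependent comparison using java.text.Collator. It is an illustration with hypothetical names, and it does not claim that Saxon's locale-dependent collation is built exactly this way.

```java
import java.text.Collator;
import java.util.Locale;

final class CollationSketch {
    // Code-point collation: compare the strings one Unicode code point at a time.
    static int compareCodePoints(String a, String b) {
        int i = 0;
        int j = 0;
        while (i < a.length() && j < b.length()) {
            int ca = a.codePointAt(i);
            int cb = b.codePointAt(j);
            if (ca != cb) {
                return Integer.compare(ca, cb);
            }
            i += Character.charCount(ca);
            j += Character.charCount(cb);
        }
        // One string is a prefix of the other (or they are equal).
        return Integer.compare(a.length() - i, b.length() - j);
    }

    // Locale-dependent collation, using the JDK's rules for the given locale.
    static int compareLocale(String a, String b, Locale locale) {
        return Collator.getInstance(locale).compare(a, b);
    }

    public static void main(String[] args) {
        // "Z" (U+005A) precedes "a" (U+0061) by code point, but follows it alphabetically.
        System.out.println(compareCodePoints("Z", "a") < 0);
        System.out.println(compareLocale("Z", "a", Locale.ENGLISH) > 0);
    }
}
```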
The arguments of sort() are now reversed: the first argument is the sequence to be sorted, the
second is the name of the sort key specification.
Changed sum() and avg() to return the same type as supplied. The average of an empty sequence is
now (), not NaN. These functions are still confined to handling numeric values; they do not yet work
over other summable types such as durations. {math91, math92, expr55, error227}
Corrected a bug: conversion of a double to a float was returning a double!
The functions starts-with(), contains(), and ends-with()
now accept a collation name as an optional third argument. {str126-128}
The functions substring-before() and substring-after() now accept
a collation name as an optional third argument. {str129-130}
The saxon:distinct() function, with a single argument, is dropped: the functionality is available
using either the XPath 2.0 distinct-values() function, or the EXSLT set:distinct() function. The
two-argument form (which takes an expression as the second argument) remains.
The dynamic expression supplied to saxon:expression and saxon:evaluate can now
contain references to the variables $p1, $p2, ... $p9. The values of
these variables can be supplied when the expression is evaluated using saxon:eval or saxon:evaluate
respectively. The expression can also contain calls to Java extension functions bound using the implicit mapping
of Java classes to namespaces, and to Saxon and EXSLT extension functions. For more details see
extensions.html.
The function saxon:get-user-data() has changed to return an empty sequence rather than a zero-length
string if no data exists. This is to prevent type-checking problems when the expected value is not a
string. {saxon06}
The error() function (with optional argument) is implemented. If the argument is specified, its
string-value is used as the error message. {error223-224}
The node-name() function is implemented. It returns a value of type xs:QName.
The functions get-in-scope-namespaces() and get-namespace-uri-for-prefix() are
implemented. {nspc45-46}
The unparsed-entity-public-id() function (defined in XSLT 2.0) is implemented. This required
a minor change to the DocumentInfo interface implemented by the tree model. {expr88}
The unordered() function is implemented. This returns the items of a sequence in
implementation-defined order. In practice the only important case where it has any effect in the Saxon
implementation is where the sequence supplied as argument is a Step using a reverse axis: for example,
unordered(ancestor::*) returns the ancestors in reverse document order. But applications should
not rely on the actual order; the function is intended to be used by applications that do not care about
the order of the results. {axes60}
A simple implementation of the input() function is available. If the parameter
{http://saxon.sf.net/}input has been supplied to the transformation, it returns the value
of this parameter. This must be a node sequence, which means it cannot be supplied from the
command line. If no such parameter has been supplied, it returns the root of the principal source
document (the document containing the node that was matched on entry to the transformation).
{Limited testing only: mdocs09}
The default-collation() function is implemented: it returns the URI of the default
collation if specified, or the URI of the code-point collation otherwise.
The component extraction functions for durations are now available only on the two subtypes
yearMonthDuration and dayTimeDuration, and are named accordingly:
for example get-years-from-yearMonthDuration and get-hours-from-dayTimeDuration.
(But other operations remain available on durations even though
not specified in the current working draft, for example equality comparison on durations). {date055-058}
Casts and constructor functions, when converting from a string to another type, now apply the appropriate whitespace normalization to the supplied value, as defined in the whitespace facet for the target data type. This means, for example, that an ID value can have leading and trailing spaces, which are ignored. {type035}
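For readers unfamiliar with the whitespace facet, the two non-trivial settings defined by XML Schema are "replace" and "collapse". The Java sketch below implements both as defined there; the class and method names are hypothetical, and it is not taken from Saxon.

```java
final class WhitespaceSketch {
    // whiteSpace="replace": each tab, line feed and carriage return becomes a space.
    static String replace(String s) {
        return s.replaceAll("[\t\n\r]", " ");
    }

    // whiteSpace="collapse": as "replace", then runs of spaces shrink to one
    // and leading and trailing spaces are removed.
    static String collapse(String s) {
        String t = s.replaceAll("[ \t\n\r]+", " ");
        int start = t.startsWith(" ") ? 1 : 0;
        int end = (t.endsWith(" ") && t.length() > start) ? t.length() - 1 : t.length();
        return t.substring(start, end);
    }

    public static void main(String[] args) {
        // An ID value with surrounding whitespace is accepted once collapsed.
        System.out.println("[" + collapse("  A001 \n") + "]"); // [A001]
    }
}
```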
A new attribute saxon:explain is available on any instruction in the stylesheet. The permitted
values are "yes" and "no". If the value is yes, then at compile-time, Saxon outputs an analysis of all
XPath expressions appearing in attributes of that instruction. The analysis includes the static type
of the expression, and a representation of the optimized expression tree. For some examples, see
extensions.html.
A new command line option is available. The -TJ option traces the binding of external
calls to Java functions. It is useful when analyzing why Saxon fails to find a Java method to match
an extension function call in the stylesheet, or why it chooses one method over another when several
are available. The option is also available programmatically via the TRACE_EXTERNAL_FUNCTIONS attribute
in the TransformerFactory.
The URI http://www.w3.org/2002/11/query-operators/collation/codepoint is now recognized
as the name of the code-point collation; if this URI is specified in calls to sorting or comparison
operations, strings will be compared according to their Unicode code-points. Note that this URI
is likely to change in subsequent versions of the XPath working drafts.
There has been a change to the mechanism whereby a ContentHandler that is nominated
to handle the result tree can indicate
that it is prepared to handle output that is well-balanced, but not well-formed (for example, if it contains
more than one element node as a child of the document root).
A new attribute saxon:require-well-formed is available on xsl:output, with
values "yes" or "no". The default is "no". If the value is set to "yes", and a user-written
ContentHandler is supplied to receive the results of the transformation, then Saxon will report an
error rather than sending a non-well-formed stream of SAX events to the ContentHandler.
{saxon72, error231}
The XML output method now outputs a tab character appearing in an attribute value as the character reference &#x9;, to prevent it being normalized when the document is re-parsed. {type035}
The XPath API has been extended with a method that allows the sort order for the results of an expression
to be specified. Previously, there was no way of sorting sequences except in an XSLT context, or by use
of rather complex internal Saxon classes. Expressions generated using the XPath API can also now be used
directly in conjunction with the applyTemplates mechanism in the Controller, for
use by applications doing rule-based processing from Java. The getExpression method of
XPathExpression, which provided an escape from the packaged XPath API into the internal
Saxon interfaces, has been replaced by rawIterator(), which fulfils the same purpose.
A Java extension function can now return a java.util.List object to represent a sequence.
Each item in the list is converted to an XPath value (node or atomic value) as if it were returned from
a separate function call: for example, a List containing three Java Integers is converted to an XPath
sequence of three xs:int values. {func25}
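On the Java side, such an extension function is an ordinary static method. The class below is a hypothetical example (the class name, method name and behaviour are invented for illustration); how the namespace URI is bound to the class is not shown here, and is covered in extensions.html.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical extension class. When deployed it would normally be declared
// public in its own source file so that it can be loaded by name.
final class SquaresExtension {
    // Returns the first n square numbers. According to the text above, a List
    // of three Integers would be seen by the stylesheet as a sequence of
    // three xs:int values.
    public static List<Integer> squares(int n) {
        List<Integer> result = new ArrayList<>();
        for (int i = 1; i <= n; i++) {
            result.add(i * i);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(squares(3)); // [1, 4, 9]
    }
}
```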
The interface that handles the setting of variables accessed by the XPath engine from its environment
has been generalized to remove the assumption that the variables are defined in XSLT. This means that
variables can now be used in the freestanding XPath API, as well as in saxon:expression and
saxon:evaluate.
I have removed the restriction that a URIResolver for the URI contained in the href
pseudo-attribute of the <?xml-stylesheet?> processing instruction must return a SAXSource.
It can now return any kind of Source. This change has been regression tested only. The (existing) code for
creating a composite stylesheet when there are several <?xml-stylesheet?> processing instructions
(as specified in the JAXP interface definition) is not tested by any of my standard tests, but I have
left it in. From an inspection of the code, I don't think it will work if the URIs are relative.
These changes should not affect users unless you exploit internal interfaces within Saxon.
Parameters to stylesheet functions are now passed by position (in an array of values), not by name.
Internally, there has been a change to the processing of literal result elements. XPath expressions contained within attribute value templates on such an element are now processed during the first (prepareAttributes) compilation phase, as with other stylesheet instructions. Type checking happens during the second (validate) phase. A consequence of this change is that user-defined top-level elements are now represented by a different class, DataElement, to prevent their attributes being processed as AVTs.
Changes made in support of XPath type-checking include the following:
The general trend is towards doing more of the work at compile time. Where type conversions are necessary, or where it is determined statically that they might be necessary, then the conversions are compiled into the executable expression; if they are not necessary, they are not performed. Similarly, if dynamic type checking is necessary, then it is compiled into the expression; otherwise, it is not performed.
Function calls to standard functions are now compiled with knowledge of the signature of the function. The code generated is conditional on whether backwards compatible mode is enabled or not. If the supplied arguments are incompatible with the function signature (that is, if the call cannot possibly succeed) then a static type error is generated. Code to atomize nodes and perform other allowed conversions (e.g. numeric promotion) is compiled into the expression tree. If the supplied value cannot be statically guaranteed to be of the correct type, then type-checking code is generated in the expression tree.
The same logic is used for calls to stylesheet functions. In this case, backwards compatible mode is never used, which means there is no implicit conversion of arguments. Calls to stylesheet functions are now statically checked; this is done by means of a fixup process that allows for the fact that the function call can be parsed before the function declaration is encountered.
The same logic is used for evaluating keys.
Within the implementation of standard functions, arguments are now evaluated without any type conversion: any conversions that are performed are done by the function calling mechanism, using internal tables that represent the signatures of each function.
The internal Expression#evaluate() method has been dropped.
All implementations and usages of this function have changed
to use evaluateItem() or iterate() (or in some cases, lazyEvaluate()),
as appropriate.
The code for value comparisons and general comparisons has been split into a number of separate classes.
These do stricter type checking of their arguments. The decision which algorithm to use (hash join, etc)
is now made at compile time, using static information about the types and cardinality of the arguments.
But the conversion of untypedAtomic values (which result from atomizing a node with no type annotation) to
a string or double (depending on the type of the other argument) is done dynamically.
In the final stages of testing I found a design problem in this area: neither the new code nor
the code in previous releases handled comparisons such as (U, U, U) = (1, 2, '3') correctly, where U
is an untyped value. The problem here is that a mixture of string and numeric comparisons is required.
I fixed this for the time being by changing the code so it always does a naive nested-loop comparison. This doesn't
appear to have a noticeable effect on performance in most cases: there will be some cases where
it is very inefficient, but these don't arise very often.
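The reason a single comparison strategy cannot be chosen up front is that each untyped value has to be interpreted according to the item it is being compared with. The Java sketch below models the nested-loop approach under simplifying assumptions: untyped values are represented as Java strings, typed values as Number or String objects, and all names are hypothetical. It is not Saxon's code.

```java
import java.util.Arrays;
import java.util.List;

final class GeneralComparisonSketch {
    // Compare one untyped value with one typed value. The untyped value is
    // converted to a double if the other operand is numeric, otherwise it is
    // compared as a string. A non-numeric untyped value compared with a number
    // throws NumberFormatException here; a real processor would report a
    // dynamic error instead.
    static boolean itemEquals(String untyped, Object typed) {
        if (typed instanceof Number) {
            return Double.parseDouble(untyped.trim()) == ((Number) typed).doubleValue();
        }
        return untyped.equals(typed.toString());
    }

    // General "=": true if any pair of items compares equal.
    static boolean generalEquals(List<String> untyped, List<?> typed) {
        for (String u : untyped) {
            for (Object t : typed) {
                if (itemEquals(u, t)) {
                    return true;
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<Object> right = Arrays.<Object>asList(1, 2, "3");
        // "3" fails the two numeric comparisons and succeeds on the string one.
        System.out.println(generalEquals(Arrays.asList("3"), right));
    }
}
```

The cost is proportional to the product of the two sequence lengths, which is the inefficiency accepted above.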
Other classes, notably the code for arithmetic expressions, also do stricter type checking.
The code for attribute value templates has been reorganized. The AttributeValueTemplate class is now
used only at compile time, and it has therefore been moved to the style package. It no longer acts as
a pseudo-XPath expression; instead, compiling the AVT generates a true XPath expression, including calls
to concat(), string-join(), and string() where required.
These handle all necessary type conversions.
The Expression#evaluateAsString() method no longer does conversion of the expression
result to a string; the
method should only be used where (a) the expression is statically known to return a string or (), and (b)
the returned value of () is treated as equivalent to "". In practice, this means that the use of the method
is now largely confined to the evaluation of attribute value templates. This method will probably
be phased out.
The code for xsl:value-of has changed so it now compiles any code needed to convert the
supplied expression to a string (or, if the separator attribute is present, a sequence of strings).
The code for xsl:sort has changed so that the sort key is converted to the required type
using the same rules as the rules for function arguments. Internally, a new class FixedSortKeyDefinition is
introduced to represent a sort key definition that contains no context dependencies, that is, one in which
the values of all the parameters such as order, case-order, language, and data-type are known. Sometimes it is
possible to create this statically, sometimes (when AVTs are used) it cannot be created until the values of
variables are known.
Those Saxon extension functions that need special treatment at compile time (specifically,
saxon:evaluate, saxon:expression, saxon:parse, and saxon:serialize)
are now treated in the same way as system functions.
The class SimpleValue has been renamed AtomicValue.
The method convert() is now available only on the AtomicValue class; it is not available
for all values as previously. This method implements the logic of the casting rules.
Expressions are now parsed in three stages: parsing, context-independent rewriting, and static type
analysis. The first stage is done by the ExpressionParser class, the second by calling the simplify()
method on the resulting Expression object. The third stage is done by calling the typeCheck() method
on the Expression object. In an XSLT context, type information for stylesheet variables and stylesheet
functions is added before the typeCheck() method is called. The Expression.make() call
only does the first two steps; applications that use this interface must be changed to call typeCheck()
as well. The XPath API in package net.sf.saxon.xpath works unchanged.
Higher-order expressions, such as path expressions, filter expressions, and "for", "some", and "every" expressions, are now rewritten statically to promote any subexpressions that don't depend on the iteration variables. The effect is that such subexpressions are only evaluated once. This mechanism replaces the previous run-time optimisation based on the concept of expression reduction (at run-time, the expression was replaced with an expression in which the independent sub-expressions were replaced with their value). The new mechanism is done entirely at compile time and is therefore much more economical. Also it avoids doing trivial rewrites, that is, extracting constants and simple variable references.{opt001-004}
Run-time expression reduction is still used to eliminate context dependencies in an expression that is being evaluated lazily (always an expression that returns a sequence), and is being held as the value of a variable. When evaluation of such an expression is deferred, it is necessary to make a copy of all aspects of the context that it depends on, and this is done by rewriting the expression with a new expression in which all context variables are replaced with their values.
641793 - <xsl:element name="{local-name()}"/> fails (with the message "namespace prefix has not been declared").
641940 - No diagnostics for Saxon internal errors when using the Crimson parser
641948 - NullPointerException when two xsl:strip-space or xsl:preserve-space declarations name the same element
645190 - <xsl:namespace> rejects a zero-length string as the name, and fails to detect a conflict with the namespace of the containing element
646844 - The <saxon:query> extension element throws a NullPointerException if the columns attribute is omitted
The code for <xsl:number level="any"> has been optimized. The conditions that must apply
for this optimization are
that the count and from patterns must not contain any variable references; either the count pattern must
be specified, or the name and type of the context node must be statically obtainable, typically from
the match pattern of the containing template rule. Under these circumstances, Saxon will remember the
result of evaluating the instruction, and the next time it is evaluated, if the first node encountered
in its reverse scan of the document is the one that was most recently numbered, it will simply add one
to the remembered number.
Note: a similar optimization for <xsl:number> with no attributes has been
in use for some time.
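The level="any" shortcut described above can be modelled on a flat list of element names in document order. This is a simplified sketch with hypothetical names: it ignores the from pattern, assumes the node being numbered matches the count pattern, and is not Saxon's actual code.

```java
import java.util.Arrays;
import java.util.List;

final class AnyLevelNumberSketch {
    private final List<String> names;  // element names in document order
    private final String counted;      // name matched by the count pattern
    private int lastIndex = -1;        // position of the most recently numbered node
    private int lastNumber = 0;        // the number that was assigned to it
    int shortcuts = 0;                 // how often the remembered result was reused

    AnyLevelNumberSketch(List<String> names, String counted) {
        this.names = names;
        this.counted = counted;
    }

    // Number the node at 'index' (assumed to match the count pattern).
    int number(int index) {
        int count = 1;                 // the node itself
        boolean firstMatch = true;
        for (int i = index - 1; i >= 0; i--) {   // reverse scan of the document
            if (!names.get(i).equals(counted)) {
                continue;
            }
            if (firstMatch && i == lastIndex) {
                count = lastNumber + 1;          // add one to the remembered number
                shortcuts++;
                break;
            }
            firstMatch = false;
            count++;
        }
        lastIndex = index;
        lastNumber = count;
        return count;
    }

    public static void main(String[] args) {
        AnyLevelNumberSketch n = new AnyLevelNumberSketch(
                Arrays.asList("a", "fig", "b", "fig", "fig", "c", "fig"), "fig");
        System.out.println(n.number(1) + " " + n.number(3) + " " + n.number(4) + " " + n.number(6));
    }
}
```

When nodes are numbered in document order, each call after the first stops at the first matching node it meets; numbering out of order falls back to the full scan.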
The numbering code also now passes down the required node name and kind to the axis iterator that is used to find the nodes being counted. This reduces the overhead of skipping non-matching nodes.
The code for navigating the parent and ancestor axes in the TinyTree implementation has been improved. The next pointer for the last sibling now points to the parent node; this is distinguished from a normal next pointer by virtue of the fact that it always points backwards in the node array. When a search is needed to find the parent, this is now done by reading the next-sibling pointer chain until the owner is found, which in general is faster than the previous technique, of scanning all preceding nodes until one is found whose depth is lower.
This was done after exploring a number of alternative approaches, none of which led to significant performance improvements. In particular, I tried various ways of remembering the parent node during a scan of the descendant axis, but in all cases the benefit achieved when the parent node was actually used was less than the extra cost of maintaining the information in the cases where it wasn't needed.
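The backward-pointing next pointer can be illustrated with a single integer array. This is a sketch of the idea only: the array name, the use of -1 for the root, and the node numbering are assumptions made for the example, not the TinyTree's actual layout.

```java
final class ParentPointerSketch {
    // next[i] is the index of the next sibling of node i. For the last sibling
    // it holds the parent's index instead, which is always lower than i because
    // nodes are stored in document order. -1 marks the root.
    static int parent(int[] next, int node) {
        int p = node;
        while (next[p] > p) {
            p = next[p];     // forward pointer: still on the sibling chain
        }
        return next[p];      // backward pointer: the owner (or -1 for the root)
    }

    public static void main(String[] args) {
        // Document order: 0 = doc, 1 = a, 2 = a1, 3 = a2, 4 = b
        // a and b are children of doc; a1 and a2 are children of a.
        int[] next = { -1, 4, 3, 1, 0 };
        System.out.println(parent(next, 2)); // 1
        System.out.println(parent(next, 1)); // 0
    }
}
```

The search visits only the following siblings of the node, rather than every preceding node in the array.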
Note: I have not identified any circumstances in which the "standard" tree implementation out-performs the TinyTree. The "standard" implementation is retained, however, because it is used for stylesheets during compilation.
A small improvement has been made to the code for evaluating an attribute reference.
Various expression classes now contain their own implementations of the effectiveBooleanValue() method, avoiding the need to use the general-purpose logic in cases where the value is already known to be a singleton.
My main test case for these performance improvements was the stylesheet used to render the XSLT 2.0 specification. The execution time for this stylesheet improved from 16.4 seconds to 7.6 seconds. Improvements for other stylesheets are very unlikely to be as high as this. Another test case improved from 66.8 to 59.8 seconds, which is probably more typical.
References in curly braces identify the test cases used to test each new feature.
603928 - [position()!=last()] in a pattern fails.
607442 - unparsed-text() function fails.
608416 - command line processor calls System.exit().
616543 - xsl:param needn't come first in a template.
616548 - multithreading bug when using saxon:preview declaration.
617103 - ./EXPR doesn't sort results into document order.
620851 - precedence of conflicting xsl:namespace-alias declarations.
626277 - exslt:leading() and exslt:trailing() with empty node-set arguments.
635433 - SQLInsert attempts commit() even when autocommit is set.
636661 - interaction of cdata-section-elements with disable-output-escaping.
637117 - creating two namespaces with same prefix and different URI.
637292 - xsl:for-each and xsl:for-each-group don't nullify the current template rule.
The xsl:principal-result-document element is withdrawn.
Note, however, that the ability to NOT have a principal result tree is not yet available.
A principal output file will be created even if it is empty.
Attributes of xsl:output can no longer be attribute value templates.
A new attribute has been added to xsl:output: saxon:byte-order-mark="yes"
causes a byte order mark (hex FEFF) to be inserted at the start of the output file. This is most
useful with UTF-8 and UTF-16 encoding, as some text editors recognize it, but it is available for
use with any output method and any encoding. {outp70}
The saxon:omit-meta-tag attribute in xsl:output has been replaced with the
new (standard) include-content-type attribute. Note that this works the other way around:
replace saxon:omit-meta-tag="yes" by include-content-type="no".
{saxon47, outp71}
The new (standard) escape-uri-attributes attribute in xsl:output has
been implemented for the HTML output method. (URI escaping is not yet implemented for method="xhtml").
{outp71}
The built-in template rules now pass parameters through unchanged to the templates for their child
elements. This applies whether the rule is called because of xsl:apply-templates or xsl:apply-imports
{cnfr21, cnfr22}.
The base URI of the root of a temporary tree is now taken from the base URI of the xsl:variable
element in the stylesheet. Previously it was taken from the system ID. There is a difference in the case
where xml:base is used. {not tested}
A variable that is never referenced will no longer be evaluated. This can cause problems if evaluation
of the variable has side-effects (e.g. by calling an extension function, or saxon:assign). You
can force evaluation of the variable by setting saxon:assignable="yes".
The terminate attribute of xsl:message may now be an attribute value
template. {ver14}
The copy-namespaces attribute of xsl:copy and xsl:copy-of is
now supported. {copy12-13}
The type attribute of xsl:variable, xsl:param, and xsl:result is renamed: it is now the as attribute.
The as and collation attributes of xsl:key are now supported.
This allows indexing of nodes by numeric or date values, and matching using case-blind or accent-blind comparisons.
{idky25-29}.
The type-annotation attribute of xsl:attribute and xsl:element
and the xsl:type-annotation of literal result elements are supported. For example, you
can now annotate attributes of elements on a temporary tree as type-annotation="xs:ID", and
then use the id() function to find them, using an expression such as $tree/id('A001').
The actual value of the element or attribute must be valid according to the type given in the type annotation. Only built-in
schema-defined types are currently supported. Attribute types derived from a DTD will be recorded if they
are reported by the parser (but CDATA is treated as untyped, and the list types IDREFS, ENTITIES, and NMTOKENS
are not yet supported). Although the values must be valid according to their type, there are no checks on
uniqueness constraints (ID) or referential integrity constraints (IDREF). {schema001-4}
The type annotations are retained on the tree only if the attribute type-information="preserve"
is present. If the attribute is absent, or is set to none, the type-annotation on any
elements or attributes in the tree will still be used to validate the content, but will not result
in any annotation of the nodes on the tree. The values strict and lax
for this attribute are not yet implemented. {schema005-6}
The type-information attribute is also available on xsl:result-document.
It only affects the outcome if the result tree is captured using a user-written Receiver in which
the annotations will be available. At present the type annotations are NOT retained if
the result is fed into another stylesheet using
saxon:next-in-chain: this is because the chaining goes via a SAX2 ContentHandler
which cannot pass the type information through. {schema007, schema013}
The attribute copy-type-annotations is available on xsl:copy-of. The default
is "no", which means that type annotations are NOT copied from the source tree to the result tree.
{schema011-012}
The effect of xsl:namespace-alias has been changed. Elements and attributes whose namespace
is changed by an xsl:namespace-alias declaration will now take the prefix given in
the result-prefix attribute, where possible. Previously they took the new namespace URI
but retained their original prefix. This was technically conformant with the specification, but untidy,
and it often led to the result document containing multiple declarations
of the same namespace URI. {nspc36-38}
Conflicting xsl:namespace-alias declarations are now reported as a static error.
{error007}
The precedence of different expressions in the XPath grammar has been aligned with the August 2002
working draft. This meant making a few changes: range expressions (such as 1 to 10) now bind more
tightly than conditional expressions; all comparison operators now have the same precedence, and consecutive
operators (as in a = b = c) are not allowed; unary minus binds more tightly than union;
cast and treat expressions are no longer allowed as steps in a path expression. Saxon implements
the full XPath 2.0 grammar with the exceptions of the validate expression and schema-related
aspects of the SequenceType production.
Removed the ability to do a "mapping cast", that is, to cast a sequence as a sequence. This functionality went beyond the semantics of cast as defined in the XPath 2.0 specification. The argument and result of a cast must now be a singleton, and if the input is an empty sequence, the output is an empty sequence. The actual conversion rules still need some work to align them fully with the evolving XPath specification.
Implemented the escape-uri()
function. The '#' character is treated as a reserved character, in addition
to those listed in the specification. {expr85}
Implemented the item-at()
function, but with restrictions: if the subscript is out of range, it should
raise an error, but it currently returns the empty sequence. {pos65}
Implemented the data()
function. {schema008, 009, 010, 012}
The concept of "effective boolean value" has been implemented. This algorithm is now used when converting any value to a boolean in contexts such as conditional expressions, filter predicates, and the boolean() function. It is fully backwards compatible with XPath 1.0.
A different, more restricted algorithm is used when casting values to booleans using a cast expression or the xs:boolean() constructor: for strings in particular, the effective boolean value gives false for a zero-length string and true for any other string, while xs:boolean() (in line with W3C Schema) gives true for "1" or "true", false for "0" or "false", and an error for any other string. {type034}
The xs:boolean() changes are not yet complete for supplied values other than strings.
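The difference between the two string-to-boolean rules can be illustrated with a minimal Java sketch. This is not Saxon's code: the class and method names are hypothetical, and the sketch ignores whitespace handling and all non-string inputs.

```java
final class BooleanConversionSketch {

    /** Effective boolean value of a string: false only for the zero-length string. */
    static boolean effectiveBooleanValue(String s) {
        return !s.isEmpty();
    }

    /** Cast-style conversion: only the four lexical forms named above are accepted. */
    static boolean castToBoolean(String s) {
        switch (s) {
            case "1":
            case "true":
                return true;
            case "0":
            case "false":
                return false;
            default:
                throw new IllegalArgumentException("Not a valid xs:boolean: \"" + s + "\"");
        }
    }
}
```

Under these rules the string "false" has an effective boolean value of true, but casts to false.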
The algorithm for "atomize" is also available for all expressions, though at present it is used only for the argument of a cast. It is also simpler than the algorithm described in the specification because at present the typed value of a node is always the same as the string value.
Changed the EXSLT set:leading() and set:trailing() functions (as required by the spec) so that if the second argument is empty, the first argument is returned. Changed saxon:before() and saxon:after() so they work the same way. Previously, the empty node-set was returned. This change will be retrofitted to 6.5.x. There is a further deviation from the spec: if no node in the second node-set is present in the first node-set, Saxon returns all nodes before/after the first/last in the second node-set, whereas the spec requires it to return an empty sequence. Conforming would require a redesign and would prevent a pipelined implementation, so I don't intend to implement this change.
Implemented the string-to-codepoints() and codepoints-to-string() functions, replacing saxon:string-to-unicode() and saxon:unicode-to-string(). {saxon68-69}
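As an illustration of what these two functions compute, here is a small Java sketch of the same mapping between a string and its Unicode codepoints. It is only an analogy written against the standard Java library (Java 8 or later); the class and method names are hypothetical and it is not how Saxon implements the functions.

```java
final class CodepointsSketch {

    /** String to a sequence of Unicode codepoints (a surrogate pair counts as one). */
    static int[] stringToCodepoints(String s) {
        return s.codePoints().toArray();
    }

    /** Sequence of Unicode codepoints back to a string. */
    static String codepointsToString(int[] codepoints) {
        return new String(codepoints, 0, codepoints.length);
    }

    public static void main(String[] args) {
        int[] cps = stringToCodepoints("Saxon");
        System.out.println(java.util.Arrays.toString(cps));
        System.out.println(codepointsToString(cps));
    }
}
```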
Implemented the string-join() function. {str125}
Implemented the castable as operator. {type030}
Implemented the types xs:anyURI and xs:QName, and the functions expanded-QName(), get-local-name-from-QName(), and get-namespace-from-QName(). {type031-33}
Implemented the SequenceType grammar for "attribute of type T" and "element of type T". T must be a built-in simple type. {schema002-004, 014; error009, 012}.
The second argument of saxon:serialize() must now be known at compile-time. This is because details of xsl:output declarations are not available at run-time unless they are actually referenced.
The results of the function-available() and element-available() functions may be inaccurate if the argument is not known at compile-time. Specifically, only system-defined functions and instructions are known at run-time. In practice, these functions are designed to perform compile-time tests so this is very unlikely to be a problem. There is also some justification in that the only functions that can be called dynamically (using saxon:evaluate()) are system-defined functions.
As a result of the changes affecting stylesheet compilation, there are some new restrictions on the extension function saxon:evaluate() (and also saxon:expression()). In particular, the dynamically constructed expression can no longer reference any XSLT variables, and it cannot access any stylesheet functions, Saxon extension functions, or XSLT-specific functions such as key() and generate-id().
There has been a substantial change to the way stylesheets are "compiled". In previous releases, the compiled stylesheet was actually a standard tree representation of the source XML stylesheet, with annotations on the nodes to assist efficient execution. In this release, the tree representation of the stylesheet is discarded once compilation is complete, and a custom data structure is used to represent the executable stylesheet.
The compiled stylesheet may now be serialized (using Java serialization), enabling it to be saved on disk, or transferred between machines - this is especially useful in an Enterprise Java Beans environment. A new command java net.sf.saxon.Compile stylesheet output is available to compile a stylesheet, and the java net.sf.saxon.Transform command has a new option -c which causes the stylesheet parameter to be taken as a compiled stylesheet rather than a source stylesheet. In fact, using compiled stylesheets from the command line does not give a great performance advantage over recompiling them each time they are used, because the compilation time is dominated by Java initialization; the benefits are more likely to be realized in a high-throughput server-based environment, where it is now possible to use disk caching of stylesheets as an alternative to in-memory caching.
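A disk cache built on Java serialization might look roughly like the following sketch. The class and method names are hypothetical. Any Serializable object stands in for the compiled stylesheet here, since the sketch deliberately names no Saxon class; only standard java.io calls are used.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

final class CompiledCacheSketch {

    /** Writes a serializable object (standing in for a compiled stylesheet) to disk. */
    static void save(Serializable compiled, File file) {
        try (ObjectOutputStream out = new ObjectOutputStream(new FileOutputStream(file))) {
            out.writeObject(compiled);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Reads the object back; the caller casts it to the expected type. */
    static Object load(File file) {
        try (ObjectInputStream in = new ObjectInputStream(new FileInputStream(file))) {
            return in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException("Class of cached object not on the classpath", e);
        }
    }
}
```

Serialized objects can normally only be read back by compatible versions of the classes that wrote them, so a cache like this would need to be cleared when the software is upgraded.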
These changes bring (or promise) a number of benefits:
xsl:fallback is made entirely at compile-time.
The main drawback is that less of the static context is available during execution. This makes a number of things more difficult, or in some cases impossible:
saxon:evaluate and the saxon:allow-avt attribute which allows dynamic selection of a template in xsl:call-template
In general I expect that stylesheets will need to be recompiled whenever a new Saxon version is issued, though this may be avoidable in the case of a bug-clearance release.
Stylesheet compilation is a little fragile at this release. It has proved difficult to test it comprehensively. One known restriction is that stylesheets containing saxon:collation declarations cannot be compiled (because they use Java classes that are not serializable). There may be other restrictions: please let me know if you find any.
As part of this change, the stylesheet tree now uses a different NamePool from the source tree. This NamePool is discarded as soon as compilation is complete. Names used in XPath expressions, names of literal result elements and attributes, and names of keys, variables, templates, and functions, are still registered in the NamePool for the source document, but the names of XSLT elements and attributes (e.g. xsl:template, select) no longer appear. This significantly reduces the size of the compiled version of a small stylesheet, and makes loading of the compiled stylesheet correspondingly faster. It also means that names used in the source document are less likely to encounter hashing conflicts in the NamePool, giving a small run-time speed-up.
There have been a number of changes to APIs that may affect users.
XSLTContext object has been merged into net.sf.saxon.Controller. This class was exposed in the traditional Saxon Java event-handling API, and was also available for use by extension functions, extension elements, and trace listeners. Extension functions that require context information must now declare a first argument of class XPathContext.
net.sf.saxon.instruct.Instruction. The SQL extension library has been updated to show how the new scheme works.
I have introduced a new interface, net.sf.saxon.event.Receiver, which is intended to replace the old Emitter interface. This supports setting type annotations on element and attribute nodes: it allows the type information to be carried with the element and attribute events, and also allows various properties to be associated with each event, used for disable-output-escaping and to indicate when validation has already been done so it does not get done twice.
The classes that implement this interface are largely in package net.sf.saxon.event, which replaces the old net.sf.saxon.output package.
The new interface largely replaces the Outputter interface, ending the artificial distinction between the Outputter and the Emitter, which was there historically because events were handled in a different order at the two interfaces.
The "sticky d-o-e" facility is not working in this release: that is, an error is reported when output is written to a temporary tree with disable-output-escaping="yes". The same happens if the final output of the stylesheet is written to a Saxon tree, for example when using a Saxon-created DOMResult, or when using stylesheet chaining. It is possible that "sticky d-o-e" will not be allowed in the final XSLT 2.0 specification, though at present there are open issues concerning this. {outp09, bug17}
The class hierarchy for XPath expressions (net.sf.saxon.expr.Expression) has been simplified. The two abstract classes SingleValueExpression and SequenceExpression have disappeared; their functionality has moved into the parent class, Expression, driven by the static cardinality of the expression as determined by the getCardinality() method. This allows greater re-use of classes such as BinaryExpression. There is potential for many expressions to be implemented as functions, allowing more use of generic code and table-driven static analysis.
The implementation of SequenceExpressions of the form (1,2,3) has changed completely, and is much simpler. They are now handled by breaking them up into a tree of binary expressions, treating "," as a list concatenation operator. {expr53, 54, 55, 86}
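The idea of treating "," as a binary concatenation operator can be sketched in Java as follows. This is an illustrative model only: the names Expr, Item, and Append are hypothetical and do not correspond to Saxon classes, and the left-deep shape of the tree is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

final class SequenceTreeSketch {

    /** An expression that appends the items it yields to an output list. */
    interface Expr {
        void evaluateInto(List<Object> out);
    }

    /** A leaf operand yielding a single item. */
    static final class Item implements Expr {
        private final Object value;
        Item(Object value) { this.value = value; }
        public void evaluateInto(List<Object> out) { out.add(value); }
    }

    /** The binary "," operator: everything from the left, then everything from the right. */
    static final class Append implements Expr {
        private final Expr left;
        private final Expr right;
        Append(Expr left, Expr right) { this.left = left; this.right = right; }
        public void evaluateInto(List<Object> out) {
            left.evaluateInto(out);
            right.evaluateInto(out);
        }
    }

    /** Builds (a, b, c) as the left-deep tree Append(Append(a, b), c). */
    static Expr fromItems(Object... values) {
        if (values.length == 0) {
            throw new IllegalArgumentException("at least one operand is needed");
        }
        Expr tree = new Item(values[0]);
        for (int i = 1; i < values.length; i++) {
            tree = new Append(tree, new Item(values[i]));
        }
        return tree;
    }

    static List<Object> evaluate(Expr e) {
        List<Object> out = new ArrayList<>();
        e.evaluateInto(out);
        return out;
    }
}
```

Because every node has exactly two operands, generic binary-expression machinery can be reused for sequence construction.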
The implementation of FilterExpressions has been rewritten and simplified. Two different iterators are now used, a FilterIterator where every value needs to be tested, and a PositionIterator where the value is known statically to be numeric. This greatly simplifies the code. The way in which reverse axes are handled has also been simplified.
I want to move away from run-time expression reduction on filters to doing a static rewrite that pulls non-dependent subexpressions out of the predicate, but this has not yet been done.
The class XPathException is now abstract. There are two concrete subclasses, XPathException.Static and XPathException.Dynamic, used to distinguish static from dynamic errors. (Other subtypes, for example XPathException.TypeError, may be introduced in future.) A dynamic error that occurs when an XPath expression is evaluated early (at compile time) is now not reported until run-time, and is only reported if the expression is actually evaluated.
I have decided to drop the integration with Apache's FOP processor. The API has changed yet again between FOP 0.20.3 and FOP 0.20.4. It is simply too much hassle to keep chasing a moving target, especially as the changes are not well documented and impossible to make without studying the FOP source code.
It is now possible to control the use of NamePools via the TransformerFactory. The call factory.setAttribute(FeatureKeys.NAME_POOL, pool) causes the specified NamePool to be used by all stylesheets that are compiled (using newTemplates()) following this call. Note: unless you really know what you are doing, it is safest to let Saxon manage the NamePools automatically.
The HTML output method now uses its own internal method for URI escaping, rather than relying on the utf8 encoding available in the Java IO library. {outp52, 57}
Support for SAX1 XML parsers is withdrawn. All mainstream parsers support SAX2, with the possible exception of James Clark's xp. Similarly, output will no longer be directed to a SAX1 DocumentHandler: you must supply a SAX2 ContentHandler instead. Saxon now compiles without any deprecation warnings.
Saxon 7.2 requires Java JDK 1.4
Saxon 7.2 requires JDK 1.4. This is primarily to support the use of regular expressions: Saxon now uses the JDK 1.4 regular expression library to support xsl:analyze-string and the functions matches(), replace(), and tokenize().
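For orientation, the following Java sketch shows how the three functions could be approximated on top of java.util.regex. The mapping shown (find() for matches(), replaceAll() for replace(), split() for tokenize()) is an assumption made for illustration, not a description of Saxon's source; XPath and Java also differ in regular expression syntax and in edge cases such as empty input.

```java
import java.util.regex.Pattern;

final class RegexFunctionsSketch {

    /** True if some substring of the input matches the pattern (unless the pattern is anchored). */
    static boolean matches(String input, String regex) {
        return Pattern.compile(regex).matcher(input).find();
    }

    /** Replaces every non-overlapping match; '$' and '\' in the replacement follow Java's rules. */
    static String replace(String input, String regex, String replacement) {
        return Pattern.compile(regex).matcher(input).replaceAll(replacement);
    }

    /** Splits the input around matches of the separator pattern, keeping trailing empty tokens. */
    static String[] tokenize(String input, String regex) {
        return Pattern.compile(regex).split(input, -1);
    }

    public static void main(String[] args) {
        System.out.println(matches("abracadabra", "bra"));
        System.out.println(replace("abracadabra", "a", "*"));
        System.out.println(String.join("|", tokenize("a, b, c", ",\\s*")));
    }
}
```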
Since JDK 1.4 includes an XML parser, there is no longer any good reason for Saxon to supply its own XML parser. Therefore AElfred is no longer included in the Saxon package, and the default will be to use the Crimson parser (or whatever is included in the JDK 1.4 distribution).
Note: JDK 1.4 appears to require more [or allocate less] stack space than JDK 1.3: some transformations that ran successfully in JDK 1.3 run out of stack space with JDK 1.4. This equally affects earlier Saxon releases when running with JDK 1.4.
542981: Saxon fails with JDOM beta 0.8.
553347: The context node is not reset correctly after a stylesheet function is called from within an XPath predicate.
558696: Cannot include a simplified stylesheet.
561695: Error message "more than one method matches" when calling a Java method that accepts argument of class Object.
573314: The expression string-length($x)=0 gives wrong result for "0" and "false".
576632: Match on parent node in a pattern fails.
580989: NullPointerException when tracing using -T option.
581515: Duplicate DOCTYPE declaration when using an identity transformer and HTML serialization.
583939: Memory leak when using keys.
584944: Attribute value templates on <xsl:sort> cannot depend on the context node.
Implemented the xsl:analyze-string instruction, which supports regular expression matching.
Where an embedded expression within an attribute value template yields a sequence of more than one item, the string values of all the items are now output, separated by spaces. This is incompatible with XSLT 1.0, which ignored all but the first node in a node-set. If this causes compatibility problems (a) you can fix it by using the filter [1] after the expression, (b) please let me know: the XSL WG wants to know whether this incompatible change is likely to cause problems in practice.
The elements xsl:variable, xsl:param, and xsl:result may now take a type attribute indicating the required type of the value. The supplied value will be converted to this type if necessary. The value of the attribute is the same subset of the XPath SequenceType production as is implemented for "cast as" and "instance of" expressions: basically, the fixed types such as "item" and "element" and the built-in types such as xs:string and xs:date, followed by an optional occurrence indicator.
Parameters to xsl:function may no longer specify a default value: all arguments must be supplied in the function call.
An xsl:message instruction may now appear inside an xsl:function.
The xsl:text instruction may now contain other instructions, such as xsl:value-of.
Pending resolution of issue 132 in the spec, avoid using disable-output-escaping with nested xsl:text elements. The effect is unlikely to be what you expected.
It is now an error to specify the mode or priority attributes on an xsl:template element with no match attribute.
Match patterns using the id() and key() functions can now reference global variables or parameters for the value of the id or key.
The attributes version, exclude-result-prefixes, and extension-element-prefixes may now appear on any element in the XSLT namespace. Note that these attributes are prefixed xsl: when used on a literal result element, but have no prefix when used on an XSLT element.
The attribute [xsl:]default-xpath-namespace is now available on all elements. It defines the default namespace to be used for unprefixed element names in path expressions and patterns.
The xsl:apply-templates element now allows mode="#current" and mode="#default". The xsl:template allows the mode attribute to be a list of mode names, optionally including #default to match the default mode.
The disable-output-escaping attribute of xsl:attribute is implemented, replacing the saxon:disable-output-escaping extension, which is no longer available.
The xsl:destination element is renamed xsl:principal-result-document. (This was misdocumented in version 7.1).
Implemented the unparsed-text() function (with the second argument being mandatory).
Added a -v option to the command line to request XML validation. This applies to the principal source document and other files read using the document() function. It requires an XML parser that supports validation.
The same feature is available in the API using setFeature(FeatureKeys.VALIDATION, Boolean.TRUE) on the TransformerFactory.
Added a getTransformer() method to the net.sf.saxon.Filter class that is created in response to SAXTransformerFactory#newXMLFilter(). This allows setting of stylesheet parameters, a URIResolver, etc, when using this interface. Not tested.
The standard TraceListener now outputs an abbreviated version of the file name of the stylesheet module containing an instruction, as well as the line number.
The indentation algorithm for method="xml" has been changed so no extra whitespace is output if there is already enough whitespace in the result tree: specifically, if a start tag is preceded by a newline and as many spaces as the indentation would output, then no extra indentation takes place. The effect is to avoid adding blank lines when copying XML that is already indented. This change does not affect method="html", because the HTML indentation rules are more complex and can easily affect the appearance of text in the browser if applied wrongly.
Implemented the regular expression functions matches(), replace() and tokenize() as defined in the Functions and Operators specification; also the regex-group() function defined in the XSLT 2.0 WD.
The only option in the construct A instance of [only] B has been removed, as it is no longer defined in the XPath WD.
Changed the rules for the context document: this is now always the document containing the context node. If the context item is not a node, there is no context document, and any absolute path expression (or calls on id(), key(), or unparsed-entity-uri()) will cause a dynamic error.
Implemented the time data type, the constructor xs:time(), and the functions current-date() and current-time(). Time values can be compared for equality or ordering, and can be sorted.
Implemented the component extraction functions, get-x-from-y, for date, dateTime, and time.
Changed DateTime and time classes so that the timezone is retained as part of the value. Equality and ordering is done by normalizing the time to UTC, but conversion to a string, and extraction of components, reflects the timezone as originally specified.
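The same distinction (comparison on the UTC-normalized instant, while the components keep the timezone as written) can be seen in the standard java.time API of Java 8 and later. The sketch below is only an analogy: Saxon at this release runs on JDK 1.4 and has its own classes, and the names used here are hypothetical.

```java
import java.time.OffsetDateTime;

final class TimezoneRetentionSketch {

    /** Equality on the instant: both values are compared after normalizing to UTC. */
    static boolean sameInstant(String a, String b) {
        return OffsetDateTime.parse(a).isEqual(OffsetDateTime.parse(b));
    }

    /** Component extraction reflects the timezone as originally written. */
    static int hourAsWritten(String dateTime) {
        return OffsetDateTime.parse(dateTime).getHour();
    }

    public static void main(String[] args) {
        String local = "2002-08-01T12:00:00+02:00";
        String utc = "2002-08-01T10:00:00Z";
        System.out.println(sameInstant(local, utc)); // true: the same moment in time
        System.out.println(hourAsWritten(local));    // 12: the hour in the original timezone
    }
}
```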
Constructor functions such as dateTime() have been moved to the schema namespace (you can use either "http://www.w3.org/2001/XMLSchema" (conventional prefix xs) or "http://www.w3.org/2001/XMLSchema-datatypes" (conventional prefix xsd)). Stylesheets that use these constructor functions must be changed. The semantics of these constructors are identical to the cast expression.
Added the duration data-type, including conversion to and from strings, comparison for equality and ordering, sorting, and component extraction. (This goes beyond the XPath 2.0 drafts, which do not allow ordering on durations.) Ordering is based on the average length of a month (one year = 365.25... days): so P365D < P1Y and P366D > P1Y.
Component extraction works on any kind of duration, and the functions are currently named get-X-from-duration(), not get-X-from-yearMonthDuration() or get-X-from-dayTimeDuration().
Arithmetic involving durations or dates is not yet implemented.
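The ordering rule for durations can be illustrated with a short Java sketch that converts a duration to an approximate number of days. The exact constant is an assumption: the text above gives the year length only as "365.25...", so 365.25 is used here, and the class and method names are hypothetical.

```java
final class DurationOrderSketch {

    // Assumption: one year is taken as exactly 365.25 days, so an average month is 365.25 / 12 days.
    static final double DAYS_PER_YEAR = 365.25;
    static final double DAYS_PER_MONTH = DAYS_PER_YEAR / 12.0;

    /** Approximate length in days of a duration with the given year, month, and day components. */
    static double approximateDays(int years, int months, int days) {
        return (years * 12 + months) * DAYS_PER_MONTH + days;
    }

    /** Negative, zero, or positive as the first duration is shorter than, equal to, or longer than the second. */
    static int compare(double firstDays, double secondDays) {
        return Double.compare(firstDays, secondDays);
    }
}
```

With this constant, P365D (365 days) sorts before P1Y (365.25 days), which in turn sorts before P366D.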
Added the two XPath-defined subtypes of duration: xs:dayTimeDuration and xs:yearMonthDuration. Implemented the functions to construct these from a number of months or seconds. The "+" and "-" operators can be used to add two durations of the same type, and the "*" and "div" operators to multiply or divide a duration by a number.
Added the subtypes of xs:integer (xs:long, xs:int, xs:short and the rest). The type promotion rules for comparison and arithmetic on numeric types have been brought into line with the specification, though there are probably still a few minor discrepancies (especially where fallback conversions from strings are involved).
Added the idiv operator for integer division. For example, 10 idiv 3 is 3.
The div operator always returns a double result.
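In Java terms the two operators behave roughly like integer and floating-point division, as in this small sketch (the names are hypothetical, and only the positive-operand case from the example above is claimed to correspond):

```java
final class DivisionSketch {

    /** Analogue of idiv: integer division, discarding the fractional part. */
    static long idiv(long a, long b) {
        return a / b;
    }

    /** Analogue of div as described above: the result is always a double. */
    static double div(long a, long b) {
        return (double) a / (double) b;
    }

    public static void main(String[] args) {
        System.out.println(idiv(10, 3)); // 3
        System.out.println(div(10, 3));
    }
}
```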
Added the subtypes of xs:string (token, language, Name, NCName, ID, IDREF, ENTITY, NMTOKEN); but not the list types IDREFS, ENTITIES, NMTOKENS. These have no useful functionality beyond the ability to validate the lexical rules for each type.
Implemented the distinct-nodes() function (at the same time fixing a bug in union, intersect and except when supplied with arguments that are not in document order).
Implemented the deep-equal() function. Because nodes are still untyped, it compares string values of text nodes rather than typed values. Not yet tested with an explicit collation.
Renamed the sublist() function as subsequence().
Implemented the sequence-node-equal() and sequence-deep-equal() functions. Not yet tested with an explicit collation.
Implemented the functions node-kind(), root(), context-item().
Added the EXSLT functions in package math: abs, acos, asin, atan, atan2, constant, cos, exp, log, power, random, sin, sqrt, tan. Thanks to Simon St. Laurent for these. Only partially tested.
The saxon:intersection and saxon:difference extension functions have been dropped; instead use either the XPath 2.0 operators (intersect, except) or the EXSLT functions.
Many internal iterators work with a one-item lookahead. This is wasteful if the iteration is not continued to completion, which happens for example with a numeric predicate such as expr[1], or with an existential comparison such as sequenceA = sequenceB, or when converting a sequence to a string or a boolean. This lookahead has been removed for some commonly used iterations, notably the FilterIterator, the MappingIterator, and the TinyTree SiblingIterator.
A consequence is that the hasNext() method of SequenceIterator can now throw an XPathException.
Deferred evaluation of variables happened in the past when the expression was a SequenceExpression. It now happens only if the compile-time cardinality of the expression allows more than one item. This means that deferred evaluation will not be used for an expression of the form expr[1]. And when deferred evaluation is used, the iterator is not primed by calling hasNext(): this means that (for an iterator that doesn't do lookahead), the search for the first item is now deferred until the variable is first used, and doesn't have to be repeated unnecessarily. In addition, if the variable is referenced in a context where only the first item in the sequence is required (e.g. to get the value as a boolean or as a string), the value is now saved without evaluating the full sequence.
I have added an optimization for constructs of the form <xsl:if test="a | b">. Where a union expression is evaluated in a boolean context it is now treated as if the operator were "or". This potentially avoids the need to sort the two node-sets into document order.
There are some changes in the way global variables are handled. At compile time, a hash table is used in place of linear searching to search for duplicates: this should improve compilation performance for stylesheets with many global variables, especially when many of the variables are overridden by an importing stylesheet. At run-time, evaluation of global variables is now deferred until the first reference to the variable, which will improve execution performance when there are global variables that are never referenced. Note that this change will be visible if <xsl:message> is used to trace execution.
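Deferred evaluation of a global variable amounts to evaluating on first reference and remembering the result. A minimal Java sketch of that pattern follows; the class name is hypothetical, it is not thread-safe, and it is not taken from Saxon.

```java
import java.util.function.Supplier;

final class LazyGlobalSketch<T> {

    private final Supplier<T> initializer;
    private T value;
    private boolean evaluated;

    LazyGlobalSketch(Supplier<T> initializer) {
        this.initializer = initializer;
    }

    /** Evaluates the initializer on the first call only; later calls return the saved value. */
    T get() {
        if (!evaluated) {
            value = initializer.get();
            evaluated = true;
        }
        return value;
    }

    /** False until the variable has been referenced at least once. */
    boolean isEvaluated() {
        return evaluated;
    }
}
```

A variable that is never referenced never runs its initializer, which is why side effects such as messages emitted during evaluation may appear later than before, or not at all.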
A filter expression of the form f[a and b] is now rewritten as f[a][b] when appropriate, to enable an early exit in the case where a is positional: for example item[position() = 1 and child::desc]. This is only done if a is positional and b is not.
A union (or intersection or difference) of two path expressions is now rewritten to do the combination as late as possible: for example ( /a/b/c | /a/b/d ) is rewritten as ( /a/b/(c|d) ). Note, this is a first small step in the identification of common subexpressions. The cases where two subexpressions are detected as being identical are fairly limited, for example there is no knowledge of which operators are commutative or associative.
The organization of the net.sf.saxon.functions package has changed. Much of the fixed information associated with individual functions is now contained in a static table in the StandardFunction.java module, rather than being returned by methods associated with each function. Most of the optimization methods (simplify, getDependencies, and reduce) now have a generic default implementation in the Function.java class, which most of the individual functions now use. This has reduced the overhead associated with implementing each function, which is important as the number of functions in XPath 2.0 has grown so much. It also creates further opportunities for combining the implementation of several related functions in one module, with better ability to share common code.
Unique document numbers are now allocated in the NamePool rather than the DocumentPool. This is visible in the results of generate-id(), because it means document numbers are not reset at the beginning of each transformation. This change has been made so that functions that rely on unique document numbers (for example, comparison of nodes into document order, or the union operation) can be done safely in a free-standing XPath environment. Eventually this will also allow document() to be executed outside an XSLT context - but not yet.
I have tested Saxon with the Resin XML parser (version 2.1.1), but found it very buggy.
I have tested Saxon with the Piccolo XML parser (version 1.03), and found it worked very well except for a few stress tests, particularly in the area of namespace handling. I have reported four bugs.
The tables for converting XPath data types to Java types (when calling extension functions) have been revamped. The design has changed so that where there are two methods that appear to match the function call, one of them will generally be chosen even if the choice is arbitrary: this is because in many cases where Java classes define polymorphic methods, the results will be the same whichever method is chosen.
This version of Saxon has been modified and tested to work with JDOM beta 0.8 and with FOP 0.20.3. In both cases, code changes were needed to work with these versions, and I have not tested whether the code still works with earlier versions; the chances are that it doesn't.
In general, bugs that have been cleared in Saxon 6.5.1 or Saxon 6.5.2 have also been cleared in this release. For details of the clearance of specific bugs, see the bug tracker at Sourceforge. Remember that closed bugs are not listed unless you ask for them.
The href attribute of xsl:result-document is now interpreted as a relative URI, relative to the system ID of the principal result document. This works only where the system ID of the principal output is known, and uses the "file://" protocol. The result document is no longer created relative to the current working directory, for security reasons (it causes problems when executing an untrusted stylesheet in a servlet environment).
Note that when Saxon is invoked from the command line, the -o option should be used to specify the principal output destination. This will ensure that a suitable system ID is available. If the result document is sent to the standard output stream (even if this is redirected to a file), Saxon will not know the system identifier and will therefore be unable to create a secondary output destination using a relative URI. It is still possible, of course, to specify an absolute URI as the value of the href attribute - note that this must be a URL, not a filename, so it will typically start with file://.
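Resolution of an href against the system ID of the principal output follows ordinary relative-URI rules, which the standard java.net.URI class can demonstrate. The sketch below is illustrative only: the class name and the file paths are made up, and it does not show Saxon's own resolver.

```java
import java.net.URI;

final class ResultHrefSketch {

    /** Resolves the href of a secondary result against the principal output's system ID. */
    static URI resolveHref(String principalSystemId, String href) {
        return URI.create(principalSystemId).resolve(href);
    }

    public static void main(String[] args) {
        // Hypothetical locations, for illustration only.
        String principal = "file:///home/user/out/main.html";
        System.out.println(resolveHref(principal, "chapter1.html").getPath());
        System.out.println(resolveHref(principal, "file:///tmp/other.xml").getPath());
    }
}
```

A relative href lands in the same directory as the principal output, while an absolute URI is used unchanged.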
It is now possible to specify an OutputURIResolver to be used to resolve the URI specified in the href attribute of the xsl:result-document element. This will be used in place of the standard output URI resolver. The OutputURIResolver is called when writing of the output document starts, at which point it must return a JAXP Result object to act as the output destination. It is called again when writing of an output document is complete. You can nominate an OutputURIResolver by calling ((Controller)transformer).setOutputURIResolver(new UserOutputResolver()), or by calling factory.setAttribute("http://saxon.sf.net/feature/outputURIResolver", new UserOutputResolver()).
If the -t option is used, a message is written to the standard error output identifying the files written using xsl:result-document.
It is now an error to use xsl:result-document when the current output destination is a temporary tree.
The meaning of the ALLOW_EXTENSION_FUNCTIONS attribute in the TransformerFactory has been extended so that setting the value to false also disables extension elements and the creation of multiple output files. This is because all these operations carry similar risks when a servlet is allowed to execute untrusted stylesheets.
Added support for the separator attribute of <xsl:copy-of>.
The current() function may now be used in a pattern (specifically, within a predicate). Its value is the node being tested against the pattern. For example, match="*[*[name()=name(current())]]" matches any element that contains another element with the same name.
A global variable or parameter may now be used in the match pattern of xsl:template, provided that it does not cause a circularity (that is, it must be possible to evaluate the variable without calling xsl:apply-templates).
A global variable or parameter may now be used in the match pattern or the use expression of xsl:key, provided that it does not cause a circularity (that is, it must be possible to evaluate the variable without using the key() function against the key being defined).
The key() function may now be used in the use or match attributes of xsl:key, provided the key definitions are not circular. (For example, key k1 can be defined in terms of key k2, provided that k2 is not defined in terms of k1.)
The group-ending-with attribute of xsl:for-each-group is implemented. It is especially useful where the last node in each group carries some kind of marker, for example continued="no".
Added attribute default="yes"|"no" to saxon:collation, to specify whether this collation should be used as the default collation. If more than one collation is specified as the default, the last one wins. If no default collation is specified, Unicode codepoint collation is used. The default collation is used by the compare() function if no third argument is supplied, by xsl:sort if no collation is specified (for data type text or string), and also by the comparison operators =, !=, <, >, etc.
The collation name is now a URI, not a QName.
Sorting and comparison according to Unicode codepoints can be achieved by setting up a collator as
<saxon:collation name="unicode" class="net.sf.saxon.sort.CodepointCollator"/>
The implementation of the "and" and "or" operators has reverted to two-valued logic, since three-valued logic didn't make it into the published XPath 2.0 working draft. (Actually, it seems 3-valued logic wasn't working in Saxon 7.0 anyway).
Changed the "==" and "!==" operators to "is" and "isnot".
Changed string literals to allow the delimiting quote marks to be doubled. For example, <xsl:value-of select="'[He isn''t]'"/> displays the string [He isn't]
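The quote-doubling rule can be expressed as a few lines of Java. This is a simplified sketch with hypothetical names: it assumes the literal is well-formed and does not reject a lone, undoubled delimiter inside the body.

```java
final class StringLiteralSketch {

    /** Returns the string value of an XPath-style literal, collapsing doubled delimiters. */
    static String valueOf(String literal) {
        if (literal.length() < 2) {
            throw new IllegalArgumentException("Literal must include both delimiters");
        }
        char quote = literal.charAt(0);
        boolean validDelimiter = quote == '\'' || quote == '"';
        if (!validDelimiter || literal.charAt(literal.length() - 1) != quote) {
            throw new IllegalArgumentException("Literal must start and end with the same quote mark");
        }
        String body = literal.substring(1, literal.length() - 1);
        String doubled = String.valueOf(quote) + quote;
        // Each doubled delimiter inside the body stands for one literal quote character.
        return body.replace(doubled, String.valueOf(quote));
    }

    public static void main(String[] args) {
        System.out.println(valueOf("'[He isn''t]'")); // [He isn't]
    }
}
```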
Changed the some and every expressions to allow multiple range variables, for example some $i in //I, $j in //J satisfies $i = $j
Implemented the singleton value-comparison operators (eq, ne, gt, lt, ge, le). These return an error if applied to a sequence containing more than one item, and return the empty sequence if either operand is an empty sequence; when applied to singletons, they return the same result as the XPath 1.0 operators (=, !=, etc).
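The three possible outcomes of a value comparison (error, empty sequence, or a boolean) can be modelled in Java as below, using eq as the example. The names are hypothetical, sequences are modelled as Java lists, and checking for over-long operands before checking for empty ones is an assumption made for the sketch.

```java
import java.util.List;
import java.util.Optional;

final class ValueComparisonSketch {

    /**
     * Models "eq": an error for operands of more than one item, an empty result if
     * either operand is empty, otherwise the result of comparing the two singletons.
     */
    static <T extends Comparable<T>> Optional<Boolean> eq(List<T> left, List<T> right) {
        if (left.size() > 1 || right.size() > 1) {
            throw new IllegalArgumentException("Value comparison requires at most one item per operand");
        }
        if (left.isEmpty() || right.isEmpty()) {
            return Optional.empty();
        }
        return Optional.of(left.get(0).compareTo(right.get(0)) == 0);
    }
}
```

The other operators (ne, gt, lt, ge, le) would differ only in the final comparison.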
Less-than and greater-than comparisons between nodes and/or strings now do a lexicographic comparison using the default collating sequence; at XPath 1.0 they did a numeric comparison. A warning is output in this situation (and one or two other situations, but not all) to advise of the backwards incompatibility.
The rules for deciding when path expressions need to be sorted have been revised. As a result many cases now require no sort where previously a sort was done. Examples of such expressions include a/b/c, .//a, $x[1]/a, //@a. In addition, most path expressions that return results in reverse document order are now sorted by a simple reversal, which is much faster than a full sort.
There's a temporary bug in that path expressions returning namespace nodes don't always return them in document order. I'm awaiting resolution of the XPath 2.0 data model rules before fixing this.
Suppressed lazy evaluation of assignable variables. (This was designed to prevent a stack overflow; it didn't succeed, but it seems a good idea anyway.)
Added the ability for a Source object to be supplied as the value of a stylesheet parameter or as the value returned by an extension function.
Added dateTime and date data types. Initially the only operations supported are the currentDateTime function, the dateTime and date constructors, and conversion between strings, dates, and dateTimes in both directions. Conversion to string uses the timezone of the current locale.
Implemented comparisons (equals, less-than, etc) between dates and dateTimes.
Also implemented sorting. The data-type attribute of xsl:sort may take the two values "text" or "number" (which are treated as synonyms of xs:string and xs:double) or any XML Schema built-in data type for which sorting is supported. The values in the sequence to be sorted are converted to this data type (using the same rules as for cast as) and the rules for this data type determine the sort order.
Note that (as required by the XML Schema specification) dateTime values are normalized to UTC. The original timezone specified when the dateTime was constructed is not retained. If no timezone is present, this fact is remembered. Such a dateTime is compared with other dateTimes as if it were a UTC dateTime.
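The effect of this normalization on comparisons can be sketched with the java.time classes. This is an illustration only: Saxon 7 predates java.time and uses its own value classes, and the class and method names below are hypothetical.

```java
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.OffsetDateTime;
import java.time.ZoneOffset;
import java.time.format.DateTimeParseException;

final class UtcNormalizationSketch {
    /** Converts a lexical dateTime to a UTC instant; a value with no timezone is treated as UTC. */
    static Instant toUtcInstant(String lexical) {
        try {
            // The original offset is discarded once the value is normalized.
            return OffsetDateTime.parse(lexical).toInstant();
        } catch (DateTimeParseException noTimezone) {
            return LocalDateTime.parse(lexical).toInstant(ZoneOffset.UTC);
        }
    }

    /** Compares two dateTimes after normalization to UTC. */
    static int compare(String a, String b) {
        return toUtcInstant(a).compareTo(toUtcInstant(b));
    }
}
```

Under this model, 2002-06-01T12:00:00+02:00 and 2002-06-01T10:00:00Z compare as equal, and a dateTime written without a timezone is ordered as though it carried the suffix Z.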
Implemented the instance of operator (including the instance of only variant), for example: if ($x instance of xs:integer *) then x else y. The types that are currently supported are the 19 primitive schema types (the namespace may be either of the two namespaces permitted in XML Schema Part 2), the derived type xs:integer, the node types document, element, attribute, text, comment, processing-instruction, or namespace, and the abstract types node and item. (There is currently no syntax for the general numeric type or for the general atomic type.) The type name may be followed by one of the qualifiers "*", "+", or "?" to indicate the number of occurrences; if there is no qualifier, there must be exactly one occurrence. The more sophisticated forms of type-checking, using schema-defined complex types, are not yet supported.
Implemented the cast as data-type expression, for example cast as xs:boolean($x). The conversion rules are the same as those which apply implicitly when a value is supplied in a context where a different type is expected.
Implemented the treat as data-type expression. This doesn't actually have much use in an XSLT context, where type conversion is performed implicitly when required, and the semantics of the expression are probably not correctly implemented at this stage: the specification is still evolving.
A new API has been introduced for executing XPath expressions. This is simpler and safer than the API provided in previous releases, which was essentially improvised from implementation classes rather than being designed top-down as an interface suitable for application use. The API is loosely modelled on the proposed DOM Level 3 API for XPath.
The new API uses the class net.sf.saxon.xpath.XPathEvaluator. This class provides a few simple configuration interfaces to set the source document, the static context, and the context node, plus a number of methods for evaluating XPath expressions. The static context can be omitted if the expression does not use namespaces, external variables, or extension functions. If the expression uses namespaces, an instance of StandaloneContext can be supplied, allowing the required namespaces to be declared either explicitly, or by reference to the in-scope namespaces of some Node.
There are two methods for direct evaluation, evaluate() which returns a List containing the result of the expression (which in general is a sequence), and evaluateSingle() which returns the first item in the result (this is appropriate where it is known that the result will be single-valued). The results are returned as NodeInfo objects in the case of nodes, or as objects of the most appropriate Java class in the case of atomic values: for example, Boolean, Double, or String in the case of the traditional XPath 1.0 data types.
It is also possible to prepare an XPath expression for subsequent execution, using the createExpression() method on the XPathEvaluator class. This is worthwhile where the same expression is to be executed repeatedly. The compiled expression is represented by an instance of the class net.sf.saxon.xpath.XPathExpression, and it can be executed repeatedly, with different context nodes. However, the compiled expression is bound to one particular source document (this is to ensure that the same NamePool is used).
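The compile-once, evaluate-many pattern can be sketched as follows. To stay runnable without guessing at Saxon signatures, this example deliberately uses the standard JAXP javax.xml.xpath API (added to the JDK after this release), not net.sf.saxon.xpath.XPathEvaluator; the class name and sample document structure are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class CompiledXPathSketch {
    /** Counts the item children of every /root/group element in the given XML text. */
    static double[] itemsPerGroup(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            // Compile each expression once ...
            XPathExpression groups = xpath.compile("/root/group");
            XPathExpression countItems = xpath.compile("count(item)");
            NodeList nodes = (NodeList) groups.evaluate(doc, XPathConstants.NODESET);
            double[] counts = new double[nodes.getLength()];
            for (int i = 0; i < nodes.getLength(); i++) {
                // ... then evaluate repeatedly, each time with a different context node.
                counts[i] = (Double) countItems.evaluate(nodes.item(i), XPathConstants.NUMBER);
            }
            return counts;
        } catch (Exception e) {
            throw new IllegalStateException("XPath evaluation failed", e);
        }
    }
}
```

The Saxon classes described above play roles similar to XPath and XPathExpression here, but their method names and result types differ.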
The design principle of this API is to minimize the number of Saxon classes that need to be used. Apart from the NodeInfo interface, which is needed when manipulating Saxon trees, only the four classes XPathEvaluator, XPathExpression, StandaloneContext, and XPathException are needed. For convenience, XPathException and StandaloneContext have been moved to the net.sf.saxon.xpath package.
If you want to use extension functions or variables you will need to create your own implementation of StaticContext. Although this interface has been greatly simplified, this is still not to be attempted lightly.
The old APIs for executing expressions still exist for the time being, but they are likely to be less stable.
Changed ContentEmitter to check in startElement() that qname and local-name are both supplied; this checks against parser configuration errors. This change could (should?) be retrofitted to the 6.5 branch. The change also uses a stack of namecodes so that endElement() doesn't need to look up the names in the name pool. In implementing this change, I discovered that Saxon depends on the XML parser passing the QName argument to the startElement() call, something which according to the SAX2 specification is optional. However, all known parsers supply this argument, and the code changes to cope with its absence would damage performance, so I have simply documented this as a dependency on the parser.
Implemented infrastructure for data type support:
I have changed the implementation of temporary trees (result tree fragments). The FragmentValue class has disappeared. This class delayed the construction of an actual tree until the tree was actually used as a node-set: the effect was to optimize simple uses of temporary trees, but at considerable cost to the more general usage which is now permitted in XSLT 2.0. Also, the introduction of tinytrees has reduced the value of this optimization. Therefore, a temporary tree is now constructed immediately as a real tree.
A side-effect of this change is that when disable-output-escaping is used while writing nodes to a tree, the instructions to switch escaping on and off are recorded in the tree in the form of the processing instructions defined by JAXP 1.1. Previously, these instructions were recorded in a form that kept the information through an xsl:copy-of instruction, but lost the information if the tree was processed in any other way. Note that the behavior of "sticky d-o-e" (that is, the effect of disabling output escaping when writing to a temporary tree) is currently an open issue in XSLT 2.0.
The indexes associated with keys are no longer referenced from each document instance; they are handled externally. This makes it easier to share the same index implementation across all the different document implementations. The indexes are now held by the KeyManager, which uses a WeakHashMap to ensure that when a document is removed from memory by the garbage collector, its indexes are removed too.
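The idea of holding per-document indexes in a WeakHashMap can be sketched like this. It is a simplified illustration, not the KeyManager itself: the class name, the use of plain Object for documents and indexes, and the method names are all hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.WeakHashMap;

final class KeyIndexRegistrySketch {
    // Documents are held weakly as map keys: once a document is otherwise
    // unreachable, its entry (and so its indexes) becomes eligible for removal.
    // The index objects must not hold a strong reference back to the document,
    // or the entry would never be cleared.
    private final Map<Object, Map<String, Object>> indexesByDocument = new WeakHashMap<>();

    synchronized void putIndex(Object document, String keyName, Object index) {
        indexesByDocument.computeIfAbsent(document, d -> new HashMap<>()).put(keyName, index);
    }

    /** Returns the index built for this document and key name, or null if none exists. */
    synchronized Object getIndex(Object document, String keyName) {
        Map<String, Object> indexes = indexesByDocument.get(document);
        return indexes == null ? null : indexes.get(keyName);
    }
}
```

The design choice is that the document no longer needs a field pointing at its indexes, so any document implementation can be indexed without being changed.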
The mechanism for keeping stylesheet signatures in the namepool has been removed. It caused a creeping "memory leak" in continuously running services, and is not really needed. It was invented to allow namepools to be copied, but this facility has never been properly documented or tested. Instead, there is now a simple check that the source document and stylesheet are using the same namepool. (This change, or a simplified version of it, has also been made to 6.5.2).
The StaticContext interface has been greatly simplified, reducing duplication and making it easier to create a new implementation of this interface. This has been achieved partly by doing some work in the XPath ExpressionParser that was previously done in the StaticContext, and partly by changing those functions such as format-number() and sort() that only work in an XSLT context to check that the context is indeed XSLT before accessing the context information.
At the suggestion of Claudio Thomas [claudio.thomas@web.de], I have extended the sql:query instruction to allow the attribute disable-output-escaping="yes|no". This is useful where the database content being retrieved contains XML or HTML markup that is to be preserved in the output. Use this with care; it disables escaping for all the rows and columns retrieved, some of which may contain special characters such as "<" and "&" that do need to be escaped. This change has not been tested.
Added extension functions: saxon:parse() and saxon:serialize(). These allow conversion of a string containing well-formed XML to a tree structure, and vice-versa.
Added extension functions: saxon:string-to-unicode() and saxon:unicode-to-string(). These allow conversion between a string and a sequence of integers representing the Unicode values of the characters in the string.
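The conversion these two functions perform corresponds roughly to the following Java sketch. The class and method names are hypothetical, and treating a surrogate pair as a single code point is an assumption; the text above does not say how the extension functions handle characters outside the Basic Multilingual Plane.

```java
final class UnicodeConversionSketch {
    /** Returns the Unicode code points of the characters in the string. */
    static int[] stringToUnicode(String s) {
        return s.codePoints().toArray();
    }

    /** Builds a string from a sequence of Unicode code points. */
    static String unicodeToString(int[] codePoints) {
        return new String(codePoints, 0, codePoints.length);
    }
}
```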
Added extension functions saxon:pause-tracing() and saxon:resume-tracing().
The return value from an extension function may now be an implementation of java.util.List, representing a sequence. The members of the List must all implement net.sf.saxon.om.Item.
An argument to an extension function may now be the class net.sf.saxon.om.NodeInfo, or a subclass. If the supplied value is a sequence, the first node in the sequence is passed to the function; it is an error if there is no node in the supplied sequence, or if the node is of the wrong type.
The rules for calling extension functions with a sequence-valued argument have been clarified, and some new options are permitted, e.g. declaring the argument as java.util.List. The possibilities have not been extensively tested.
Implemented memo functions (thanks to Robert Brotherus for the suggestion). If you specify the attribute saxon:memo-function="yes" on xsl:function, Saxon will keep a cache that maps the supplied argument values to the result of the function, and if the function is called twice with the same arguments, the original result will be returned without re-evaluating the function. Don't use this option on a function that depends on the context, or on a function that creates a new temporary tree and is required to create a new instance each time. Also note that there are cases where it may be faster to re-evaluate the function than to do the lookup; this is especially true if the argument is a large node-set.
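The caching behaviour described for memo functions can be sketched in Java as follows. This is a generic memoization wrapper under stated assumptions, not Saxon's implementation: the class name is hypothetical, arguments are compared with equals(), and the evaluation counter exists only to make the effect observable.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

final class MemoFunctionSketch<R> {
    private final Function<List<Object>, R> body;
    private final Map<List<Object>, R> cache = new HashMap<>();
    private int evaluations = 0;

    MemoFunctionSketch(Function<List<Object>, R> body) {
        this.body = body;
    }

    /** Returns the cached result for these arguments, evaluating the body only on a cache miss. */
    R call(Object... args) {
        // Copy the arguments so later changes to the caller's array cannot alter the cache key.
        List<Object> key = Collections.unmodifiableList(new ArrayList<>(Arrays.asList(args)));
        if (cache.containsKey(key)) {
            return cache.get(key);
        }
        evaluations++;
        R result = body.apply(key);
        cache.put(key, result);
        return result;
    }

    int evaluations() {
        return evaluations;
    }
}
```

The sketch also shows why the caveats above matter: the cached object itself is returned on a repeat call, so a function expected to build a fresh tree each time would hand back the same instance, and hashing a very large argument can cost more than re-evaluating the body.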
This version introduces initial support of features defined in working drafts of XSLT 2.0 and XPath 2.0.
Version 7.0 should be regarded as an experimental alpha release. For production use, please continue to use Saxon 6.5.
The Saxon package name has changed from com.icl.saxon to net.sf.saxon.
Any applications that use Saxon Java classes directly (rather than relying on the JAXP interface) will need to be modified. Note that this also affects the settings of the system properties javax.xml.parsers.SAXParserFactory and javax.xml.transform.TransformerFactory.
The entry point from the command line has changed from com.icl.saxon.StyleSheet to net.sf.saxon.Transform.
The namespace URI for saxon extensions has changed from http://icl.com/saxon to http://saxon.sf.net/. Note that many extensions have been withdrawn, as they are superseded by facilities in XPath 2.0 and/or XSLT 2.0.
To allow coexistence, the name of the JAR file for this release has changed to saxon7.jar. The SQL extensions are now in a separate JAR file, saxon7-sql.jar.
A transformation can now be executed directly from the JAR file using the command java -jar saxon7.jar in place of java net.sf.saxon.Transform.
Saxon now requires JDK 1.2 or later to run. In consequence, Saxon will no longer work with the Microsoft Java VM, and the Instant Saxon version of the product is therefore no longer available.
Because Saxon no longer runs with the Microsoft Java VM, it can now be run as an applet within Internet Explorer only if the Sun Java plug-in is installed. You can get this from http://java.sun.com/getjava. This may require some configuration changes because of the differences in security policy.
The following sections summarize the main new features. These assume familiarity with the XPath 2.0 and XSLT 2.0 specifications; however, summaries of the new syntax for expressions and XSLT elements are included in this package.
($a, $b, $c).
Path expressions now return a sequence of nodes containing no duplicates, in document order.
a/(b|c)/d, or document('x')/key('a','b')
1 to 10 evaluates to the sequence ( 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 )
*:localname (like prefix:*) is allowed in path expressions, and also in patterns and in xsl:strip-space and xsl:preserve-space. It matches any node with the given local name, regardless of namespace.
The xsl:value-of element has a new separator attribute, so it can be used to output a sequence.
The xsl:for-each element supports arbitrary sequences.
saxon:group and saxon:item are withdrawn.
The xsl:for-each-group instruction, and the associated current-group() function, are implemented.
The xsl:function and xsl:result elements are implemented; these replace saxon:function and exslt:function. Note that the XSLT 2.0 specification is more restrictive as to what can appear in a function body: it has to be zero or more xsl:param elements, followed by zero or more xsl:variable elements, followed by an xsl:result element. However, this is not a serious restriction in practice, because most computations can now be carried out within a single XPath expression.
The xsl:namespace instruction is implemented (it writes a namespace node to the result tree).
xsl:copy-of can now handle sequences containing simple-values (the simple value is converted to a string and written to the result tree). However, the separator attribute is not yet implemented.
The xsl:document element (and its synonym saxon:output) is replaced by xsl:result-document. This no longer includes the serialization attributes directly; instead it refers by name to an xsl:output declaration, or can use the unnamed xsl:output declaration by default.
The xsl:output element now supports method="xhtml", replacing method="saxon:xhtml". The precise details of the output may not be fully conformant with the specification.
The xsl:destination element is provided; however, since the href attribute is currently ignored, it is not very useful at this stage.
The saxon:handler element is no longer supported.
The xsl:script element is no longer supported; however, the synonym saxon:script remains available.
A collation attribute has been added to xsl:sort, and the implementation of sorting now uses JDK 1.2 collators. The collation attribute must match the name attribute of a saxon:collation element. If none is specified, the lang attribute is now used to select a collator, or if the lang attribute is omitted, a collator is obtained for the default locale.
xsl:sort-key element. A named sort key may be used to perform a sort from within an XPath expression, using the new XSLT-defined sort() function.
I have made the following changes to the function library:
The saxon:distinct() extension function now works on any sequence.
The count() and sum() functions now work on any sequence, and new functions avg(), min(), and max() are provided.
ends-with()
upper-case() and lower-case(). These use the rules defined by the Java default locale.
system-property()
saxon:range() extension function (it can now be done using the syntax "a to b")
saxon:tokenize() to return a sequence of strings instead of a node-set
key() so that the second argument can be any sequence; each member of the sequence is converted to a string and treated as a potential key value
document() so that the first argument can be any sequence; each member of the sequence can be a URI of a document to be loaded.
saxon:node-set() extension function, which is now obsolete.
saxon:if() extension function, which is superseded by XPath 2.0 conditional expressions.
The saxon:closure() function is temporarily withdrawn, because it relies on non-standard use of the current() function.
The node-set() function in the EXSLT common module is now a no-op; the object-type() function returns one of "sequence", "boolean", "number", "string", or "external".
highest() and lowest() in the EXSLT math module to work on arbitrary sequences.
exists() and empty(), insert() and remove(), index-of() and sublist().
not3() (three-valued not() function)
string-pad() function
saxon:exists() and saxon:for-all(): these are superseded by the some and every constructs in XPath 2.0
compare() function: the third argument (collation) is initially mandatory, and must be a QName matching a saxon:collation element
base-uri() function replacing the undocumented saxon:base-uri() extension function
float() is the only way of creating a single-precision floating point number.
In general, features of XSLT 2.0 and XPath 2.0 not listed above have not been implemented. In particular, these include:
type attribute of xsl:variable, etc.
instance of and cast as.
As might be expected, the Saxon code has undergone major change internally, which will affect any application making significant use of internal interfaces. Here are some of the highlights:
for expressions. It is also used in various other contexts, e.g. in the implementation of the document(), key() and id() functions.
getStringValue, this allows both SimpleValues and Nodes to implement the new Item interface, which represents a member of a sequence.
net.sf.saxon.value, now contains all the data-type related classes.
I have removed documentation of the saxon:trace extension attribute; it seems this hasn't been working for some time.
The Context class no longer implements the XSLT 1.1 WD interface org.w3c.xsl.XSLTContext.
com.icl.saxon.handlers (ElementHandler etc). It is still possible for a Java application to register a NodeHandler to receive events; this must now be written as an implementation of the net.sf.saxon.NodeHandler interface. See the ShowBooks.java sample application to see how.
data-type or lang attribute of xsl:sort; instead, it must be specified using the collation attribute, with a saxon:collation element that maps the named collation to a Java class that implements the JDK java.util.Comparator interface.
A new sql:query instruction has been added, to accompany the existing sql:connect, sql:insert, etc.
Attributes:
table | The table to be queried (the contents of the FROM clause of the select statement). This is mandatory, the value is an attribute value template. |
column | The columns to be retrieved (the contents of the SELECT clause of the select statement). May be "*" to retrieve all columns. This is mandatory, the value is an attribute value template. |
where | The conditions to be applied (the contents of the WHERE clause of the select statement). This is optional, if present the value is an attribute value template. |
row-tag | The element name to be used to contain each row. Must be a simple name (no colon allowed). Default is "row". |
column-tag | The element name to be used to contain each column. Must be a simple name (no colon allowed). Default is "col". |
The sql:query instruction writes zero or more row elements to the current result tree, each containing zero or more column elements, which contain the data values.
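The way the attributes map onto a select statement and onto the output elements can be sketched as below. This is an illustration of the mapping described in the attribute table, not the code of the extension itself: the class and method names are hypothetical, and the exact SQL text and escaping that Saxon produces are assumptions.

```java
import java.util.List;

final class SqlQuerySketch {
    /** Assembles a select statement from the table, column and (optional) where attributes. */
    static String buildSelect(String table, String column, String where) {
        StringBuilder sql = new StringBuilder("SELECT ").append(column)
                .append(" FROM ").append(table);
        if (where != null && !where.isEmpty()) {
            sql.append(" WHERE ").append(where);
        }
        return sql.toString();
    }

    /** Wraps one retrieved row in the row-tag element, with one column-tag element per value. */
    static String wrapRow(String rowTag, String columnTag, List<String> values) {
        StringBuilder xml = new StringBuilder("<").append(rowTag).append(">");
        for (String value : values) {
            // Escape markup characters, as happens when output escaping is enabled.
            String escaped = value.replace("&", "&amp;").replace("<", "&lt;");
            xml.append("<").append(columnTag).append(">")
               .append(escaped)
               .append("</").append(columnTag).append(">");
        }
        return xml.append("</").append(rowTag).append(">").toString();
    }
}
```

Because the attribute values are inserted into the statement as text, a stylesheet that builds them from untrusted input would need to validate that input first.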
Thanks to Claudio Thomas [claudio.thomas@web.de] who supplied the original version of this code.
The SQL extensions are now contained in a separate JAR file, saxon7-sql.jar, which must be on the class path if these extensions are used.
Michael H. Kay
24 March 2004