This page describes how to extend the capability of Saxon XSLT stylesheets.
An extension function is invoked using a name such as prefix:localname(). The prefix must be the prefix associated with a namespace declaration that is in scope.
Extension functions must be implemented in Java.
The command line option -TJ is useful for debugging the loading of Java extensions. It gives detailed information about the methods that are examined for a possible match.
Saxon supports the <saxon:script> element, based on the <xsl:script> element defined in the (now withdrawn) XSLT 1.1 working draft. This element defines a mapping between a namespace URI used in calls of extension functions, and a Java class that contains implementations of these functions. See saxon:script for details.
You can also use a short-cut technique of binding external Java classes, by making the class name part of the namespace URI. In this case, you don't need a <saxon:script> element.
With the short-cut technique, the URI for the namespace identifies the class where the external function will be found. The namespace URI must either be "java:" followed by the fully-qualified class name (for example xmlns:date="java:java.util.Date"), or a string containing a "/", in which the fully-qualified class name appears after the final "/" (for example xmlns:date="http://www.jclark.com/xt/java/java.util.Date"). The part of the URI before the final "/" is immaterial. The class must be on the classpath. For compatibility with previous releases, the format xmlns:date="java.util.Date" is also supported.
The Saxon namespace URI "http://saxon.sf.net/" is recognised as a special case, and causes the function to be loaded from the class net.sf.saxon.functions.Extensions. This class name can be specified explicitly if you prefer.
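The mapping from namespace URI to class name can be pictured with a small sketch. This is not Saxon's own code: the class `NamespaceToClass` and the order in which the cases are tested are assumptions made for illustration, and only the URI formats themselves come from the text above.

```java
// Illustrative sketch only: NamespaceToClass is a hypothetical helper, not a Saxon class.
final class NamespaceToClass {

    /** Returns the class name implied by an extension-function namespace URI. */
    static String classNameFor(String uri) {
        if (uri.equals("http://saxon.sf.net/")) {
            // Special case: the Saxon namespace maps to Saxon's own extension library.
            return "net.sf.saxon.functions.Extensions";
        }
        if (uri.startsWith("java:")) {
            // "java:" followed by the fully-qualified class name.
            return uri.substring("java:".length());
        }
        int slash = uri.lastIndexOf('/');
        if (slash >= 0) {
            // Everything before the final "/" is immaterial.
            return uri.substring(slash + 1);
        }
        // Compatibility format: the URI is the class name itself.
        return uri;
    }

    public static void main(String[] args) {
        System.out.println(classNameFor("java:java.util.Date"));
        System.out.println(classNameFor("http://www.jclark.com/xt/java/java.util.Date"));
        System.out.println(classNameFor("java.util.Date"));
    }
}
```

All three calls in `main` yield `java.util.Date`, matching the three namespace declarations used as examples above.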
There are three cases to consider: static methods, constructors, and instance-level methods.
Static methods can be called directly.
The localname of the function must match the name of a public static method in this class. The names match if they contain the same characters, excluding hyphens and forcing any character that follows a hyphen to upper-case. For example the XPath function call to-string() matches the Java method toString(); but the function call can also be written as toString() if you prefer.
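The name-matching rule can be expressed in a few lines of Java. The following is a minimal sketch of the rule as stated above, not the code Saxon actually uses; the class name `ExtensionNames` is hypothetical.

```java
// Illustrative sketch only: ExtensionNames is a hypothetical helper, not a Saxon class.
final class ExtensionNames {

    /** Drops each hyphen and upper-cases the character that follows it. */
    static String toJavaName(String localName) {
        StringBuilder sb = new StringBuilder(localName.length());
        boolean upperNext = false;
        for (int i = 0; i < localName.length(); i++) {
            char c = localName.charAt(i);
            if (c == '-') {
                upperNext = true;                 // the hyphen itself is discarded
            } else if (upperNext) {
                sb.append(Character.toUpperCase(c));
                upperNext = false;
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    /** True if the XPath local name matches the given Java method name. */
    static boolean matches(String localName, String javaMethodName) {
        return toJavaName(localName).equals(javaMethodName);
    }

    public static void main(String[] args) {
        System.out.println(toJavaName("to-string"));          // prints "toString"
        System.out.println(matches("toString", "toString"));  // prints "true"
    }
}
```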
If there are several methods in the class that match the localname, and that have the correct number of arguments, then the system attempts to find the one that is the best fit to the types of the supplied arguments: for example if the call is f(1,2) then a method with two int arguments will be preferred to one with two String arguments. The rules for deciding between methods are quite complex. Essentially, for each candidate method, Saxon calculates the "distance" between the type of each supplied argument and the Java class of the corresponding argument in the method's signature, using a set of tables given below. For example, the distance between the XPath data type "boolean" and the Java class "boolean" is very small, while the distance between an XPath string and a Java boolean is much larger. If there is one candidate method where the distances of all arguments are less than or equal to the distances computed for the other candidate methods, and the distance of at least one argument is smaller, then that method is chosen.
If there are several methods with the same name and the correct number of arguments, but none is
preferable to the others under these rules, an error is reported: the message indicates that there is
more than one method that matches the function call.
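The selection rule amounts to finding a candidate whose argument distances dominate those of every other candidate. The sketch below illustrates that comparison only. The class `MethodChooser` is hypothetical, and the distance values in the usage note are invented for the example: the real values come from Saxon's conversion tables.

```java
import java.util.List;

// Illustrative sketch only: MethodChooser is a hypothetical helper, not a Saxon class.
final class MethodChooser {

    /** True if a is no worse than b for every argument and strictly better for at least one. */
    static boolean dominates(int[] a, int[] b) {
        boolean strictlyBetter = false;
        for (int i = 0; i < a.length; i++) {
            if (a[i] > b[i]) {
                return false;
            }
            if (a[i] < b[i]) {
                strictlyBetter = true;
            }
        }
        return strictlyBetter;
    }

    /**
     * Each entry holds the per-argument distances for one candidate method.
     * Returns the index of the candidate that dominates all the others,
     * or -1 if no candidate is preferable (the ambiguous case reported as an error).
     */
    static int choose(List<int[]> candidates) {
        for (int i = 0; i < candidates.size(); i++) {
            boolean best = true;
            for (int j = 0; j < candidates.size() && best; j++) {
                if (i != j && !dominates(candidates.get(i), candidates.get(j))) {
                    best = false;
                }
            }
            if (best) {
                return i;
            }
        }
        return -1;
    }
}
```

For a call such as f(1,2), with assumed distances of {1, 1} for f(int, int) and {10, 10} for f(String, String), `choose` returns index 0. With distances {1, 10} and {10, 1} neither candidate dominates, so it returns -1.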
For example:
<xsl:value-of select="math:sqrt($arg)"
xmlns:math="java:java.lang.Math"/>
This will invoke the static method java.lang.Math.sqrt(), applying it to the value of the variable $arg, and copying the value of the square root of $arg to the result tree.
Java constructors are called by using the function named new(). If there are several constructors, then again the system tries to find the one that is the best fit, according to the types of the supplied arguments. The result of calling new() is an XPath value of type Java Object; the only things that can be done with a Java Object are to assign it to a variable, to pass it to an extension function, and to convert it to a string, number, or boolean, using the rules given below.
Instance-level methods (that is, non-static methods) are called by supplying an extra first argument of type Java Object which is the object on which the method is to be invoked. A Java Object is usually created by calling an extension function (e.g. a constructor) that returns an object; it may also be passed to the style sheet as the value of a global parameter. Matching of method names is done as for static methods. If there are several methods in the class that match the localname, the system again tries to find the one that is the best fit, according to the types of the supplied arguments.
For example, the following stylesheet prints the date and time. This example is copied from the documentation of the xt product, and it works unchanged with Saxon, because Saxon does not care what the namespace URI for extension functions is, so long as it ends with the class name. (Extension functions are likely to be compatible between Saxon and xt provided they only use the data types string, number, and boolean).
<xsl:stylesheet
    version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:date="http://www.jclark.com/xt/java/java.util.Date">
  <xsl:template match="/">
    <html>
      <xsl:if test="function-available('date:to-string') and function-available('date:new')">
        <p><xsl:value-of select="date:to-string(date:new())"/></p>
      </xsl:if>
    </html>
  </xsl:template>
</xsl:stylesheet>
A Java method called as an extension function may have an extra first argument of class net.sf.saxon.expr.XPathContext. This argument is not supplied by the calling XSL code, but by Saxon itself. The XPathContext object provides methods to access many internal Saxon resources, the most useful being current() which returns the current Item.
The XPathContext object is not available with constructors.
If any exceptions are thrown by the method, or if a matching method cannot be found, processing of the stylesheet will be abandoned. If the tracing option has been set (-T) on the command line, a full stack trace will be output. The exception will be wrapped in a TransformerException and passed to any user-specified ErrorListener object, so the ErrorListener can also produce extra diagnostics.
The following conversions are supported between the supplied value of the argument and the declared Java class of the argument. The mappings are given in order of preference; a class that appears earlier in the list has smaller "conversion distance" than one appearing later. These priorities are used to decide which method to call when the class has several methods of the same name. Simple classes (such as boolean) are acceptable wherever the corresponding wrapper class (Boolean) is allowed. Class names shown in italics are Saxon-specific classes.
If the supplied value is a singleton (a sequence of one item) then the type of that item is decisive. If it is a sequence of length zero or more than one, then the general rules for a sequence are applied, and the types of the items within the sequence are irrelevant.
Supplied value | Required type |
boolean | BooleanValue, Boolean, String, Byte, Character, Double, Float, Integer, Long, Short, Object |
dateTime | DateTimeValue, Date, String, Object |
date | DateValue, Date, String, Object |
decimal | DecimalValue, BigDecimal, Double, Float, Long, Integer, Short, Character, Byte, Boolean, String, Object |
double | DoubleValue, Double, Float, Long, Integer, Short, Character, Byte, Boolean, String, Object |
duration | DurationValue, String, Object |
float | FloatValue, Float, Double, Long, Integer, Short, Character, Byte, Boolean, String, Object |
integer | IntegerValue, Long, Double, Float, Integer, Short, Character, Byte, Boolean, String, Object |
string | StringValue, String, Character, Double, Float, Integer, Long, Short, Boolean, Byte, Object |
node | SingletonNodeSet, NodeList, (Element, Attr, Document, DocumentFragment, Comment, Text, ProcessingInstruction, CharacterData), Node, Boolean, Byte, Character, Double, Float, Integer, Long, Short, String, Object |
sequence | SequenceIterator, SequenceValue, List, NodeInfo, Node, String, Boolean, Byte, Character, Double, Float, Integer, Long, Short, Object |
Saxon first tries to select the appropriate method based on the static type of the arguments to the function call. If there is insufficient information statically, it tries again at run-time, based on the dynamic type of the arguments once evaluated. This means that the same function call may invoke different methods on different occasions.
Note that the XPath value is considered to be one of the singleton classes if it is produced by an expression that always produces a singleton. So the expression 1+2 will be an integer. An expression that can potentially produce a sequence of any cardinality is represented in the table above by the generic class "sequence". For example, ($a to $b) is represented as a sequence, except when it can be determined statically that $a and $b are equal, in which case it is represented as an integer.
These rules will probably be rationalized further in future releases.
A wrapped Java object may be converted to another data type as follows.
The result type of the method is converted to an XPath value as follows.
If the result is a net.sf.saxon.om.NodeInfo (a node in a Saxon tree), the XPath value will be a sequence containing a single node.
If the result is a javax.xml.transform.Source (other than a NodeInfo), a tree is built from the specified Source object, and the root node of this tree is returned as the result of the function.
If the result is a net.sf.saxon.value.Value, the returned value is used unchanged.
If the result is a net.sf.saxon.om.SequenceIterator (an iterator over a sequence), the XPath value will be the sequence represented by this iterator. It is essential that this iterator properly implements the method getAnother() which returns a new iterator over the same sequence of nodes or values, positioned at the start of the sequence.
If the result is a java.util.List, the XPath value will be the sequence represented by the contents of this List. The members of the list will each be converted to an XPath value, as if each member was supplied from a separate function call. An error is reported if the result contains a list nested within another list. The contents of the list are copied immediately on return from the function, so the List object itself may be safely re-used.
If the result is a NodeList, the list of nodes is returned as a Saxon node-set. However, all the nodes must be instances of class net.sf.saxon.om.NodeInfo, that is, they must use Saxon's tree implementation, not some third-party DOM. But any implementation of NodeList can be used. The nodes can come from the original source tree, or from a newly-constructed tree, so long as it is constructed using Saxon.
If the result is a DOM Node that is not an instance of net.sf.saxon.om.NodeInfo, it is rejected: the result must use Saxon's DOM implementation, not some third-party DOM.
Note that Saxon's two principal tree structures both conform to the DOM Core Level 2 interface. However, they are read-only: any attempt to modify the tree causes an exception. Saxon's trees can only be built using the Saxon subclasses of the net.sf.saxon.tree.Builder class, and they cannot be modified in situ. (The simplest way for a Java application to build a Saxon tree is by using the net.sf.saxon.xpath.XPathEvaluator class.)
The system function function-available(String name) returns true if there appears to be a method available with the right name. It does not test whether this method has the appropriate number of arguments or whether the arguments are of appropriate types. If the function name is "new" it returns true so long as the class is not an abstract class or interface, and so long as it has at least one constructor.
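The test described here can be approximated with Java reflection. The following is a sketch under stated assumptions rather than Saxon's implementation: the class `FunctionAvailability` is hypothetical, the name is assumed to have already been converted to its Java form, and only public methods and public constructors are considered.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;

// Illustrative sketch only: FunctionAvailability is a hypothetical helper, not a Saxon class.
final class FunctionAvailability {

    /** Approximates function-available() for a class already located from the namespace URI. */
    static boolean isAvailable(Class<?> cls, String javaName) {
        if (javaName.equals("new")) {
            // A constructor call needs a concrete class with at least one (public) constructor.
            return !cls.isInterface()
                    && !Modifier.isAbstract(cls.getModifiers())
                    && cls.getConstructors().length > 0;
        }
        // Only the name is checked: argument count and argument types are deliberately ignored.
        for (Method m : cls.getMethods()) {
            if (m.getName().equals(javaName)) {
                return true;
            }
        }
        return false;
    }
}
```

Because argument counts and types are not checked, a true result does not guarantee that a particular call will succeed.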
There are a number of extension functions supplied with the Saxon product: for details, see extensions.html. The source code of these methods, which in most cases is extremely simple, can be used as an example for writing other user extension functions. It is found in class net.sf.saxon.functions.Extensions.
Saxon implements the element extensibility feature defined in the XSLT standard. This feature allows you to define your own instruction types for use in the stylesheet. These instructions can be used anywhere within a content constructor, for example as a child of xsl:template, xsl:if, xsl:variable, or of a literal result element.
If a namespace prefix is to be used to denote extension elements, it must be declared in the extension-element-prefixes attribute on the xsl:stylesheet element, or the xsl:extension-element-prefixes attribute on any enclosing literal result element or extension element.
Note that Saxon itself provides a number of stylesheet elements beyond those defined in the XSLT specification, including saxon:assign, saxon:entity-ref, and saxon:while. To enable these, use the standard XSL extension mechanism: define extension-element-prefixes="saxon" on the xsl:stylesheet element, or xsl:extension-element-prefixes="saxon" on any enclosing literal result element.
To invoke a user-defined set of extension elements, include the prefix in this attribute as described, and associate it with a namespace URI that ends in "/" followed by the fully qualified class name of a Java class that implements the net.sf.saxon.style.ExtensionElementFactory interface. This interface defines a single method, getExtensionClass(), which takes the local name of the element (i.e., the name without its namespace prefix) as a parameter, and returns the Java class used to implement this extension element (for example, "return SQLConnect.class"). The class returned must be a subclass of net.sf.saxon.style.StyleElement.
The best way to see how to implement an extension element is by looking at the example, for SQL extension elements, provided in package net.sf.saxon.sql, and at the sample stylesheet books-sql.xsl which uses these extension elements. The main methods a StyleElement class must provide are:
prepareAttributes() | This is called while the stylesheet tree is still being built, so it should not attempt to navigate the tree. Its task is to validate the attributes of the stylesheet element and perform any preprocessing necessary. For example, if the attribute is an attribute value template, this includes creating an Expression that can subsequently be evaluated to get the AVT's value. |
validate() | This is called once the tree has been built, and its task is to check that the stylesheet element appears in the right context within the tree, e.g. that it is within a template. |
process() | This is called to process a particular node in the source document, which can be accessed by reference to the XSLTContext object supplied as a parameter. |
isInstruction() | This should return true, to ensure that the element is allowed to appear within a template body. |
mayContainTemplateBody() | This should return true, to ensure that the element can contain instructions. Even if it can't contain anything else, extension elements should allow an xsl:fallback instruction to provide portability between processors. |
The StyleElement class has access to many services supplied either via its superclasses or via the XSLTContext object. For details, see the API documentation of the individual classes.
Any element whose prefix matches a namespace listed in the extension-element-prefixes attribute of an enclosing element is treated as an extension element. If no class can be instantiated for the element (for example, because no ExtensionElementFactory can be loaded, or because the ExtensionElementFactory doesn't recognise the local name), then fallback action is taken as follows. If the element has one or more xsl:fallback children, they are processed. Otherwise, an error is reported. When xsl:fallback is used in any other context, it and its children are ignored.
It is also possible to test whether an extension element is implemented by using the system function element-available(). This returns true if the namespace of the element identifies it as an extension element (or indeed as a standard XSL instruction) and if a class can be instantiated to represent it. If the namespace is not that of an extension element, or if no class can be instantiated, it returns false.
Saxon takes its input from a SAX2 Parser reading from an InputSource. A very useful technique is to interpose a filter between the parser and Saxon. The filter will typically be an implementation of the SAX2 XMLFilter interface.
See the TrAX examples for hints on using a Saxon Transformer as part of a chain of SAX Filters.
Note that Saxon relies on the application to supply a well-balanced sequence of SAX events; it doesn't need to be well-formed (the root node can have any number of element or text children), but if it isn't well-balanced, the consequences are unpredictable.
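As an illustration of the filter technique, the sketch below uses only standard SAX2 and JAXP classes. The filter `DraftRemovingFilter` and the element name `draft` are hypothetical: it suppresses every draft element together with its content, while keeping the event stream well-balanced by dropping start and end events in matching pairs.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLFilterImpl;

// Illustrative sketch only: a hypothetical filter that removes <draft> elements.
final class DraftRemovingFilter extends XMLFilterImpl {
    private int depth = 0;   // greater than zero while inside a suppressed element

    DraftRemovingFilter(XMLReader parent) {
        super(parent);
    }

    @Override
    public void startElement(String uri, String local, String qName, Attributes atts)
            throws SAXException {
        if (depth > 0 || "draft".equals(local)) {
            depth++;         // swallow this start tag and count the nesting level
            return;
        }
        super.startElement(uri, local, qName, atts);
    }

    @Override
    public void endElement(String uri, String local, String qName) throws SAXException {
        if (depth > 0) {
            depth--;         // swallow the matching end tag
            return;
        }
        super.endElement(uri, local, qName);
    }

    @Override
    public void characters(char[] ch, int start, int length) throws SAXException {
        if (depth == 0) {
            super.characters(ch, start, length);
        }
    }

    /** Parses the XML text through the filter and returns the element names seen downstream. */
    static List<String> elementNames(String xml) {
        try {
            SAXParserFactory spf = SAXParserFactory.newInstance();
            spf.setNamespaceAware(true);
            XMLReader parser = spf.newSAXParser().getXMLReader();
            DraftRemovingFilter filter = new DraftRemovingFilter(parser);
            final List<String> names = new ArrayList<>();
            filter.setContentHandler(new DefaultHandler() {
                @Override
                public void startElement(String uri, String local, String qName, Attributes atts) {
                    names.add(local);
                }
            });
            filter.parse(new InputSource(new StringReader(xml)));
            return names;
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }
}
```

In a transformation, such a filter would normally be supplied through the standard JAXP class javax.xml.transform.sax.SAXSource, for example new SAXSource(filter, inputSource), so that the transformer reads the filtered events instead of the parser's raw output.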
The -x option on the Saxon command line specifies the parser that Saxon will use to process the source files. The parser class must implement the SAX2 XMLReader interface, but it is not required to be a real XML parser; it can take the input from any kind of source file, so long as it presents it in the form of a stream of SAX events. When using the JAXP API, the equivalent to the -x option is to call transformerFactory.setAttribute(net.sf.saxon.FeatureKeys.SOURCE_PARSER_CLASS, "com.example.package.Parser").
The output of a Saxon stylesheet can be directed to a user-defined output filter. This filter can be defined either as a SAX2 ContentHandler, or as a subclass of the Saxon class net.sf.saxon.event.Emitter. The advantage of using an Emitter is that more information is available from the stylesheet, for example the attributes of the xsl:output element.
When a ContentHandler is used, Saxon will by default always supply a stream of events corresponding to a well-formed document. (The XSLT specification also allows the output to be an external general parsed entity, also known as a "well-balanced document".) If the result tree is not well-formed, Saxon will raise a dynamic error, unless the ContentHandler indicates that it is prepared to accept such a result tree, which it can do by implementing the marker interface net.sf.saxon.event.BalancedContentHandler. This interface has no methods.
As specified in the JAXP 1.1 interface, requests to disable or re-enable output escaping are also notified to the content handler by means of special processing instructions. The names of these processing instructions are defined by the constants PI_DISABLE_OUTPUT_ESCAPING and PI_ENABLE_OUTPUT_ESCAPING defined in class javax.xml.transform.Result.
If an Emitter is used, however, it will be informed of all events.
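For a ContentHandler, the escaping notifications arrive as ordinary processingInstruction() events. The following sketch shows one way a handler might react to them. The class `EscapingAwareHandler` is hypothetical, it escapes only the characters "<" and "&", and it collects character data into a string rather than writing a complete serialized document.

```java
import javax.xml.transform.Result;
import org.xml.sax.helpers.DefaultHandler;

// Illustrative sketch only: a hypothetical handler that honours the JAXP escaping notifications.
final class EscapingAwareHandler extends DefaultHandler {
    private final StringBuilder out = new StringBuilder();
    private boolean escaping = true;   // escaping is on until the stylesheet disables it

    @Override
    public void processingInstruction(String target, String data) {
        if (Result.PI_DISABLE_OUTPUT_ESCAPING.equals(target)) {
            escaping = false;
        } else if (Result.PI_ENABLE_OUTPUT_ESCAPING.equals(target)) {
            escaping = true;
        }
        // Any other processing instruction is ignored in this sketch.
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        for (int i = start; i < start + length; i++) {
            char c = ch[i];
            if (escaping && c == '<') {
                out.append("&lt;");
            } else if (escaping && c == '&') {
                out.append("&amp;");
            } else {
                out.append(c);
            }
        }
    }

    /** Returns the character data written so far. */
    String text() {
        return out.toString();
    }
}
```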
The Emitter or ContentHandler to be used is specified in the method attribute of the xsl:output element, as a fully-qualified class name; for example method="prefix:com.acme.xml.SaxonOutputFilter". The namespace prefix is ignored, but must be present to meet XSLT conformance rules.
See the documentation of class net.sf.saxon.event.Emitter for details of the methods available, or implementations such as HTMLEmitter, XMLEmitter, and TEXTEmitter for the standard output formats supported by Saxon.
It can sometimes be useful to set up a chain of emitters working as a pipeline. To write a filter that participates in such a pipeline, the class ProxyEmitter is supplied. Use the class XMLIndenter, which handles XML indentation, as an example of how to write a ProxyEmitter.
Rather than writing an output filter in Java, Saxon also allows you to process the output through another XSL stylesheet. To do this, simply name the next stylesheet in the saxon:next-in-chain attribute of xsl:output.
Any number of user-defined attributes may be defined on xsl:output. These attributes must have names in a non-null namespace, which must not be either the XSLT or the Saxon namespace. These attributes are interpreted as attribute value templates. The value of each attribute is inserted into the Properties object made available to the Emitter handling the output; these properties are ignored by the standard output methods, but can supply arbitrary information to a user-defined output method. The name of the property will be the expanded name of the attribute in JAXP format, for example {http://my-namespace/uri}local-name, and the value will be the value as given, after evaluation as an attribute value template.
It is possible to define a collating sequence for use by xsl:sort. This is controlled through the class attribute of the saxon:collation element. See saxon:collation for details.
It is possible to define a numbering sequence for use by xsl:number. This is controlled through the lang attribute of the xsl:number element. The feature is primarily intended to provide language-dependent numbering, but in fact it can be used to provide arbitrary numbering sequences: for example if you want to number items as "*", "†", "‡", "§", "¶" etc, you could implement a numbering class to do this and invoke it say with lang="x-footnote".
To implement a numberer for language X, you need to define a class net.sf.saxon.number.Numberer_X, for example net.sf.saxon.number.Numberer_xfootnote. This must implement the interface Numberer. A (not very useful) Numberer is supplied for lang="de" as a specimen, and you can use this as a prototype to write your own. A numbering sequence is also supplied for lang="en", and this is used by default if no other can be loaded. Note that any hyphens in the language name are ignored in forming the class name, but case is significant. For example if you specify lang="en-GB", the Numberer must be named net.sf.saxon.number.Numberer_enGB.
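The naming convention can be summarised in one line of Java. This is a sketch of the rule as described, not Saxon's loading code; the helper class `NumbererNames` is hypothetical, and the package prefix is taken from the net.sf.saxon.number.Numberer_enGB example.

```java
// Illustrative sketch only: NumbererNames is a hypothetical helper, not a Saxon class.
final class NumbererNames {

    /** Derives the Numberer class name for a lang value: hyphens are dropped, case is kept. */
    static String classNameFor(String lang) {
        return "net.sf.saxon.number.Numberer_" + lang.replace("-", "");
    }

    public static void main(String[] args) {
        System.out.println(classNameFor("en-GB"));       // prints "net.sf.saxon.number.Numberer_enGB"
        System.out.println(classNameFor("x-footnote"));  // prints "net.sf.saxon.number.Numberer_xfootnote"
    }
}
```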
If you want to use an output encoding that is not directly supported by Saxon (for a list of encodings that are supported, see conformance.html), you can do this by writing a Java class that implements the interface net.sf.saxon.charcode.PluggableCharacterSet. You need to supply two methods: inCharSet() which tests whether a particular Unicode character is present in the character set, and getEncodingName() which returns the name given to the encoding by your Java VM. The encoding must be supported by the Java VM. To use this encoding, specify the fully-qualified class name as the value of the encoding attribute in xsl:output.
Alternatively, it is possible to specify the CharacterSet class to be used for a named output encoding by setting the system property, e.g. -D"encoding.EUC-JP"="EUC_JP"; the value of the property should be the name of a class that implements the PluggableCharacterSet interface. This indicates the class to be used when the xsl:output element specifies encoding="EUC-JP".
Saxon allows you to write your own URIResolver to handle the URIs of input documents, as defined in the JAXP 1.1 specification. Such a URIResolver is used to process the URIs used in the xsl:include and xsl:import declarations as well as the document() function. It is also used to process the URIs supplied for the source document and the stylesheet on the command line. The URIResolver is called to process the supplied URI, and it returns a JAXP Source object, which Saxon uses as the source of the input. Note that the Source must be one of the implementations of Source that Saxon recognizes; you cannot write your own implementation of the JAXP 1.1 Source interface.
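A URIResolver needs only the standard JAXP interfaces. The sketch below serves documents from an in-memory map; the class `InMemoryResolver`, its register() method, and the "mem:" URIs used in the usage note are all hypothetical. It returns a StreamSource, which is a standard JAXP implementation of Source.

```java
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;
import javax.xml.transform.Source;
import javax.xml.transform.URIResolver;
import javax.xml.transform.stream.StreamSource;

// Illustrative sketch only: a hypothetical resolver that serves documents held in memory.
final class InMemoryResolver implements URIResolver {
    private final Map<String, String> documents = new HashMap<>();

    /** Associates a URI with the XML text to be returned for it. */
    void register(String uri, String xml) {
        documents.put(uri, xml);
    }

    @Override
    public Source resolve(String href, String base) {
        String xml = documents.get(href);
        if (xml == null) {
            // Under the JAXP contract, null asks the processor to resolve the URI itself.
            return null;
        }
        StreamSource source = new StreamSource(new StringReader(xml));
        source.setSystemId(href);   // keeps the original URI available as the base URI
        return source;
    }
}
```

After calling register("mem:lookup", "<table/>"), a stylesheet reference such as document('mem:lookup') would be answered from memory once the resolver has been installed with the standard JAXP method transformerFactory.setURIResolver().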
Saxon also allows you to write an OutputURIResolver, which performs an analogous role for URIs specified in the href attribute of xsl:result-document. The OutputURIResolver is called when writing of the output document starts, at which point it must return a JAXP Result object to act as the output destination. It is called again when writing of an output document is complete.
You can nominate an OutputURIResolver by calling ((Controller)transformer).setOutputURIResolver(new UserOutputResolver()), or by calling factory.setAttribute("http://saxon.sf.net/feature/outputURIResolver", new UserOutputResolver()).
Michael H. Kay
14 February 2003