Basically, XSLT allows you to transform some input XML document into
some other XML document. The transformation is from an input tree to
an output tree. This is not about building an arbitrary output string;
the XSLT processor requires that the output be well-formed XML. There are
various hacky things you can do with XSLT to make it output arbitrary
text, but in general the output should be XML.
UPDATE, 2004-09-28: Unknown Pedant informs me that XSLT allows arbitrary output, not just XML, no hackery required. This node is nearly 3 years old at this point, so it may be out of date. I no longer work with XML or XSLT very much, and I'm too lazy to verify this myself. However, as it stood 3 years ago, I felt getting plain text out of an XSLT engine involved some hackery. This may or may not have been true, either now or then. Check your XSLT engine documentation and w3.org for definitive information.
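For reference, XSLT 1.0 defines three output methods on xsl:output: xml, html and text. The following is a minimal sketch of the text method, assuming Java 15 or later (for text blocks) and the XSLT 1.0 processor bundled with the JDK behind javax.xml.transform; the class name and the greeting document are made up for illustration.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

/** Hypothetical demo: plain-text output with xsl:output method="text". */
final class TextOutputDemo {
    // method="text" writes only the characters of the result tree's text
    // nodes: no tags, no XML declaration and no escaping of '&' or '<'.
    static final String STYLESHEET = """
        <xsl:stylesheet version="1.0"
            xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
          <xsl:output method="text"/>
          <xsl:template match="/greeting">
            <xsl:text>Hello, </xsl:text>
            <xsl:value-of select="@who"/>
            <xsl:text>!</xsl:text>
          </xsl:template>
        </xsl:stylesheet>
        """;

    /** Runs STYLESHEET over the given XML string and returns the result. */
    static String transform(String xml) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(STYLESHEET)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)),
                    new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("XSLT transformation failed", e);
        }
    }

    public static void main(String[] args) {
        // The &amp; in the input is written as a bare '&' in the text output.
        System.out.println(transform("<greeting who=\"Tom &amp; Jerry\"/>"));
    }
}
```

With method="xml" the same ampersand would be written back out as &amp;; with method="text" it appears as a plain & in "Hello, Tom & Jerry!".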
XSLT is particular about how an output tree is built. If a tag that
is going in the output tree is opened, then it must be closed before
the end of the template. If you want to build arbitrary tags (output
tags whose names come from the text in an input element), then you
have to use xsl:element rather than xsl:value-of. The same goes for
attributes; if you want to add an attribute to an element on the output
tree, use xsl:attribute rather than trying to concatenate strings
together to put the attribute inside the tag (that won't work).
Examples of all of this will appear in the examples section.
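Here is a minimal sketch of xsl:element and xsl:attribute building an element whose name is taken from the input document, assuming Java 15+ and the XSLT 1.0 processor that ships with the JDK (javax.xml.transform). The field document and its name and lang attributes are hypothetical.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

/** Hypothetical demo: build an element whose name comes from the input. */
final class ElementAttributeDemo {
    static final String STYLESHEET = """
        <xsl:stylesheet version="1.0"
            xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
          <xsl:output method="xml" omit-xml-declaration="yes"/>
          <xsl:template match="/field">
            <!-- The output element is named after the input's name attribute. -->
            <xsl:element name="{@name}">
              <!-- Adds a lang attribute to the element being built. -->
              <xsl:attribute name="lang">
                <xsl:value-of select="@lang"/>
              </xsl:attribute>
              <!-- The element's content is the text of the input field. -->
              <xsl:value-of select="."/>
            </xsl:element>
          </xsl:template>
        </xsl:stylesheet>
        """;

    /** Runs STYLESHEET over the given XML string and returns the result. */
    static String transform(String xml) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(STYLESHEET)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)),
                    new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("XSLT transformation failed", e);
        }
    }

    public static void main(String[] args) {
        // Should print <title lang="en">Hello</title>
        System.out.println(transform("<field name=\"title\" lang=\"en\">Hello</field>"));
    }
}
```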
XPath's relationship to XSLT
XPath expressions are used in XSLT to select nodes for processing,
to specify conditions for processing, and to generate text that is
inserted in the output tree. XPath expressions appear in XSLT as the
values of certain attributes (the match attribute of the xsl:template
element, the test attribute of xsl:if and xsl:when, and the select
attribute of many other elements) and in attribute value templates.
In XPath, a reference is often made to the context node. This
context node is specified by the XSLT processor; it is the node
currently being processed.
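To make the context node concrete, here is a small sketch, assuming Java 15+ and the JDK's bundled XSLT 1.0 processor; the library/book document is hypothetical. The same relative expressions, title and @year, select different nodes depending on which book is the context node.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

/** Hypothetical demo: relative XPath expressions depend on the context node. */
final class ContextNodeDemo {
    static final String STYLESHEET = """
        <xsl:stylesheet version="1.0"
            xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
          <xsl:output method="text"/>
          <!-- match takes a pattern; select takes an XPath expression. -->
          <xsl:template match="/library">
            <!-- Context node here: the library element. -->
            <xsl:apply-templates select="book"/>
          </xsl:template>
          <xsl:template match="book">
            <!-- Context node here: the current book, so "title" and "@year"
                 refer to that book's title child and year attribute. -->
            <xsl:value-of select="title"/>
            <xsl:text> (</xsl:text>
            <xsl:value-of select="@year"/>
            <xsl:text>)&#10;</xsl:text>
          </xsl:template>
        </xsl:stylesheet>
        """;

    /** Runs STYLESHEET over the given XML string and returns the result. */
    static String transform(String xml) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(STYLESHEET)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)),
                    new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("XSLT transformation failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.print(transform("<library>"
                + "<book year=\"1999\"><title>First</title></book>"
                + "<book year=\"2001\"><title>Second</title></book>"
                + "</library>"));
    }
}
```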
Vocabulary and syntax
The main problem that people have in understanding XSLT is all the
weird vocabulary that you run across when reading about it. Most of
this vocabulary actually comes from XPath, which is the language used
in XSLT to find parts of the source XML document. See XPath vocabulary
and syntax for most of the relevant information.
XSLT uses normal XPath and XML syntax. An XSLT stylesheet is an XML
document, so all of the normal XML rules apply: all attribute values
must be in quotes, every opened tag must be closed, element and
attribute names are case sensitive, and so on.
One peculiarity in XSLT that I haven't seen elsewhere with XML is
curly braces. It deserves some explanation because it's not obvious
what they are used for if you just happen across them in a stylesheet
somewhere.
Curly braces ({}) mark an attribute value template; the XPath
expression that they enclose is evaluated, converted to a string, and
used as (part of) the value of an attribute. Curly braces are only
interpreted this way in attribute values: the attributes of literal
output elements, plus a few attributes of XSLT instructions, such as
the name attribute of xsl:element and xsl:attribute. An attribute value
template may mix literal text with one or more expressions, so
something like <a href="document?input={@value}"> is legal XSLT 1.0.
To put a literal brace into such an attribute value, double it ({{ or }}).
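A minimal sketch of an attribute value template that mixes literal text with an expression, plus doubled braces for literal ones, assuming Java 15+ and the XSLT 1.0 processor bundled with the JDK; the item document and its value attribute are hypothetical.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

/** Hypothetical demo: attribute value templates with literal text. */
final class AttributeValueTemplateDemo {
    static final String STYLESHEET = """
        <xsl:stylesheet version="1.0"
            xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
          <xsl:output method="xml" omit-xml-declaration="yes"/>
          <xsl:template match="/item">
            <!-- href mixes literal text with an {expression};
                 the doubled braces in title produce literal braces. -->
            <a href="document?input={@value}" title="{{literal braces}}">
              <xsl:value-of select="@value"/>
            </a>
          </xsl:template>
        </xsl:stylesheet>
        """;

    /** Runs STYLESHEET over the given XML string and returns the result. */
    static String transform(String xml) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(STYLESHEET)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)),
                    new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("XSLT transformation failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(transform("<item value=\"abc\"/>"));
    }
}
```

For the input <item value="abc"/>, the href attribute comes out as document?input=abc and the title attribute as {literal braces}.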
Here's an example of how to use curly braces:
<xsl:template match="input">
<output-element output-attr="{xp-expr}"/>
</xsl:template>
<xsl:template match="input">
<xsl:text>Arbitrary text: </xsl:text>{xp-expr}
</xsl:template>
<xsl:template match="input">
<xsl:text>Arbitrary text: </xsl:text> <xsl:value-of select="xp-expr"/>
</xsl:template>
The first template in this example shows the use of an attribute
value template. The curly braces inside the value of output-attr tell
XSLT to interpret the enclosed text as an XPath expression. The result
of evaluating the XPath expression becomes the value of the attribute.
The second template shows a common mistake: because the braces are not
in an attribute value, they are not interpreted at all, and the text
{xp-expr} is simply copied to the output as-is. The proper way to do
this sort of thing is with the xsl:value-of element, as shown in the
third template (note the lack of curly braces).
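The difference between the second and third templates can be checked with a sketch like this one, assuming Java 15+ and the JDK's bundled XSLT 1.0 processor; the input element and its name attribute are hypothetical. The braces in ordinary template text are copied through untouched, while xsl:value-of evaluates its expression.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

/** Hypothetical demo: braces outside attribute values are literal text. */
final class LiteralBracesDemo {
    static final String STYLESHEET = """
        <xsl:stylesheet version="1.0"
            xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
          <xsl:output method="text"/>
          <xsl:template match="/input">
            <!-- Braces in ordinary template text are NOT evaluated. -->
            <xsl:text>Literal: </xsl:text>{@name}<xsl:text>&#10;</xsl:text>
            <!-- xsl:value-of evaluates its select expression (no braces). -->
            <xsl:text>Evaluated: </xsl:text>
            <xsl:value-of select="@name"/>
          </xsl:template>
        </xsl:stylesheet>
        """;

    /** Runs STYLESHEET over the given XML string and returns the result. */
    static String transform(String xml) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(STYLESHEET)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)),
                    new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("XSLT transformation failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(transform("<input name=\"widget\"/>"));
    }
}
```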
Whitespace
XSLT can handle whitespace in an input document in a couple of
ways. The XSLT elements for doing this are xsl:strip-space and
xsl:preserve-space. Both are top-level elements (children of
xsl:stylesheet), and both have an elements attribute containing a
whitespace-separated list of name tests (such as para or *) naming the
elements to operate on.
In the case of xsl:strip-space, any text node that contains only
whitespace and is a child of a matching element is removed from the
input tree after it has been loaded but before it has been processed.
xsl:preserve-space tells the processor to leave such text nodes in the
input tree. Preserving is the default, so xsl:preserve-space is mainly
useful for making exceptions to a broader xsl:strip-space.
An example of using xsl:strip-space:
<?xml version="1.0"?>
<xsl:stylesheet
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
version="1.0">
<xsl:output method="xml" omit-xml-declaration="yes" indent="yes"/>
<xsl:strip-space elements="*"/>
<xsl:template match="/">
<!-- Do something here... -->
</xsl:template>
</xsl:stylesheet>
In this example, the * matches all elements, so all text nodes
containing only whitespace are removed from the input tree before it
is processed.
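A minimal sketch of the effect of xsl:strip-space, assuming Java 15+ and the JDK's bundled XSLT 1.0 processor; the root/a/b document and the countRootChildren helper are hypothetical. It counts the child nodes of root with and without the declaration.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

/** Hypothetical demo: whitespace-only text nodes with and without strip-space. */
final class StripSpaceDemo {
    /** Counts all child nodes of root, optionally declaring xsl:strip-space. */
    static String countRootChildren(String xml, boolean stripSpace) {
        String stylesheet = """
            <xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
              <xsl:output method="text"/>
              %s
              <xsl:template match="/">
                <!-- node() matches element AND text children. -->
                <xsl:value-of select="count(/root/node())"/>
              </xsl:template>
            </xsl:stylesheet>
            """.formatted(stripSpace ? "<xsl:strip-space elements=\"*\"/>" : "");
        try {
            Transformer transformer = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(stylesheet)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)),
                    new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("XSLT transformation failed", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<root>\n  <a/>\n  <b/>\n</root>";
        System.out.println("without strip-space: " + countRootChildren(xml, false));
        System.out.println("with strip-space:    " + countRootChildren(xml, true));
    }
}
```

Without xsl:strip-space, root has five children (three whitespace-only text nodes and two elements); with it, only the two elements remain.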
Templates
An XSLT template is basically a blueprint for part of the output
document: it holds what is supposed to go into the output, with the
gaps filled in by information from the source document. Templates
usually match some element in the source document, so that when that
element is found, it is processed by the template and the results are
placed in the output tree.
Templates can also be thought of as being almost like functions. A
template that has a name attribute can be called directly using the
xsl:call-template instruction. Templates can also be passed parameters
if they declare those parameters using xsl:param and the caller uses
the xsl:with-param instruction within the xsl:call-template
instruction.
Templates are more commonly "applied", using the xsl:apply-templates
instruction. If a template matches an element, none of that element's
descendants are processed unless the template that matched that
element contains the xsl:apply-templates instruction.
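A minimal sketch of that rule, assuming Java 15+ and the JDK's bundled XSLT 1.0 processor; the doc/section/title/para document and the run helper are hypothetical. The same stylesheet is run with and without xsl:apply-templates inside the section template.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

/** Hypothetical demo: children are processed only via xsl:apply-templates. */
final class ApplyTemplatesDemo {
    static String run(String xml, boolean applyToChildren) {
        String stylesheet = """
            <xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
              <xsl:output method="text"/>
              <xsl:template match="section">
                <xsl:text>[section]</xsl:text>
                %s
              </xsl:template>
              <xsl:template match="title">
                <xsl:text>[title:</xsl:text>
                <xsl:value-of select="."/>
                <xsl:text>]</xsl:text>
              </xsl:template>
            </xsl:stylesheet>
            """.formatted(applyToChildren ? "<xsl:apply-templates/>" : "");
        try {
            Transformer transformer = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(stylesheet)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)),
                    new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("XSLT transformation failed", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<doc><section><title>Intro</title>"
                + "<para>Body text</para></section></doc>";
        System.out.println(run(xml, true));
        System.out.println(run(xml, false));
    }
}
```

With xsl:apply-templates the title template runs and para falls through to XSLT's built-in rules, which copy its text ("[section][title:Intro]Body text"); without it, nothing below section is processed ("[section]").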
Variables
XSLT is not an imperative programming language. It's more of a
declarative language, like SQL (not like T-SQL). This means that the
programmer describes the result required; the implementation takes
care of the details of producing that result.
XSLT does have variables like a normal language, but they aren't
really variables, not the way an imperative programmer likes to think
of variables anyway. An XSLT variable is bound once each time its
declaration is processed and cannot be reassigned afterwards. If a
variable is defined inside a template, it may take on a different value
each time the template is processed. An example will help to
illustrate this.
Take this document:
<root>
<countme at1="hi" at2="j" at3="hi"/>
<countme at1="hi" at2="j" at3="hi" at4="j"/>
<countme at1="hi" at2="j" at3="hi" at4="j" at5="hi" at6="j"/>
<countme at1="hi" at2="j" at3="hi" at4="j"/>
</root>
This stylesheet (the interesting parts are the two xsl:variable elements):
<?xml version="1.0"?>
<xsl:stylesheet
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:fo="http://www.w3.org/1999/XSL/Format"
version="1.0">
<xsl:output method="xml" omit-xml-declaration="yes" indent="yes"/>
<xsl:template match="countme">
<xsl:variable name="pos" select="position()"/>
<xsl:variable name="numattr" select="count(attribute::node())"/>
<xsl:text>
Number of attributes in countme #</xsl:text><xsl:value-of select="$pos"/>
<xsl:text>: </xsl:text><xsl:value-of select="$numattr"/><xsl:text>
</xsl:text>
</xsl:template>
</xsl:stylesheet>
This output is produced:
Number of attributes in countme #2: 3
Number of attributes in countme #4: 4
Number of attributes in countme #6: 6
Number of attributes in countme #8: 4
The output starts with countme #2 because of the whitespace in the
input document. Between the end of the <root> tag and the start of
the first <countme> tag, there is some whitespace, which the stylesheet
processor represents as a text node on the child axis. So the child
axis of the <root> element contains both text and element nodes.
Because the stylesheet has no template for <root>, XSLT's built-in
template rule handles it by applying templates to all of its child
nodes, text nodes included, and position() returns the position of the
current node within that list of nodes. The whitespace between <root>
and the first <countme> is a text node in position number one, so the
first <countme> node is actually in position number two. It is possible
to request that the XSLT processor remove (strip) text nodes containing
only whitespace from the input tree before processing by using the
xsl:strip-space element.
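Another way around the whitespace problem is to select only the elements you want. The sketch below, assuming Java 15+ and the JDK's bundled XSLT 1.0 processor, runs a variant of the stylesheet above on the same input, adding a hypothetical template for root that applies templates only to countme elements.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

/** Hypothetical demo: position() counts only the nodes that were selected. */
final class PositionDemo {
    static final String INPUT = """
        <root>
        <countme at1="hi" at2="j" at3="hi"/>
        <countme at1="hi" at2="j" at3="hi" at4="j"/>
        <countme at1="hi" at2="j" at3="hi" at4="j" at5="hi" at6="j"/>
        <countme at1="hi" at2="j" at3="hi" at4="j"/>
        </root>
        """;

    static final String STYLESHEET = """
        <xsl:stylesheet version="1.0"
            xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
          <xsl:output method="text"/>
          <xsl:template match="/root">
            <!-- Only countme children are selected, so the whitespace
                 text nodes are not in the list that position() counts. -->
            <xsl:apply-templates select="countme"/>
          </xsl:template>
          <xsl:template match="countme">
            <xsl:text>countme #</xsl:text>
            <xsl:value-of select="position()"/>
            <xsl:text>: </xsl:text>
            <xsl:value-of select="count(@*)"/>
            <xsl:text>&#10;</xsl:text>
          </xsl:template>
        </xsl:stylesheet>
        """;

    static String report(String xml) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(STYLESHEET)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)),
                    new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("XSLT transformation failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.print(report(INPUT));
    }
}
```

The positions now run 1, 2, 3, 4, while the attribute counts (3, 4, 6, 4) match the earlier output.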
Anyway, you can see that the output is different for each
countme element. The values stored in the variables clearly
change; they can do this because once the template has finished with
one countme element, the variables disappear (they go out of scope),
and they can be bound anew the next time the template is processed.
However, if I tried to use a second xsl:variable element to set the
numattr variable to a different value inside the same template, then
the stylesheet processor would give me an error, because a variable
can only be bound once inside a scope.
Or at least, that's the way it's supposed to work, but I've
discovered that Xalan, the XSLT processor I've been using,
actually allows changing the value of a variable. Testing shows
that MSXML3, unlike Xalan, enforces the defined non-mutability of XSLT
variables.
Parameters
XSLT's xsl:template directive can be
thought of as a function, like those in an imperative programming
language. XSLT templates can be called directly from other templates
using xsl:call-template directives. This is
a very powerful tool in XSLT; in particular it can be used to perform
recursion, which can be used to create loops that execute a given
number of times (XSLT doesn't support this kind of loop any other
way).
Here is an example that illustrates parameters and the use of
recursion to create a loop.
The XML document:
<root>
<countme at1="hi" at2="j" at3="hi"/>
<countme at1="hi" at2="j" at3="hi" at4="j"/>
<countme at1="hi" at2="j" at3="hi" at4="j" at5="hi" at6="j"/>
<countme at1="hi" at2="j" at3="hi" at4="j"/>
</root>
The stylesheet:
<?xml version="1.0"?>
<xsl:stylesheet
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:fo="http://www.w3.org/1999/XSL/Format"
version="1.0">
<xsl:output method="xml" omit-xml-declaration="yes" indent="yes"/>
<xsl:template name="test">
<xsl:param name="input" select="'this is a string, not an expression.'"/>
<xsl:text>"Test" called, the input parameter is: </xsl:text>
<xsl:value-of select="$input"/><xsl:text>
</xsl:text>
<xsl:if test="$input = true()">
<xsl:call-template name="test">
<xsl:with-param name="input" select="$input - 1"/>
</xsl:call-template>
</xsl:if>
</xsl:template>
<xsl:template match="countme">
<xsl:variable name="pos" select="position()"/>
<xsl:variable name="numattr" select="count(attribute::node())"/>
<!-- <xsl:variable name="pos" select="$pos div 2"/> -->
<xsl:text>
Number of attributes in countme #</xsl:text><xsl:value-of select="$pos"/>
<xsl:text>: </xsl:text><xsl:value-of select="$numattr"/><xsl:text>
</xsl:text>
<xsl:call-template name="test">
<xsl:with-param name="input" select="'2'"/>
</xsl:call-template>
</xsl:template>
</xsl:stylesheet>
The output:
Number of attributes in countme #2: 3
"Test" called, the input parameter is: 2
"Test" called, the input parameter is: 1
"Test" called, the input parameter is: 0
Number of attributes in countme #4: 4
"Test" called, the input parameter is: 2
"Test" called, the input parameter is: 1
"Test" called, the input parameter is: 0
Number of attributes in countme #6: 6
"Test" called, the input parameter is: 2
"Test" called, the input parameter is: 1
"Test" called, the input parameter is: 0
Number of attributes in countme #8: 4
"Test" called, the input parameter is: 2
"Test" called, the input parameter is: 1
"Test" called, the input parameter is: 0
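In this case, the test template is called first by the
countme template, and then by itself until the input parameter
becomes false. The test $input = true() converts $input to a boolean
before comparing: the initial string '2' is non-empty and therefore
true, and each later value is a number, which is false only when it is
zero. An input of zero is therefore the base case of the recursion. If
a recursive function (template for XSLT) doesn't have a base case, it
will likely loop forever, or at least until the processor or the OS
shuts it down for using too much stack space.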
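For comparison, here is a sketch of the same recursion pattern with an explicit numeric base case ($times > 0), used to repeat a string a given number of times. It assumes Java 15+ and the JDK's bundled XSLT 1.0 processor; the template name repeat and the stars document are hypothetical.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

/** Hypothetical demo: a counting loop built from a recursive named template. */
final class RecursionDemo {
    static final String STYLESHEET = """
        <xsl:stylesheet version="1.0"
            xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
          <xsl:output method="text"/>
          <xsl:template name="repeat">
            <xsl:param name="text"/>
            <xsl:param name="times" select="0"/>
            <!-- Base case: stop once the counter reaches zero. -->
            <xsl:if test="$times > 0">
              <xsl:value-of select="$text"/>
              <!-- Recursive step: same text, counter reduced by one. -->
              <xsl:call-template name="repeat">
                <xsl:with-param name="text" select="$text"/>
                <xsl:with-param name="times" select="$times - 1"/>
              </xsl:call-template>
            </xsl:if>
          </xsl:template>
          <xsl:template match="/stars">
            <xsl:call-template name="repeat">
              <xsl:with-param name="text" select="'*'"/>
              <xsl:with-param name="times" select="@count"/>
            </xsl:call-template>
          </xsl:template>
        </xsl:stylesheet>
        """;

    static String transform(String xml) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(STYLESHEET)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)),
                    new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("XSLT transformation failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(transform("<stars count=\"5\"/>"));
    }
}
```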
Loops
With XSLT, loops are used to iterate over a set of nodes. You will
need to use loops when output derived from one part of the document
has to appear somewhere else. For example, if you have an XHTML
document with H1 tags throughout it, and you just want to add a table
of contents to the top of the document, you can do this by using an
xsl:template to match the body tag, and then using xsl:for-each to
select each of the H1 elements in the document and output links to
them.
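Unlike the match attribute of xsl:template, which takes a pattern
(a restricted subset of XPath), the select attribute of xsl:for-each,
which is the XSLT looping construct, accepts any XPath expression. For
example, the steps of a match pattern may only use the child and
attribute axes, so you cannot step along the ancestor axis in an
xsl:template match (although a predicate inside a pattern may contain
any expression), but ancestor works fine in the select of an
xsl:for-each.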
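A minimal sketch of the table-of-contents idea, assuming Java 15+ and the JDK's bundled XSLT 1.0 processor; the sample XHTML-like document and its id attributes are hypothetical, and this version keeps only the body of the page.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

/** Hypothetical demo: build a table of contents with xsl:for-each. */
final class TableOfContentsDemo {
    static final String STYLESHEET = """
        <xsl:stylesheet version="1.0"
            xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
          <xsl:output method="xml" omit-xml-declaration="yes"/>
          <xsl:template match="html">
            <html>
              <xsl:apply-templates select="body"/>
            </html>
          </xsl:template>
          <xsl:template match="body">
            <body>
              <ul>
                <!-- select takes a full XPath expression: every h1 below body. -->
                <xsl:for-each select=".//h1">
                  <li><a href="#{@id}"><xsl:value-of select="."/></a></li>
                </xsl:for-each>
              </ul>
              <!-- Then copy the original body content unchanged. -->
              <xsl:copy-of select="node()"/>
            </body>
          </xsl:template>
        </xsl:stylesheet>
        """;

    static final String PAGE = "<html><body><h1 id=\"intro\">Intro</h1><p>Hello</p>"
            + "<div><h1 id=\"usage\">Usage</h1></div></body></html>";

    static String transform(String xml) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(STYLESHEET)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)),
                    new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("XSLT transformation failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(transform(PAGE));
    }
}
```

The nested h1 inside the div is found as well, because .//h1 searches all descendants of body, not just its children.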
Conditions
XSLT provides a couple of mechanisms for handling conditional
execution. One of these is xsl:if, and the
other is xsl:choose. The xsl:if
directive has no else clause, so if you need to test for more than one
possible value, xsl:choose is almost certainly a better
choice.
The xsl:if element has one attribute, test. Its value is an XPath
expression whose result is converted to a boolean (as if by the
boolean() function). If it is true, the fragment enclosed by the
xsl:if block is executed.
The xsl:choose element contains one or more xsl:when elements, each of
which has a test attribute that is evaluated as a boolean, and an
optional xsl:otherwise element, which is used if none of the xsl:when
tests is true. Only the first xsl:when whose test is true is used.
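A minimal sketch of both mechanisms together, assuming Java 15+ and the JDK's bundled XSLT 1.0 processor; the scores document and the grade boundaries are hypothetical. xsl:choose picks a grade, and xsl:if (which has no else) adds a separator after every score except the last.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

/** Hypothetical demo: xsl:choose for multi-way branches, xsl:if for one test. */
final class ConditionsDemo {
    static final String STYLESHEET = """
        <xsl:stylesheet version="1.0"
            xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
          <xsl:output method="text"/>
          <xsl:template match="/scores">
            <xsl:for-each select="score">
              <xsl:value-of select="."/>
              <xsl:text>: </xsl:text>
              <!-- The first xsl:when whose test is true wins. -->
              <xsl:choose>
                <xsl:when test=". >= 90">A</xsl:when>
                <xsl:when test=". >= 70">B</xsl:when>
                <xsl:otherwise>fail</xsl:otherwise>
              </xsl:choose>
              <!-- No else branch: just skip the separator after the last score. -->
              <xsl:if test="position() != last()">
                <xsl:text>, </xsl:text>
              </xsl:if>
            </xsl:for-each>
          </xsl:template>
        </xsl:stylesheet>
        """;

    static String transform(String xml) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(STYLESHEET)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)),
                    new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("XSLT transformation failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(transform(
                "<scores><score>95</score><score>72</score><score>40</score></scores>"));
    }
}
```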
There are many handy XSLT resources available on the web. A couple
of good starting points are http://xml.com and
http://www.w3c.org/Style/XSL/.
The full XSLT specification is available at http://www.w3.org/TR/xslt.