XHTML - Everything2.com

What is XHTML? Some of you may have heard the abbreviation and guessed that it has something to do with HTML and XML. That would be a good guess. Below I hope to explain a little about what XHTML is, where it came from, why it is a good thing, and how to use it when writing your web mark-up.

But first, a quick definition, courtesy of the W3C:

XHTML 1.0 is the W3C's first Recommendation for XHTML, following on from earlier work on HTML 4.01, HTML 4.0, HTML 3.2 and HTML 2.0. With a wealth of features, XHTML 1.0 is a reformulation of HTML 4.01 in XML, and combines the strength of HTML 4 with the power of XML.

Case Of The X

HTML 4 has been around for a while and is the basis of the modern web, but it is far from perfect and far from consistently supported. Most people know about browser incompatibilities, but what most people do not realise is that HTML has a standard grammar; it is just that browsers do not comply with it.

Actually, it is not the browser authors' fault. To explain why, we first need to look at a little history.

HTML History Lesson

SGML (Standard Generalized Markup Language) is a grammar used to mark up documents for formatting. It was invented in the 1980s as a standard way of defining formats for electronic documents.

HTML is an SGML application. That is, it is a set of mark-up that conforms to the SGML grammar and tells people (or computers) how to format a document. SGML says an element must look like this, an attribute like this, etc.

As HTML evolved and the web became more popular, the SGML roots of HTML started to slip, and browsers started allowing HTML authors to get away with not conforming to the HTML standard. This was good for developers, as it gave them an easier ride, but over time it added inconsistencies between browsers and left us in the current mess of browser conformity.

How This Relates To XML

You've probably all heard of XML. It is a grammar like SGML, but it has been specifically designed for the Internet. It too defines how to structure documents in a rigid way, which makes it easy for programs (called XML parsers) to interpret XML documents. XHTML is an implementation of HTML as an XML application, in much the same way that HTML is an SGML application.

The ultimate goal of XHTML is to allow browsers to do away with their HTML parsers and basically become XML (and XSL and CSS) parsers. This has two benefits:

A browser based on an XML parser can parse and display any XML document for which it has a stylesheet, whether that is XHTML, WML, or any other XML application (even one you invented yourself, hence eXtensible).

It removes incompatibilities. The formatting of an XML document is defined in a stylesheet, and an XML document is strictly checked (an illegal XML document generates an XML parser error instead of being quietly patched up), so in principle all clients display a document the same way.

But Browsers Only Support HTML 4

Wrong! XHTML is an implementation of HTML 4 in XML, so it is HTML 4 compatible and will display fine in browsers that support HTML 4. All that XHTML really does is:

Cut out the crap
Make HTML a valid XML document
Force authors to create valid HTML documents

So XHTML Is The Standard?

Yes. XHTML has been a W3C Recommendation for over two years (since January 2000, to be exact). You can (and should) use it right now.

"But why should I?" I hear you cry:

Do you hate browser incompatibility? Then use XHTML.
Are you fed up of having to check your HTML in hundreds of browsers? Then use XHTML.
Do you want to help push for a standardised XML based web? Then use XHTML.
Do you want your documents to be future proof? Then use XHTML.
Do you take pride in your HTML? Then use XHTML.

You probably think that you can write proper HTML 4. I did. But if you haven't read and understood the HTML 4 spec, you can almost guarantee that your documents are not standard. Try running your HTML through the HTML validator. You will be surprised; I know I was.

So How Do I Write XHTML?

The only way to really know is to read the XHTML spec. It isn't the easiest document in the world to read, however, so here are the basic rules:

Documents must be sent with the correct content type.

Web servers send HTML files with a MIME type of text/html, which tells the browser what type of document to expect from a given request. XHTML is not HTML and should not be sent with the same MIME type; XHTML should be sent as application/xhtml+xml.

However, it's not quite that simple, as Internet Explorer does not understand the application/xhtml+xml MIME type and interprets it as text/xml, displaying your XHTML file as an XML file. Luckily the W3C allows us to send HTML-compatible XHTML documents with the text/html MIME type, which IE understands, so it renders our document correctly.

Best practice is to get your server to check which types of file the browser will accept, and to send the content as application/xhtml+xml to browsers that support it and as text/html to those that don't.
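The check described above can be sketched in a few lines of server-side code. This is only an illustration under stated assumptions: the function name choose_content_type is made up for this example, the browser's preferences are taken from the HTTP Accept request header, and wildcards such as */* are deliberately ignored on the assumption that a browser claiming to accept anything may still not handle XHTML.

```python
def choose_content_type(accept_header: str) -> str:
    """Pick a MIME type for an XHTML page from the request's Accept header.

    Hypothetical helper for illustration; it only looks for an explicit
    application/xhtml+xml entry and falls back to text/html otherwise.
    """
    for item in accept_header.split(","):
        parts = [p.strip() for p in item.split(";")]
        media_type = parts[0].lower()
        if media_type != "application/xhtml+xml":
            continue
        # A quality value of 0 means "not acceptable", so honour it.
        q = 1.0
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        if q > 0:
            return "application/xhtml+xml"
    return "text/html"


# A browser that lists the XHTML type explicitly gets it...
print(choose_content_type("text/html,application/xhtml+xml,*/*;q=0.8"))
# ...while one that only sends image types and a wildcard gets text/html.
print(choose_content_type("image/gif, image/jpeg, */*"))
```

The returned string would then be used as the value of the Content-Type response header; how that is wired in depends on your server.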

Documents must start with a DOCTYPE definition.

This rule exists for HTML 4 too, but browsers let you get away with missing it out, so most people do. The first line of any XHTML document should look like this:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">

This defines the document as XHTML 1 and tells the browser which DTD to use. There are three DTDs for XHTML 1.0 (Strict, Transitional and Frameset); Transitional is the best to use if you are moving from HTML 4 and want maximum backwards compatibility.

All tags must be lower case.

Yes, and I don't care if you've used upper case tags before. XML is case sensitive, so <P> is different to <p>. The XHTML DTDs define all tags in lower case only, so for your tags to work they must be lower case.
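If you want to see the case sensitivity for yourself, any XML parser will show it. Here is a small sketch using the parser in Python's standard library; the fragments are made up for the demonstration, and a real document would of course be checked against the XHTML DTD as well.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment using the same letter as an upper and a lower case tag.
root = ET.fromstring("<body><P>upper</P><p>lower</p></body>")

# The parser keeps the two names apart.
tags = [child.tag for child in root]
print(tags)  # ['P', 'p']

# Searching for the lower case name only finds the lower case element.
lower_case_matches = len(root.findall("p"))
print(lower_case_matches)  # 1

# Opening in one case and closing in the other is a parser error.
try:
    ET.fromstring("<P>text</p>")
    mixed_case_ok = True
except ET.ParseError:
    mixed_case_ok = False
print(mixed_case_ok)  # False
```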

All attributes must be in quotes.

That is:

<a href=http://www.example.com>Example</a>

is not valid XHTML. You must put quotes around the attribute value, like this:

<a href="http://www.example.com">Example</a>

All tags must be closed.

That is, if you open a tag like <p> then you must also close it with </p>. Makes sense really.

Empty tags must be closed.

This is probably the most alien concept if you are coming from HTML 4. A tag is empty if it never has any content between itself and a closing tag; even so, in a valid XML document, when you open a tag you must also close it. So:

<hr>

is not valid XHTML. You must close the tag like this:

<hr></hr>

This is, however, annoying, so XML defines a shorthand way of closing empty tags, like this:

<hr />

Even the humble line break tag must be closed, i.e.:

<br />

You must nest tags correctly.

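So you cannot open two tags and then close them in the same order you opened them; the last tag opened must be the first one closed. For example:

<b><i>This is bold and italic</b></i>

is not valid. You must nest them correctly, like this:

<b><i>This is bold and italic</i></b>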
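The quoting, closing and nesting rules are all things an XML parser enforces, so a quick way to try them out is to feed small fragments to one. The sketch below uses the parser in Python's standard library; is_well_formed and the sample fragments are made up for this example. Note that it only checks that the markup is well-formed XML, which is a weaker test than validating against the XHTML DTD.

```python
import xml.etree.ElementTree as ET


def is_well_formed(fragment: str) -> bool:
    """Return True if the XML parser accepts the fragment, False otherwise."""
    try:
        ET.fromstring(fragment)
        return True
    except ET.ParseError:
        return False


# Hypothetical fragments, a bad and a good one for each rule.
samples = [
    ("unquoted attribute", "<a href=http://www.example.com>Example</a>"),
    ("quoted attribute", '<a href="http://www.example.com">Example</a>'),
    ("unclosed tag", "<div><p>text</div>"),
    ("closed tag", "<div><p>text</p></div>"),
    ("unclosed empty tag", "<p>line<br>break</p>"),
    ("empty tag shorthand", "<p>line<br />break</p>"),
    ("bad nesting", "<b><i>This is bold and italic</b></i>"),
    ("good nesting", "<b><i>This is bold and italic</i></b>"),
]

results = {name: is_well_formed(markup) for name, markup in samples}
for name, ok in results.items():
    print(f"{name}: {'accepted' if ok else 'rejected'}")

# The long and short ways of closing an empty tag give the parser the same tree.
same_tree = ET.tostring(ET.fromstring("<p>line<br></br>break</p>")) == ET.tostring(
    ET.fromstring("<p>line<br />break</p>")
)
print(same_tree)
```

Each bad fragment is rejected outright with a parser error, which is exactly the strictness that browsers never applied to HTML 4.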

Tag context

Last but not least, and the hardest concept to get hold of: you must use tags in the correct context. When you place a tag in your document, it must have a parent that is valid for that tag. This is almost impossible to get right without practice or without learning the XHTML spec inside out, so the best way to find out whether you have done something wrong is to check with the W3C validator.
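To give a feel for what "correct context" means, here is a toy sketch of a parent check. It is not a substitute for the validator: the ALLOWED_PARENTS table is a tiny hand-picked subset invented for this example (the real rules live in the XHTML DTDs and are far larger), and context_errors is a made-up helper name.

```python
import xml.etree.ElementTree as ET

# Toy subset for illustration only; the real rules are defined by the XHTML DTDs.
ALLOWED_PARENTS = {
    "li": {"ul", "ol"},
    "tr": {"table", "thead", "tbody", "tfoot"},
    "td": {"tr"},
}


def context_errors(markup: str) -> list:
    """List elements whose parent is not in the toy ALLOWED_PARENTS table."""
    root = ET.fromstring(markup)
    errors = []
    # Visit every element and look at each of its direct children.
    for parent in root.iter():
        for child in parent:
            allowed = ALLOWED_PARENTS.get(child.tag)
            if allowed is not None and parent.tag not in allowed:
                errors.append(f"<{child.tag}> is not allowed inside <{parent.tag}>")
    return errors


# A list item directly inside the body has no valid parent; one inside a list does.
for problem in context_errors("<body><li>stray</li><ul><li>ok</li></ul></body>"):
    print(problem)
```

Tags that are missing from the table are simply not checked, which is why the W3C validator remains the tool to trust.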

Quick Review

XHTML is HTML 4 compatible
Don't forget the DOCTYPE
Don't forget to close all tags
Use the W3C's validator to check your XHTML (or, if I've failed to convert you, at least your HTML).

Now go forth and prepare your web sites for XML.

XHTML 1.1

XHTML 1.1 is the next version of the W3C XHTML specification for marking up Web pages. It became a W3C Recommendation in May 2001.

XHTML 1.1 is a minor progression from XHTML 1.0. Essentially it is XHTML 1.0 Strict rebuilt from XHTML modules, with the last of the deprecated features removed (for example, the lang attribute gives way to xml:lang). It should be sent with the application/xhtml+xml MIME type; in practice it is sometimes sent as text/html to non-supporting browsers, which interpret it as HTML 4.

An XHTML 1.1 document should start with an XML declaration that includes the encoding (especially if it is not UTF-8), followed by the XHTML 1.1 DOCTYPE definition and then the "html" element, which carries the XHTML namespace and an xml:lang language attribute.

Here is an example of a very simple XHTML1.1 document:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN"
    "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" >
  <head>
    <title>XHTML1.1@Everything2</title>
  </head>
  <body>
    <p>Moved to <a href="http://www.everything2.com/">www.everything2.com</a>.</p>
  </body>
</html>
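Because the document above is real XML, an ordinary XML parser can read it with no HTML-specific code at all. Here is a small sketch using the parser in Python's standard library; the variable names are made up for this example, and the parser here only checks well-formedness (it does not fetch or apply the DTD).

```python
import xml.etree.ElementTree as ET

# The example document from above, as bytes so the encoding declaration applies.
DOCUMENT = b"""<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN"
    "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" >
  <head>
    <title>XHTML1.1@Everything2</title>
  </head>
  <body>
    <p>Moved to <a href="http://www.everything2.com/">www.everything2.com</a>.</p>
  </body>
</html>
"""

XHTML_NS = "http://www.w3.org/1999/xhtml"
XML_NS = "http://www.w3.org/XML/1998/namespace"

root = ET.fromstring(DOCUMENT)

# The root element is "html" in the XHTML namespace.
print(root.tag)

# xml:lang lives in the reserved XML namespace.
language = root.get(f"{{{XML_NS}}}lang")
print(language)

# Child elements inherit the default namespace, so paths must include it.
title = root.find(f"{{{XHTML_NS}}}head/{{{XHTML_NS}}}title")
print(title.text)
```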

The goal of XHTML 1.1 is to tidy up the last remaining messy ends from HTML 4; it doesn't really add or change anything for the document author.

XHTML Links & References

http://www.w3.org/TR/xhtml1/
http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd
http://www.zvon.org/xxl/xhtmlReference/Output/
http://www.xhtml.org/
http://www.wdvl.com/Authoring/Languages/XML/XHTML/
http://www.oreillynet.com/pub/a/network/2000/04/28/feature/xhtml_rev.html
http://hotwired.lycos.com/webmonkey/00/50/index2a.html
