The purpose of a DTD (Document Type Definition) is to provide a validating parser with the rules it needs to confirm that a particular document is valid.

SGML has a rich and complex DTD syntax; XML has a much simpler one. However, the DTD is likely to be replaced by XML Schema, which uses the same syntax as XML itself. The DTD is built from a small number of syntax elements, as follows.


An XML DTD is referenced from an XML document using a "!DOCTYPE" tag. This takes one of two forms:
  1. Internal
    <!DOCTYPE dtd-name [ ...declarations... ]>
  2. External
    <!DOCTYPE dtd-name SYSTEM "filename">
Notice that this doesn't follow XML syntax rules.
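To see both forms from a program's point of view, here is a minimal sketch using Python's standard xml.dom.minidom module. It only reports the DOCTYPE name and, for the external form, the SYSTEM identifier. It neither fetches nor validates against the DTD. The element name "note", the file name "note.dtd" and the helper describe_doctype are made up for illustration.

```python
from xml.dom import minidom

# Internal form: the declarations sit inside the square brackets.
internal = """<!DOCTYPE note [
  <!ELEMENT note (#PCDATA)>
]>
<note>Hello</note>"""

# External form: the declarations live in a separate file.
# "note.dtd" is a hypothetical file name; minidom does not try to read it.
external = """<!DOCTYPE note SYSTEM "note.dtd">
<note>Hello</note>"""

def describe_doctype(xml_text):
    """Return (DOCTYPE name, SYSTEM identifier) for a document string."""
    doctype = minidom.parseString(xml_text).doctype
    return doctype.name, doctype.systemId

print(describe_doctype(internal))
print(describe_doctype(external))
```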


Each element used in the XML grammar described by this DTD is defined in an "!ELEMENT" tag. This has the following format:
<!ELEMENT element-name (content-model)>

The content-model describes the content of this element. The syntax elements are:

A content-model definition enclosed in brackets "()" can be treated as syntactically equivalent to an element.
A sequence of elements separated by commas must appear in the indicated order.
If a number of elements are separated by "|", any one (but only one) of them may appear.
If an element is suffixed by "*", it may occur any number of times (zero or more).
If an element is suffixed by "+", it must occur one or more times.
If an element is suffixed by "?", it is optional: it may occur either zero times or once.
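These operators behave like a regular expression over the names of an element's children. The following rough sketch translates a content model into Python's re syntax. It shows the idea and does not claim to describe how any particular parser works. The function names are hypothetical, and it ignores #PCDATA, mixed content, EMPTY and ANY.

```python
import re

def content_model_to_regex(model):
    """Translate a DTD content model such as "(name, address)" into a
    compiled regex over child names, each name followed by one space."""
    # Tokens are either names or single punctuation characters;
    # whitespace between them is simply skipped by findall.
    tokens = re.findall(r"[\w.:-]+|[(),|*+?]", model)
    parts = []
    for tok in tokens:
        if tok == ",":
            continue                      # sequence: plain concatenation
        elif tok == "(":
            parts.append("(?:")           # non-capturing group
        elif tok in (")", "|", "*", "+", "?"):
            parts.append(tok)             # same meaning as in a regex
        else:
            # One child element; the trailing space stops "name" from
            # matching the start of a longer name such as "names".
            parts.append("(?:" + re.escape(tok) + " )")
    return re.compile("".join(parts))

def children_match(model, children):
    """True if the list of child element names satisfies the model."""
    text = "".join(name + " " for name in children)
    return content_model_to_regex(model).fullmatch(text) is not None

print(children_match("(street, city, state, postal?)", ["street", "city", "state"]))
```

The trailing-space encoding is just a convenient way to keep element names apart inside one string.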

A number of special, predefined content-models exist that have special meanings:

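EMPTY
This indicates that there must be no content for this element.
ANY
This indicates that any mixture of text and declared elements may form the content for this element.
#PCDATA
This indicates that text (parsed character data) may form the content of this element.
EMPTY and ANY are written without the surrounding brackets, e.g. <!ELEMENT br EMPTY>, whereas text-only content is written (#PCDATA).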
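Here is a minimal sketch of what the three predefined models mean for a parsed element. It uses Python's xml.etree.ElementTree, and check_predefined is a hypothetical helper. It only looks at the element itself and does not check whether the children are declared.

```python
import xml.etree.ElementTree as ET

def check_predefined(elem, model):
    """Check an ElementTree element against EMPTY, ANY or (#PCDATA)."""
    if model == "EMPTY":
        # No child elements and no text at all, not even whitespace.
        return len(elem) == 0 and not elem.text
    if model == "ANY":
        # Any mix of text and elements; the children must still be
        # declared elsewhere in the DTD, which this sketch does not check.
        return True
    if model == "(#PCDATA)":
        # Text only: no child elements.
        return len(elem) == 0
    raise ValueError(f"not a predefined model: {model}")

print(check_predefined(ET.fromstring("<br/>"), "EMPTY"))
```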


XML elements can have attributes. These must be defined in the DTD using the "!ATTLIST" tag. This has one of the following formats:
<!ATTLIST element-name attribute-name attribute-type keyword>
<!ATTLIST element-name attribute-name attribute-type default>

Multiple attributes may be declared in one tag by repeating the attribute-name... part as many times as required.

The following values are valid for attribute-type:

CDATA
The value can be any character data.
ID
The value must be a unique identifier: no two elements in the document may share it.
IDREF
The value must be an existing identifier, i.e. this is a reference to an element with the matching value in an ID attribute.
NMTOKEN
The value may contain only name characters (letters, digits, hyphens, periods, underscores and colons), i.e. valid characters for constructing names or tokens.
ENTITY
The value must be the name of a declared (unparsed) entity.
enumerated values
A bracketed, |-separated list of valid values, e.g. (yes | no).
The value must be one of the listed values.
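Here is a rough sketch of checking these types across a whole document. IDs are collected in a first pass so that an IDREF may point at an element further down. The name patterns are ASCII-only approximations of the real XML rules. ENTITY values are not checked. The helper check_values and its input format are hypothetical.

```python
import re

# ASCII-only approximations: real XML names and tokens may also use
# many non-ASCII letters, which this sketch ignores.
NAME = re.compile(r"[A-Za-z_:][A-Za-z0-9._:-]*")
NMTOKEN = re.compile(r"[A-Za-z0-9._:-]+")

def check_values(values):
    """values: (attribute_type, value) pairs gathered from a whole document.
    Returns a list of error messages (empty if everything is valid)."""
    errors = []
    ids = set()
    # Pass 1: every ID must be a name and unique across the document.
    for attr_type, value in values:
        if attr_type == "ID":
            if not NAME.fullmatch(value):
                errors.append(f"bad ID {value!r}")
            elif value in ids:
                errors.append(f"duplicate ID {value!r}")
            else:
                ids.add(value)
    # Pass 2: an IDREF may point forwards, so check it only after pass 1.
    for attr_type, value in values:
        if attr_type == "IDREF" and value not in ids:
            errors.append(f"dangling IDREF {value!r}")
        elif attr_type == "NMTOKEN" and not NMTOKEN.fullmatch(value):
            errors.append(f"bad NMTOKEN {value!r}")
        elif attr_type.startswith("("):
            # Enumerated type such as "(yes | no)".
            allowed = [v.strip() for v in attr_type.strip("()").split("|")]
            if value not in allowed:
                errors.append(f"{value!r} not in {attr_type}")
        # CDATA accepts anything; ENTITY would need the entity
        # declarations, so it is not checked here.
    return errors

print(check_values([("IDREF", "c2"), ("ID", "c1"), ("ID", "c2")]))
```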

The following values are valid for keyword:

#REQUIRED
This attribute must be specified on this element.
#IMPLIED
This attribute may, optionally, be specified on this element. If it is omitted, the parser supplies no value and the application decides what to do.
#FIXED value
This attribute of this element always has the value stated; if it is omitted, the parser supplies that value.

Finally, instead of a keyword, a default value may be supplied; the parser uses it whenever the attribute is omitted. A default value is mutually exclusive with #REQUIRED and #IMPLIED.
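The effect of each keyword, and of a plain default, can be sketched as a function that fills in one element's attributes. This mimics what a validating parser does, but it is not any parser's actual API. The name apply_defaults, the (keyword, value) encoding and the example declarations are all hypothetical.

```python
def apply_defaults(attrs, decls):
    """attrs: attributes found on one element, as a dict.
    decls: attribute name -> (keyword, value), where keyword is
    "#REQUIRED", "#IMPLIED", "#FIXED" or None for a plain default.
    Returns (attributes with defaults filled in, list of errors)."""
    result = dict(attrs)
    errors = []
    for name, (keyword, value) in decls.items():
        present = name in attrs
        if keyword == "#REQUIRED":
            if not present:
                errors.append(f"missing required attribute {name!r}")
        elif keyword == "#IMPLIED":
            pass  # nothing to supply; the application decides
        elif keyword == "#FIXED":
            if not present:
                result[name] = value
            elif attrs[name] != value:
                errors.append(f"{name!r} must be {value!r}")
        else:
            result.setdefault(name, value)  # plain default value
    return result, errors

# Hypothetical declarations for a single element.
example_decls = {
    "id": ("#REQUIRED", None),
    "country": (None, "US"),
    "version": ("#FIXED", "1.0"),
    "note": ("#IMPLIED", None),
}
print(apply_defaults({"id": "c1"}, example_decls))
```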


The XML DTD syntax also allows for "entities" to be defined. Essentially, these represent textual substitutions at one level or the other. They exist in two forms: general entities, which are substituted in the document (such as &amp; in HTML), and parameter entities, which can be referenced elsewhere in the DTD itself. They are defined using the "!ENTITY" tag:
<!ENTITY entity-name entity-def>
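entity-name is the name that will be expanded (e.g. "amp"); in the document it is referenced as &entity-name;. For an entity used within the DTD, the declaration puts a "%" and a space before the name, and it is referenced as %entity-name;.
entity-def is the value that will replace the name. This can either be supplied directly, as a quoted string, or, if preceded by the keyword "SYSTEM", by reference to a URL.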
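Here is a small sketch of document-level substitution. Python's xml.etree.ElementTree, which is non-validating, still expands general entities declared in the internal subset, so the element text arrives with the replacement already made. The entity name "company" and its value are made-up examples. Parameter entities are not shown.

```python
import xml.etree.ElementTree as ET

# A document whose internal DTD subset declares a general entity.
# "company" and "Example Ltd" are made-up examples.
doc = """<!DOCTYPE letter [
  <!ENTITY company "Example Ltd">
]>
<letter>Dear &company; customer</letter>"""

letter = ET.fromstring(doc)
print(letter.text)  # the parser has already substituted the entity
```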

I was going to supply a DTD describing the XML DTD grammar. However, I don't believe this is possible given what I've described above. Instead, here's a DTD for holloway's customer record file:
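<!ELEMENT customer-file (customer-details*)>
<!ELEMENT customer-details (name, address)>
<!ELEMENT name (#PCDATA)>
<!ELEMENT address (street, city, state, postal?)>
<!ELEMENT street (#PCDATA)>
<!ELEMENT city (#PCDATA)>
<!ELEMENT state (#PCDATA)>
<!ELEMENT postal (#PCDATA)>
<!ATTLIST customer-details id ID #REQUIRED>
<!ATTLIST address country CDATA "US">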
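In this DTD, I've decided that:
  1. a customer file holds zero or more customer records;
  2. each record has a name followed by an address, and must carry a unique id attribute;
  3. an address is a street, city and state, optionally followed by a postal code;
  4. an address's country attribute defaults to "US" if it is omitted.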

Of course, other DTDs could be written against which the example would be valid.
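Normally a validating parser would do this checking. Python's standard library does not include one (its XML modules sit on the non-validating expat parser), whereas third-party libraries such as lxml can validate against a DTD. The following is a hand-written sketch of what validating against the customer DTD involves. The content models are translated by hand into regular expressions. It does not check for undeclared attributes or the ID name syntax. validate_customer_file and the sample record are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# Hand-translated from the customer DTD: each element's content model
# becomes a regex over its child names, each followed by a space.
CHILD_MODELS = {
    "customer-file": r"(?:customer-details )*",
    "customer-details": r"name address ",
    "address": r"street city state (?:postal )?",
}
# Elements declared as (#PCDATA): text only, no child elements.
TEXT_ONLY = {"name", "street", "city", "state", "postal"}

def validate_customer_file(root):
    """Return a list of validity errors; an empty list means valid."""
    errors = []
    seen_ids = set()
    if root.tag != "customer-file":
        errors.append(f"root must be customer-file, not {root.tag}")
    for elem in root.iter():
        if elem.tag in CHILD_MODELS:
            children = "".join(child.tag + " " for child in elem)
            if not re.fullmatch(CHILD_MODELS[elem.tag], children):
                errors.append(f"bad content in <{elem.tag}>: {children.strip()!r}")
            # Element-only content: whitespace between children is fine,
            # any other text is not.
            stray = (elem.text or "") + "".join(c.tail or "" for c in elem)
            if stray.strip():
                errors.append(f"text not allowed directly in <{elem.tag}>")
        elif elem.tag in TEXT_ONLY:
            if len(elem):
                errors.append(f"<{elem.tag}> may contain only text")
        else:
            errors.append(f"undeclared element <{elem.tag}>")
        if elem.tag == "customer-details":
            # <!ATTLIST customer-details id ID #REQUIRED>
            cid = elem.get("id")
            if cid is None:
                errors.append("customer-details is missing its id")
            elif cid in seen_ids:
                errors.append(f"duplicate id {cid!r}")
            else:
                seen_ids.add(cid)
        if elem.tag == "address":
            # <!ATTLIST address country CDATA "US">: fill in the default
            elem.set("country", elem.get("country", "US"))
    return errors

sample = """<customer-file>
  <customer-details id="c1">
    <name>A. Customer</name>
    <address><street>1 High Street</street><city>Springfield</city><state>IL</state></address>
  </customer-details>
</customer-file>"""
root = ET.fromstring(sample)
print(validate_customer_file(root))
print(root.find("customer-details/address").get("country"))
```

Filling in the country default during validation mirrors what the DTD asks of a validating parser. A real tool would usually report it rather than silently rewrite the tree.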

A tutorial lives here: - there are also some references. The W3C definitions can be found here:
In the years surrounding the release of Microsoft's Office 2000, the company was applauded in reviews that said Microsoft had changed their spots and were now supporting an open format... XML!

(After all, it was the story MS had spun, and reviewers are inherently lazy creatures.)

XML is a method for putting structured data into a file. Within this framework you choose a DTD (Document Type Definition) that defines the rules for holding the specific data you wish to store. Each DTD defines its own particular subset of XML.

If a DTD isn't published, it's no more open than a binary file. Although simple examples are quickly dissected and analysed, a programmer has great difficulty knowing that, when you bold some text, it should be written into the file as an <important> element rather than a <bold> one. If the rules for saving XML-structured information are not published and defined, the DTD is still a closed standard... despite being XML.

If I were a paranoid man who slept with the door locked, moat full to the brim, then I might claim that MS noticed the XML buzzword hype and wanted to cash in on the goodwill associated with the "open" meta-language. Get in first and spread the unpublished Word 2000 DTD as THE STANDARD™ for text documents.
