XML entities look rather like HTML entities; however, as you'd expect, unlike in HTML, you can define your own.
There are three types of XML entity: Internal, External, and Parameter. I will only cover the first two in this write-up and leave parameter entities for another time, since they are used in DTDs and not in XML files themselves.
Internal XML entities are the plain vanilla style of entity. They are used rather like a string constant in a programming language. You define them within your DTD, setting their value to a string of characters, then use them in your XML document to output that string in their place.
For example, if you often write the name of your company, "I Am A Monkey Incorporated", in your XML files (e.g. in a header or footer section), you can define this name as an internal entity, making it easy to update throughout all your documents that use the same DTD. Since the XML parser expands the entity, you can be sure that when your company is bought out and changes its name, the new name, "We Eat Bananas Limited", will appear in all your documents. Neat.
To define such an entity in your DTD, you use the following syntax:
<!ENTITY company "I Am A Monkey Incorporated">
To use an entity you insert an entity reference into your document. An entity reference is an ampersand (&), followed by the name of the entity, followed by a semicolon (;).
So continuing our example:
&company;
would be parsed out to:
I Am A Monkey Incorporated
You're probably already familiar with entity references from HTML, or from trying to write a '<' in E2.
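If you want to watch a parser do the expansion, here is a minimal sketch in Python. It assumes the standard library's xml.etree.ElementTree module and a made-up footer element; the entity is declared in an inline DTD (an internal subset) purely so the example fits in one string.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: the DTD is written inline inside the DOCTYPE.
DOC = """<!DOCTYPE footer [
  <!ENTITY company "I Am A Monkey Incorporated">
]>
<footer>Copyright &company;</footer>"""

footer = ET.fromstring(DOC)

# The parser has already replaced &company; with its replacement text.
footer_text = footer.text
print(footer_text)
```

Change the string in the ENTITY declaration and every reference picks up the new name the next time the document is parsed.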
The text that is inserted by an entity reference is called the "replacement text". The replacement text of an internal entity can contain markup (elements, attributes, CDATA, processing instructions, other entity references, etc.), but any element that you start in an entity must end in the same entity and recursive entity references are not allowed (or you'll be in big trouble).
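Both points can be checked with a short sketch. It assumes Python's standard xml.etree.ElementTree module; the note element and the entity names are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Replacement text containing markup: the <b> element starts and ends
# inside the same entity, so this is allowed.
MARKUP = """<!DOCTYPE note [
  <!ENTITY company "<b>Monkey</b> Incorporated">
]>
<note>&company;</note>"""

note = ET.fromstring(MARKUP)
bold = note[0]  # the <b> element came out of the replacement text

# Two entities that refer to each other: a recursive reference.
RECURSIVE = """<!DOCTYPE note [
  <!ENTITY a "&b;">
  <!ENTITY b "&a;">
]>
<note>&a;</note>"""

try:
    ET.fromstring(RECURSIVE)
    recursion_rejected = False
except ET.ParseError:
    recursion_rejected = True
```

In the first document the parser builds a real child element from the entity; in the second it refuses to parse at all, which is the "big trouble" in practice.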
There are five internal entities predefined in XML. You'll recognize them from HTML; however, the other HTML entities are not predefined, and if you want to use them you'll have to define them yourself:
&lt; - The less than sign (<)
&gt; - The greater than sign (>)
&amp; - The ampersand (&)
&apos; - The single quote ( ' )
&quot; - The double quote ( " )
All XML processors are required to support references to these entities, even if they are not declared, which is useful.
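As a rough illustration, the sketch below uses all five predefined entities with no DTD at all, and then tries an HTML entity that XML does not predefine. It assumes Python's standard xml.etree.ElementTree module; the element and attribute names are made up.

```python
import xml.etree.ElementTree as ET

# No DOCTYPE anywhere, yet the five predefined entities still work.
el = ET.fromstring(
    '<quote by="&quot;Anon&quot;">1 &lt; 2 &amp; 3 &gt; 2, isn&apos;t it?</quote>'
)
attr_value = el.get("by")
text = el.text

# An HTML entity such as nbsp is not predefined in XML, so an
# undeclared reference to it is an error.
try:
    ET.fromstring("<p>&nbsp;</p>")
    nbsp_ok = True
except ET.ParseError:
    nbsp_ok = False
```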
Character references, which are similar in appearance to entity references, allow you to reference arbitrary Unicode characters, even if they aren't available directly on your keyboard. Character references are not actually entities at all; they just share the syntax as a convenient way to access Unicode characters.
The basic format of a character reference is either "&#nnn;" or "&#xhhh;" where "nnn" is a decimal Unicode character number and "hhh" is a hexadecimal Unicode character number.
A character reference inserts the specified Unicode character directly into your document. Note that this does not guarantee that your processing or display system will be able to do anything useful with the character. For example, &#x236E; would insert, in the words of the Unicode standard, an "APL Functional Symbol Semicolon Underbar". Whether or not you can print that character is an entirely different issue.
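Here is a small sketch of the two forms side by side, assuming Python's standard xml.etree.ElementTree and unicodedata modules; the sym element is invented. Decimal 9070 and hexadecimal 236E are the same number, so both references name the same character.

```python
import unicodedata
import xml.etree.ElementTree as ET

# The same character written as a decimal and as a hexadecimal reference.
sym = ET.fromstring("<sym>&#9070;&#x236E;</sym>").text

# Look up the official Unicode name of each resulting character.
names = [unicodedata.name(ch) for ch in sym]
```

The parser hands back two identical characters; whether your terminal or font can draw them is, as noted, another matter.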
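Character references differ from entity references in a subtle but significant way: the parser expands them immediately, as soon as it reads them. This happens even inside an entity declaration, where ordinary entity references are left alone until the entity is actually used. So an entity declared as "&#60;" has a literal '<' as its replacement text, and that '<' will be read as the start of a tag wherever the entity is referenced (which is why the XML specification declares the predefined entity lt as "&#38;#60;"). In ordinary text or in an attribute value, on the other hand, the character produced by a reference is always treated as data, so '&#34;' inside a double-quoted attribute value does not end the value.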
External entities offer a way to divide your document up into logical chunks. So instead of having one huge document, you can break it up into a number of smaller XML documents (one per chapter?) and include them in the main document using entities.
Because external entities in different documents can refer to the same files on your file system, external entities provide an opportunity to implement reuse, and we all like reuse.
So say we have a number of chapters in separate XML files and we want to write one XML document that contains all the chapters. We can define a number of entities as follows and use them in our "big fat" document:
<!ENTITY chapter1 SYSTEM "chapter1.xml">
This lets us put:
&chapter1;
into our document and include the entire contents of chapter1.xml. Quite useful, especially if we also want to read each chapter individually, or have a document that just includes the first 3 chapters (or whatever).
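To see the inclusion happen, here is a hedged sketch using Python's standard xml.parsers.expat module, which only resolves external entities if you hand it a handler. The helper name element_names, the book, chapter and title elements, and the in-memory files dictionary standing in for the file system are all assumptions made for the example.

```python
import xml.parsers.expat

def element_names(main_doc, files):
    """List element names in document order, pulling external
    entities from the `files` dict (a stand-in for the file system)."""
    names = []
    parser = xml.parsers.expat.ParserCreate()

    def on_start(name, attrs):
        names.append(name)

    def on_external(context, base, system_id, public_id):
        # Parse the referenced entity with a child parser that carries
        # on in the context of the reference.
        child = parser.ExternalEntityParserCreate(context)
        child.StartElementHandler = on_start
        child.Parse(files[system_id], True)
        return 1  # non-zero tells expat the entity was handled

    parser.StartElementHandler = on_start
    parser.ExternalEntityRefHandler = on_external
    parser.Parse(main_doc, True)
    return names

# Hypothetical chapter file and "big fat" document.
files = {"chapter1.xml": "<chapter><title>Chapter One</title></chapter>"}
book = """<!DOCTYPE book [
  <!ENTITY chapter1 SYSTEM "chapter1.xml">
]>
<book>&chapter1;</book>"""

names = element_names(book, files)
```

The chapter's elements show up nested inside book, exactly as if the contents of chapter1.xml had been typed in place of the reference. A real handler would read system_id from disk, and should be careful about which paths it is willing to open.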
Another Special XML Character
Obviously, the existence of entities introduces another special XML character to join the < and > characters reserved for delimiting tags. In ordinary text and attribute values, the ampersand (&) character will always be read as the beginning of an entity or character reference. So if you just want an ampersand, you have to remember to use the predefined entity &amp;
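A quick sketch of what happens when you forget, assuming Python's standard xml.etree.ElementTree module and the escape helper from xml.sax.saxutils; the shop element is made up.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# A bare ampersand is read as the start of a reference, so this fails.
try:
    ET.fromstring("<shop>Fish & Chips</shop>")
    bare_ok = True
except ET.ParseError:
    bare_ok = False

# Written with the predefined entity, it parses back to a plain "&".
shop_text = ET.fromstring("<shop>Fish &amp; Chips</shop>").text

# When generating XML from arbitrary text, escape() does the replacement.
escaped = escape("Fish & Chips")
```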