Massively useful, and clearly the right thing for dealing with those messy foreign languages without resorting to the unspeakable ugliness of 8-bit ASCII, which forces you to choose between having curly brackets or accented vowels and therefore royally screws Italian C programmers (and French C programmers and ...).

Unfortunately, no web browser implements the full set, and all I can do is say things like &dagger; and hope that the right character (actually glyph: you never see characters, only glyphs) appears on my screen.

What your browser thinks about &dagger;: --->†<--- (on Internet Explorer 6.0, with a full moon and my nose perfectly clean, I get a little Christian cross).

I dunno what weird character encoding buffo uses, but curly brackets are in the 7-bit ASCII set, and any sane 8-bit ASCII extension (including the widely-used ISO 8859 standard) won't touch them.

Still, HTML entities are a good way to display non-English characters without worrying about how to tell the browser which encoding to use, and they're the best way to use characters from totally unrelated languages in the same document (like using some kanji in a German website about the Japanese language).

The full Unicode character set can be accessed through the pattern &#x0000; (strictly speaking a numeric character reference rather than a named entity), substituting the character's Unicode code point in hexadecimal for the zeroes.
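
To make the pattern concrete, here is a minimal Python sketch. The helper name to_hex_reference is made up for illustration; ord() and html.unescape() are standard library calls. It builds the hexadecimal reference for a single character and decodes it back, which is roughly what a browser does when it meets the reference in a page.

```python
import html


def to_hex_reference(ch: str) -> str:
    """Return the &#x....; reference for one character (hypothetical helper)."""
    # ord() gives the Unicode code point; format it as at least 4 hex digits.
    return f"&#x{ord(ch):04X};"


dagger_ref = to_hex_reference("\u2020")   # the dagger character
print(dagger_ref)                          # &#x2020;

# Decoding the reference yields the original character again.
print(html.unescape(dagger_ref) == "\u2020")   # True
```

Whether the right glyph then shows up on screen still depends on the fonts the browser has available.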

However, there are some negative points, too. Editing the source code is incredibly painful if there are a lot of HTML entities in it and your editor doesn't automatically translate them both ways (which I don't think is possible while retaining full flexibility). And the entities won't show up in a string search for the "normally" encoded characters: your website on "&Ouml;sterreich" won't be found by someone looking for information about "Österreich" in a search engine. The same is true for Everything 2 node titles.
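
Both complaints can be sketched in a few lines of Python. This is only an illustration, not how any particular search engine or editor works; the function name contains_text and the sample string are assumptions. A search that decodes entities first does find the word, and the round trip shows why automatic two-way translation loses information: a named and a numeric reference decode to the same character, so re-encoding cannot know which spelling the author used.

```python
import html


def contains_text(source: str, query: str) -> bool:
    """Search after decoding entities (hypothetical helper)."""
    decoded = html.unescape(source)
    return query.casefold() in decoded.casefold()


page = "Willkommen in &Ouml;sterreich"

print("Österreich" in page)                 # False: a raw string search misses it
print(contains_text(page, "Österreich"))    # True: decode first, then search

# Two different source spellings collapse to one character when decoded,
# so translating back cannot restore the original spelling.
print(html.unescape("&Ouml;") == html.unescape("&#214;"))   # True
```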

HTML entities are necessary in attributes! Otherwise it is not valid HTML. The violation most often shows up as unescaped ampersands in HREF URLs; here are some reasons why you should avoid it:

  • Appendix Item B.2.2 of the W3C's HTML 4.0 specification explicitly tells you to.
  • Section 2.2 of RFC 1738: Uniform Resource Locators tells you to.
  • Section 4.4 of the W3C's XML 1.0 specification tells you that not using entities in an attribute value is "Forbidden".
  • Any program that reads your page with a validating parser will choke on unescaped ampersands.
  • XML makes it clear that there is very little difference between attribute values and the character data inside an element. Therefore it should be possible to move CDATA from one to the other without additional parsing.
  • According to a discussion on the W3C's HTML mailing list in February of 1998, the only known browser that doesn't parse HTML entities in attribute values correctly is Amaya 1.1c Beta.

Please, for the sake of the children, use HTML entities in your attribute values.
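
For anyone generating links from a program, a minimal sketch in Python could look like this. The function build_link and the example.com URL are made up for illustration; urlencode() and html.escape() are standard library calls. The query string is assembled first, and the finished URL is then escaped as a whole before it goes into the HREF attribute, so every & becomes &amp;.

```python
import html
from urllib.parse import urlencode


def build_link(base_url: str, params: dict, label: str) -> str:
    """Return an <a> element with a properly escaped HREF (hypothetical helper)."""
    # Join the query parameters with plain ampersands, as the URL itself requires.
    url = f"{base_url}?{urlencode(params)}"
    # Escape the whole URL for use inside a double-quoted attribute value.
    href = html.escape(url, quote=True)
    return f'<a href="{href}">{html.escape(label)}</a>'


link = build_link("http://example.com/search", {"q": "entities", "lang": "en"}, "Search")
print(link)
# <a href="http://example.com/search?q=entities&amp;lang=en">Search</a>
```

A browser decodes the &amp; back to a plain ampersand before following the link, so the URL that is actually requested is unchanged.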
