U+00FF LATIN SMALL LETTER Y WITH DIAERESIS, or ÿ (that's &yuml; in HTML), is the oddball among the set of vowels with diaeresis for a simple reason...

It doesn't exist. To the best of my knowledge (and judging by extensive discussions on the www-international@w3.org mailing list), there is no language on the entire planet that uses the character "ÿ" as a part of its regular orthography. Linguistically, the reason is simple: the diaeresis is often used to mark front vowels, but since the vowel "y" already is one in e.g. Finnish, there is no need for a "ÿ". It is often claimed that ÿ is found in Dutch, but this is a case of mistaken identity. Dutch is a frequent user of U+0133 LATIN SMALL LIGATURE IJ, or ĳ (&#307; in HTML), which the Dutch often treat as one letter and which also happens to resemble ÿ, at least if you squint and are chilling out at a coffee shop in Amsterdam. But mere resemblance does not mean identity!

Now, while ÿ is not a stand-alone character, there may occasionally be a need to plop a diaeresis atop a plain "y" so that it is pronounced separately from the vowel before it. This doesn't seem to happen much outside French, in names like actress Jenna von Oÿ, musician Eugène Ysaÿe and the surname L'Haÿ, which also pops up in a few place names, such as a suburb of Paris called L'Haÿ-les-Roses.
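For the curious, "plopping a diaeresis atop a plain y" can be taken quite literally in Unicode. The sketch below, using only Python's standard unicodedata module, assumes you start from a plain "y" followed by U+0308 COMBINING DIAERESIS; the variable names are mine, purely for illustration.

```python
import unicodedata

# A plain "y" followed by U+0308 COMBINING DIAERESIS: two code points.
decomposed = "y\u0308"

# NFC normalization composes the pair into the single precomposed character.
composed = unicodedata.normalize("NFC", decomposed)

print(len(decomposed), len(composed))
print(f"U+{ord(composed):04X}", unicodedata.name(composed))
```

Both forms should render identically, which is why it tends to be worth normalizing before comparing strings like "Ysaÿe" that may have arrived from different sources.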

But the third use of ÿ, which has nothing at all to do with human language, is perhaps the most common. As it happens, ÿ is octet 255 / hex FF in Latin-1 and thus the last 8-bit character, which makes it a good candidate for use as a zigamorph or delimiting character in situations where null won't do. This use is even enshrined in Unicode, since the HTML 4.01 Specification decrees that documents encoded in UTF-16 should start with U+FEFF ZERO WIDTH NO-BREAK SPACE, aka the Byte Order Mark. Since the byte-swapped value U+FFFE is guaranteed never to be a character, this is convenient for detecting byte-reversed order, and as it happens the bytes FE FF translate to "þÿ" in Latin-1.
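The byte order trick can be sketched in a few lines of Python. This is only an illustration, not anything the HTML specification prescribes: detect_utf16_byte_order is a hypothetical helper name of my own, while the BOM constants come from the standard codecs module.

```python
import codecs

def detect_utf16_byte_order(data: bytes) -> str:
    """Guess UTF-16 byte order from a leading Byte Order Mark (hypothetical helper)."""
    if data.startswith(codecs.BOM_UTF16_BE):  # bytes FE FF
        return "big-endian"
    if data.startswith(codecs.BOM_UTF16_LE):  # bytes FF FE, i.e. byte-reversed
        return "little-endian"
    return "unknown"

# What the two marks look like when misread as Latin-1 text.
bom_be_as_latin1 = codecs.BOM_UTF16_BE.decode("latin-1")
bom_le_as_latin1 = codecs.BOM_UTF16_LE.decode("latin-1")

sample = codecs.BOM_UTF16_BE + "hi".encode("utf-16-be")
print(detect_utf16_byte_order(sample))
print(bom_be_as_latin1, bom_le_as_latin1)
```

So if a file opens with "þÿ" or "ÿþ" in a Latin-1 viewer, chances are it is really UTF-16 and the viewer simply didn't notice.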

Update: Looks like Turkmen, guided by the infallible hand of the Turkmenbashi, is adopting a new script that employs ÿ as a surrogate for a "barred y" character, pronounced as a consonant y. President Saparmurad Niyazov will thus be henceforth known as Saparmyrat Nyÿazow. The same script employs a whole range of bizarre characters including ¥, $, and ¢. Given the <cough> slightly eccentric nature of Turkmenistan's dictator, it remains to be seen how widely this will be adopted... and thanks to Gritchka for this tip.

Incidentally, if you think this character is useless, take a look at its big brother Ÿ.



