in the obvious way as two bytes per character. The standard is that the high byte comes first (so that sorting strings by byte value matches sorting by character code), but due to the prevalence of little-endian
Intel processors and lazy programmers in Seattle, this data is often low-byte first.
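The sorting remark can be illustrated with a minimal sketch. The helper name encode_two_byte is hypothetical, and the code assumes every character fits in 16 bits; it is meant only to show why the byte order matters, not how any particular library stores text.

```python
def encode_two_byte(text: str, big_endian: bool = True) -> bytes:
    """Encode each character as exactly two bytes (hypothetical helper).

    Assumes every code point is at most U+FFFF; anything larger is rejected.
    """
    order = "big" if big_endian else "little"
    out = bytearray()
    for ch in text:
        cp = ord(ch)
        if cp > 0xFFFF:
            raise ValueError(f"U+{cp:04X} does not fit in two bytes")
        out += cp.to_bytes(2, order)
    return bytes(out)


# U+0041 ("A") has a lower character code than U+0100.
words = ["\u0100", "A"]

# High byte first: comparing raw bytes gives the same order as comparing codes.
sorted_be = sorted(words, key=lambda w: encode_two_byte(w, big_endian=True))

# Low byte first: the raw bytes of "A" (41 00) compare greater than 00 01.
sorted_le = sorted(words, key=lambda w: encode_two_byte(w, big_endian=False))
```

With the high byte first, "A" sorts before U+0100, as expected from the character codes. With the low byte first, the order is reversed.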
To allow automatic detection of the byte order, it has become customary on some platforms (notably Win32) to start every
Unicode file with the character U+FEFF (ZERO WIDTH NO-BREAK SPACE), also known as the Byte-Order Mark (BOM). Its
byte-swapped equivalent U+FFFE is not a valid Unicode character, so finding it at the start of a file unambiguously distinguishes the big-endian and
little-endian variants of UTF-16 and UTF-32.
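BOM detection can be sketched as follows. The function name detect_bom is an assumption made for illustration; the BOM constants come from Python's standard codecs module. The sketch only inspects the leading bytes and makes no guess when no BOM is present.

```python
import codecs


def detect_bom(data: bytes):
    """Return (encoding_name, bom_length) from a leading BOM, or (None, 0).

    Hypothetical helper: it only looks at the first bytes of the data.
    """
    # UTF-32 must be checked before UTF-16, because the UTF-32 little-endian
    # BOM (FF FE 00 00) begins with the UTF-16 little-endian BOM (FF FE).
    candidates = [
        (codecs.BOM_UTF32_BE, "utf-32-be"),
        (codecs.BOM_UTF32_LE, "utf-32-le"),
        (codecs.BOM_UTF16_BE, "utf-16-be"),
        (codecs.BOM_UTF16_LE, "utf-16-le"),
    ]
    for bom, name in candidates:
        if data.startswith(bom):
            return name, len(bom)
    return None, 0


# Usage: strip the BOM and decode the rest with the detected byte order.
raw = b"\xff\xfeA\x00"
encoding, skip = detect_bom(raw)
text = raw[skip:].decode(encoding) if encoding else None
```

Note that a UTF-16 little-endian file whose first character after the BOM is U+0000 would be mistaken for UTF-32 by this check; the sketch accepts that rare ambiguity.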
This fixed two-byte encoding is not exactly the same as UTF-16 but pretty close. UTF-16 adds surrogate pairs, a bogus enhancement that lets it encode more than 65536 possible characters.
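The mechanism UTF-16 uses to go beyond 65536 characters is the surrogate pair: a character above U+FFFF is split into two 16-bit units taken from reserved ranges. A minimal sketch of the arithmetic, with the hypothetical helper name to_surrogate_pair:

```python
def to_surrogate_pair(code_point: int):
    """Split a code point above U+FFFF into (high, low) UTF-16 surrogates.

    Hypothetical helper for illustration only.
    """
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("only U+10000..U+10FFFF need a surrogate pair")
    offset = code_point - 0x10000          # 20 significant bits remain
    high = 0xD800 + (offset >> 10)         # top 10 bits -> D800..DBFF
    low = 0xDC00 + (offset & 0x3FF)        # bottom 10 bits -> DC00..DFFF
    return high, low


high, low = to_surrogate_pair(0x1F600)
# The two units, written high byte first, should match Python's own encoder.
as_bytes = high.to_bytes(2, "big") + low.to_bytes(2, "big")
```

Code that treats the data as plain two-byte characters sees such a pair as two separate units, which is where the two encodings differ.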
I strongly recommend the use of UTF-8 for all text processing.