UCS-4

Encodes Unicode in the obvious way: 4 bytes per character. To detect byte order, the data is often prefixed with the character 0xfeff (ZERO WIDTH NO-BREAK SPACE), also known as the Byte Order Mark (BOM). Read with the wrong byte order, it appears as 0xfffe0000 (or 0xfffe in a 16-bit encoding), which is not a valid Unicode character, so it unambiguously distinguishes the big-endian and little-endian variants.
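The BOM check can be sketched in a few lines of Python. This is only an illustration, not a reference decoder: the function names `detect_ucs4_byte_order` and `decode_ucs4` are made up for this example, and falling back to big-endian when no BOM is present is an assumption.

```python
import struct

BOM = 0x0000FEFF          # ZERO WIDTH NO-BREAK SPACE used as a byte order mark
SWAPPED_BOM = 0xFFFE0000  # the same four bytes read with the wrong byte order


def detect_ucs4_byte_order(data: bytes):
    """Return 'big', 'little', or None when there is no leading BOM."""
    if len(data) < 4:
        return None
    # Read the first 4 bytes as a big-endian unsigned 32-bit integer.
    (first,) = struct.unpack(">I", data[:4])
    if first == BOM:
        return "big"
    if first == SWAPPED_BOM:
        return "little"
    return None


def decode_ucs4(data: bytes, default_order: str = "big") -> str:
    """Decode UCS-4 bytes, dropping a leading BOM if one is found.

    The default_order fallback is an assumption made for this sketch.
    """
    order = detect_ucs4_byte_order(data)
    if order is None:
        order = default_order
    else:
        data = data[4:]  # skip the BOM itself
    fmt = ">I" if order == "big" else "<I"
    # iter_unpack raises struct.error if the length is not a multiple of 4;
    # chr raises ValueError for values above 0x10ffff.
    return "".join(chr(cp) for (cp,) in struct.iter_unpack(fmt, data))
```

For example, `decode_ucs4(b"\xff\xfe\x00\x00A\x00\x00\x00")` sees the swapped BOM, treats the rest as little-endian, and returns `"A"`.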

Also called UTF-32, which is exactly the same thing, except that UTF-32 is defined to stop at 0x10ffff while UCS-4 was originally specified as a 31-bit space.
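The 0x10ffff ceiling is the one practical difference, and it is easy to test for. A minimal sketch, assuming raw 4-byte units with no BOM; the helper name `out_of_range_units` is hypothetical:

```python
import struct

UTF32_MAX = 0x10FFFF  # highest value UTF-32 is allowed to encode


def out_of_range_units(data: bytes, byteorder: str = "big"):
    """Return the 32-bit values in data that exceed the UTF-32 ceiling."""
    fmt = ">I" if byteorder == "big" else "<I"
    return [cp for (cp,) in struct.iter_unpack(fmt, data) if cp > UTF32_MAX]
```

Data for which this returns an empty list fits within the UTF-32 range; anything it reports only fits the wider 31-bit space.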

There is no reason to use UCS-4 or UTF-16 or any encoding other than UTF-8 anywhere in any program, file, or interface. If you think there is, you should get a clue. Study "combining characters" and other parts of the Unicode standard if you are under the delusion that a fixed-width encoding will somehow make programming easier. Face it: fixed-size characters are gone, and no amount of bits will bring them back. UTF-8 has the advantage of being compatible with ASCII, which is still used for 99.5% of computer text data.
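Both points are quick to check in Python. A small sketch using only the standard library; the variable names and the helper `utf32_units` are just for illustration:

```python
import unicodedata

# 'e' followed by COMBINING ACUTE ACCENT: one visible character, two code points.
decomposed = "e\u0301"
# NFC normalization folds the pair into the single code point U+00E9.
composed = unicodedata.normalize("NFC", decomposed)


def utf32_units(text: str) -> int:
    """Number of 4-byte units needed to store text as UCS-4 / UTF-32."""
    return len(text.encode("utf-32-be")) // 4


# The same visible character takes a different number of "fixed-size" units.
units_decomposed = utf32_units(decomposed)  # 2
units_composed = utf32_units(composed)      # 1

# Pure ASCII text is byte-for-byte identical when encoded as UTF-8.
ascii_compatible = "plain text".encode("utf-8") == "plain text".encode("ascii")
```

Even at 4 bytes per code point, the two strings display identically but differ in length, so "one unit per character" does not hold for what a reader sees on screen.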

wchar_t	UTF-16	combining character	UTF-32
UCS-2	UCS	Unicode	byte order mark
Bom	quine
