Psst -- they have
white space outside the Western world too,
you know! Let's slowly work our way east...
Semitic
Arabic
The Arabic script is, odd as it may seem, actually a distant
relative of our familiar Roman letters, and its rules of white space
and punctuation have cross-fertilized with the Greek and Roman
styles of writing described above. There are still quite a few
differences, mind you!
Arabic is a cursive script, meaning that its letters flow together,
and as in the West it uses white space to separate words. Arabic
script has no case, omits short vowels and is written from right to left.
However, not all Arabic letters are made equal: certain letters, such as
A ( alif ا ), never join to the letter after them and so are always
followed by a small gap, even in the middle of a word! Thus,
"God is Greatest" -- allahu akbar -- is actually written
    الله اكبر
or, letter by letter and reading from right to left,
    rbk a hll a
To the uninitiated this can be pretty confusing, as for example the
only difference between an initial L ( laam ل ) and an initial A
is that the word may continue after an L, but not an A.
Thus, in order to differentiate an L from an A at the end of a word,
there is a special final L with a hook at the end. The amount
of white space is also often wider between words than within words,
but especially with more decorative scripts this alone would not
be sufficient.
White space between sentences and paragraphs, on the other hand, is
largely unknown in Classical Arabic, as best typified (and still
retained) in the Qur'an. The end result is very much like the
Medieval writing described by Cletus, except that Qur'anic
Arabic has a much wider repertoire of punctuation to
insert into the solid block of text. The Western
pilcrow (¶) is replaced with a circle-shaped marker
( ۝ ) for the end of a verse (ayah), while
a star-shaped marker ( ۞ ) flags the start of each quarter-section
(rub el-hizb) into which the text is divided for recitation.
Western commas, semicolons and dashes are
replaced by little superscript Arabic letters, e.g.
a meem means a pause is obligatory, a jeem means a pause is
recommended but not required, a saad means a pause is not recommended but
possible, etc. This system is pretty opaque without extensive
study, but it does add to the hypnotic beauty of written
Qur'anic verse.
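For the curious, the markers mentioned above have code points of their own in Unicode's Arabic block. Here is a minimal Python sketch that looks up their official names with the standard unicodedata module. The three code points are my own illustrative pick, not a complete inventory of Qur'anic marks, and describe is just a helper name invented for this example.

```python
import unicodedata

# A few Qur'anic annotation marks from Unicode's Arabic block.
# (An illustrative selection, not a complete list.)
MARKS = ["\u06dd", "\u06de", "\u06da"]

def describe(ch):
    """Return 'U+XXXX NAME' for a single character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"

for mark in MARKS:
    print(describe(mark))
```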
Modern Arabic, on the other hand, uses slightly modified but
familiar versions of Western punctuation symbols. The period is
still eschewed in Arabic itself, with a wider stretch of white space
substituting, but Urdu (which is written with the Arabic script) uses
the period. As white space thus acquires syntactic meaning, the
preferred means of justification is to stretch the "bar"
(tatweel) connecting the characters: بيت and بيـــــت are exactly the
same word!
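To make the tatweel trick concrete, here is a rough Python sketch that pads a word by inserting tatweel characters (U+0640) instead of spaces. Everything about it is a simplifying assumption of mine: the stretch function and its names are invented for illustration, width is counted in code points rather than real glyph widths, and NO_FORWARD_JOIN is only a partial list of letters that never connect to their successor. A real typesetter is far pickier about where it stretches.

```python
TATWEEL = "\u0640"  # ARABIC TATWEEL, the stretchable connecting bar

# Assumed, partial list of letters that never join to the following letter:
# hamza, alif and its variants, dal, dhal, ra, zay, waw (plain and with
# hamza), teh marbuta, alif maksura.
NO_FORWARD_JOIN = set(
    "\u0621\u0627\u0622\u0623\u0625\u062f\u0630\u0631\u0632"
    "\u0648\u0624\u0629\u0649"
)

def is_arabic_letter(ch):
    """Crude test: basic Arabic letters only, excluding the tatweel itself."""
    return "\u0621" <= ch <= "\u064a" and ch != TATWEEL

def stretch(word, width):
    """Pad `word` to `width` code points with tatweels at the last joinable gap."""
    missing = width - len(word)
    if missing <= 0:
        return word
    # Walk backwards looking for a letter that connects to its successor.
    for i in range(len(word) - 2, -1, -1):
        left, right = word[i], word[i + 1]
        if (is_arabic_letter(left) and is_arabic_letter(right)
                and left not in NO_FORWARD_JOIN and right != "\u0621"):
            return word[:i + 1] + TATWEEL * missing + word[i + 1:]
    return word  # nothing joins, so there is no bar to stretch

print(stretch("\u0628\u064a\u062a", 8))  # the word from the example above
```

A word made up entirely of non-joining letters comes back unchanged, since it has no connecting bar to lengthen.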
One last tidbit: the mathematical three-dot "therefore" symbol ∴
is originally from Arabic, where it is yet another Qur'anic
symbol known as the muanqah and meaning that the word thus
marked "therefore" continues from the previous word.
Don't worry too much if you got a few question marks above:
most browsers can't quite handle Qur'anic Unicode yet...
Hebrew
Pretty much the same pattern repeats with Hebrew, which is
derived from the same Canaanite scripts as Arabic and Roman.
Classical Hebrew, namely the Torah, has its own system of
punctuation, but modern Hebrew is written with Western punctuation
and formatting. Unlike Arabic, Hebrew is a block script and there
are no funky inter-word white space rules.
Greek
Greek and its many, many relatives and lookalikes, like
Armenian, Cyrillic, Ethiopic and Georgian, all employ modern Western
white space and punctuation rules. Yes, this is a broad
generalization and there are many tiny variances; drop me
a note if you know of something really wacky.
Indic
Devanagari
Devanagari, the script used to write Hindi and many other Indian
languages, is written from left to right and, much like Arabic,
joins its letters together, except that the letters in a word are
always joined by a bar along the top and the rules for ligatures are
very, very complex. Words are separated by white space, sentences by
a character called danda ( । ) and verses with a double danda ( ॥ ).
Modern usage often substitutes the full set of Western punctuation.
Thai
The Thai script and its close cousins Lao and Myanmar are
relatives of Devanagari (all of them descend from the ancient Brahmi
script), but they do not separate words at all!
White space is only used to separate sentences. Other Western
punctuation, like the exclamation point and quotation marks, is
used in modern Thai.
CJK
Chinese
The Chinese, on the other hand, had a complete system of writing at
the time the Egyptians were still doodling hieroglyphs on
pyramid walls. In an ideographic writing system like Chinese
each character essentially represents one concept or "word" --
yes, this is a simplification, but it will have to do --
so words are already separate from each other with no need for
additional white space.
And indeed, for a very long time Chinese was written with
no white space or punctuation to speak of: text went from top to bottom
in columns marching from right to left, leaving the sentences for the
reader to figure out:
    i k o k     9 5 1     海 森
    s e f i       6 2     之 林
      t l n       7 3     恋 是
      h i d       8 4     人 大
For short poems and lists, though, line breaks were often inserted
at the end of each verse or item, improving readability somewhat.
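The classical layout is easy enough to mimic in code. Below is a small Python sketch that pours a string into top-to-bottom columns running from right to left and hands back the resulting rows. The vertical function, its height parameter and the padding character are all my own inventions for illustration; real vertical typesetting also rotates punctuation and handles much more besides.

```python
def vertical(text, height, pad=" "):
    """Lay `text` out in top-to-bottom columns of `height` characters,
    with the columns running from right to left. Returns a list of rows."""
    # Chop the text into columns, padding the last one if it comes up short.
    columns = [text[i:i + height] for i in range(0, len(text), height)]
    columns = [col.ljust(height, pad) for col in columns]
    # The first column of text must end up rightmost, so reverse the order.
    return ["".join(col[row] for col in reversed(columns))
            for row in range(height)]

for line in vertical("kindoflikethis", 4):
    print(line)
for line in vertical("森林是大海之恋人", 4):
    print(line)
```

Reading each printed block downwards, starting from the rightmost column, gives back the original string.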
This classical style is still used for things
like
poetry and
Buddhist sutras, which can thus be a royal pain
to read since the characters used and their meanings have also tended
to change over the millennia... but I digress.
Eventually, in China too the Western punctuation system crept in,
once again with a few changes. To prevent confusion with the dots
and curlies of the characters themselves, the period became a little
circle "。" and the comma shifted direction and became
a lot longer, "、".
Japanese
Japanese went the Chinese route and, despite the adoption of its
own two
kana syllabaries for phonetic writing, never saw the need
to adopt white space. In a sentence like 俺が猫を食った,
"I ate the cat", the content of the sentence is in the Chinese
characters -- 俺 猫 食 -- and the
kana syllables --
が を った -- sort out
their relationship. This is considerably clearer than Chinese,
where you have to rely on word order to figure out whether
a particular character is acting as a noun, adjective or verb,
and this is in fact one of the rare upsides of the otherwise
hideously convoluted
Japanese writing system.
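The split between content-bearing Chinese characters and grammatical kana can be pulled apart mechanically by looking at Unicode ranges. The Python sketch below does just that for the example sentence. The function name is made up for this illustration, and the ranges are deliberately crude: they cover the basic CJK Unified Ideographs block and the hiragana and katakana blocks only, ignoring rarer extensions.

```python
def split_kanji_kana(sentence):
    """Separate Chinese characters from kana using rough Unicode block ranges."""
    # Basic CJK Unified Ideographs block.
    kanji = "".join(ch for ch in sentence if "\u4e00" <= ch <= "\u9fff")
    # Hiragana (U+3040..U+309F) and katakana (U+30A0..U+30FF) blocks.
    kana = "".join(ch for ch in sentence if "\u3040" <= ch <= "\u30ff")
    return kanji, kana

kanji, kana = split_kanji_kana("俺が猫を食った")
print(kanji)  # the content characters
print(kana)   # the grammatical glue
```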
After World War II and Japan's almost-wholesale embrace of
things and ways Western, the Education Ministry decided to
start writing Japanese in horizontal rows from left to right.
(This had of course been practiced earlier as well on short
texts like signs, but there had been no consensus about the right
direction!) However, while Japanese school textbooks are to
this day written Western-style, nearly all newspapers, magazines
and books retain the old top-to-bottom formatting.
One last quirk: due to the similarity of the Western quote " and
the voiced-sound indicator dakuten ゛,
Japanese uses its own quotation marks, 「like this」, instead
of the Western ones. These are not used in mainland Chinese text,
although Traditional Chinese writing does use them.
Korean
And Korean outweirds everybody with its Hangul system of writing,
which involves packing little kana-like phonetic signs into
square boxes. Each Hangul composed character is one syllable and
consists of an optional initial consonant, a medial vowel and
an optional final consonant (or two). If there is no final
consonant, it can be simply omitted, but a missing initial must be
indicated by drawing a circle ᄋ, which thus acts as
visible white space -- "Hey! There's nothing here!".
There is much more to Hangul than this, but this probably isn't the
right place to get into it...
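Still, the box-packing is regular enough that Unicode assigns the composed syllables arithmetically, starting at U+AC00: 19 possible initials, 21 medials and 28 finals (counting "none" as a final). The Python sketch below follows that arithmetic. The compose function and table names are mine, and for readability the tables use the standalone "compatibility" letters (such as ㅇ) rather than the conjoining forms, so treat it as an illustration and not a full Hangul library.

```python
INITIALS = "ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ"        # 19 initial consonants
MEDIALS = "ㅏㅐㅑㅒㅓㅔㅕㅖㅗㅘㅙㅚㅛㅜㅝㅞㅟㅠㅡㅢㅣ"      # 21 medial vowels
FINALS = ["", "ㄱ", "ㄲ", "ㄳ", "ㄴ", "ㄵ", "ㄶ", "ㄷ", "ㄹ", "ㄺ",
          "ㄻ", "ㄼ", "ㄽ", "ㄾ", "ㄿ", "ㅀ", "ㅁ", "ㅂ", "ㅄ", "ㅅ",
          "ㅆ", "ㅇ", "ㅈ", "ㅊ", "ㅋ", "ㅌ", "ㅍ", "ㅎ"]  # 28, incl. "none"

def compose(initial, medial, final=""):
    """Pack an initial, a medial and an optional final into one syllable."""
    if initial == "":
        initial = "ㅇ"  # a missing initial is written as the circle
    i = INITIALS.index(initial)
    m = MEDIALS.index(medial)
    f = FINALS.index(final)
    # Syllables are laid out in order starting at U+AC00.
    return chr(0xAC00 + (i * 21 + m) * 28 + f)

print(compose("ㅎ", "ㅏ", "ㄴ") + compose("ㄱ", "ㅡ", "ㄹ"))
print(compose("", "ㅏ"))  # no initial consonant: the circle fills the slot
```

Leaving out the final simply selects index 0 in the table, while leaving out the initial has to be replaced by the circle, which mirrors the asymmetry described above.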
Korean was formerly written in Chinese characters with Chinese
white space rules (or lack thereof). In modern Hangul, white space is
used to separate both words and sentences, and once again Western
punctuation is in common use.
Summary
So in all, while the majority of the world appears less than convinced
about the merits of the Roman alphabet, nearly the entire planet
has adopted Western rules of white space and punctuation.
The exclamation mark and question mark are effectively
universal and the comma, period and quotation mark are only
slightly less so. Every modern script that I know of uses
white space to separate its sentences, and many -- albeit far from all --
also use it between their words.
And thanks fly out to the Unicode Consortium for
making this writeup possible.