Psst -- they have white space
outside the Western world too,
you know! Let's slowly work our way east...
Arabic script is, odd as it may seem, actually a distant
relative of our familiar Roman letters, and its rules of white space
and punctuation have cross-fertilized with the Greek and Roman
styles of writing described above. There are still quite a few
differences, mind you!
Arabic is a cursive script, meaning that its letters flow together,
and as in the West it uses white space to separate words. Arabic
has no case, omits weak vowels and is written from right to left.
However, not all Arabic letters are made equal: certain letters, such as
A ( alif ا ), are always followed by white space, even in
the middle of a word! Thus, "God is Greatest" -- allahu akbar --
is actually written
rbk a hll a
To the uninitiated this can be pretty confusing, as for example the
only difference between an initial L ( lam ل ) and an initial A
is that the word may continue after an L, but not an A.
Thus, in order to differentiate an L from an A at the end of a word,
there is a special final L with a hook at the end. The amount
of white space is also often wider between words than within words,
but especially with more decorative scripts this alone would not
be enough.
White space between sentences and paragraphs, on the other hand, is
largely unknown in Classical Arabic, as best typified (and still
retained) in the Qur'an. The end result is very much like the
Medieval writing described by Cletus, except that Qur'anic
Arabic has a much wider repertoire of punctuation to
insert into the solid block of text. The Western
pilcrow (¶) is replaced with a circle-shaped marker
( ۝ ) for the end of a verse (ayah) and
another star-shaped marker ( ۞ ) for a section division
(rub el-hizb, a quarter of a hizb). Western commas, semicolons and
dashes are replaced by drawing little superscript Arabic letters, e.g.
a meem means a pause is obligatory, jeem means
recommended but not required, saad is not recommended but
possible, etc. This system is pretty opaque without extensive
study, but it does add to the hypnotic beauty of written Arabic.
Modern Arabic, on the other hand, uses slightly modified but
familiar versions of Western punctuation symbols. The period is
still eschewed in Arabic itself, with a wider stretch of white space
substituting, but Urdu (which is written with the Arabic script) uses
the period. As white space thus acquires syntactic meaning, the
preferred means of justification is to stretch the "bar"
(tatweel) connecting the characters:
بيت and بيـــــت are exactly the same word.
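In Unicode the tatweel is an ordinary character of its own (U+0640), so a stretched word differs from the plain one only by those filler characters. Here is a minimal sketch in Python of what that means in practice. The helper names strip_tatweel and stretch are made up for this illustration, and the sketch assumes the caller already knows where stretching is allowed.

```python
TATWEEL = "\u0640"  # ARABIC TATWEEL, the stretchable connecting bar

def strip_tatweel(word: str) -> str:
    """Remove every tatweel so stretched and plain spellings compare equal."""
    return word.replace(TATWEEL, "")

def stretch(word: str, position: int, count: int) -> str:
    """Insert `count` tatweels after the letter at index `position`.

    Assumption: the caller picks a spot between two joining letters;
    this sketch does not check the joining rules itself.
    """
    return word[: position + 1] + TATWEEL * count + word[position + 1 :]

plain = "\u0628\u064a\u062a"      # the three letters ba, ya, ta
stretched = stretch(plain, 1, 5)  # five tatweels after the ya
```

Anything that compares or searches Arabic text would want to run something like strip_tatweel first, since the stretching is purely cosmetic.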
One last tidbit: the mathematical three-dot "therefore" symbol ∴
is originally from Arabic, where it is yet another Qur'anic
symbol known as the muanqah and meaning that the word thus
marked "therefore" continues from the previous word.
Don't worry too much if you got a few question marks above;
most browsers can't quite handle Qur'anic Unicode yet...
Pretty much the same pattern repeats with Hebrew, which is also
derived from the same Canaanite scripts as Arabic and Roman.
Classical Hebrew, namely the Torah, has its own system of punctuation,
but modern Hebrew is written with Western punctuation and formatting.
Unlike Arabic, Hebrew is a block script and there are no funky inter-word
white space rules.
and its many, many relatives and lookalikes like Armenian
all employ modern Western
white space and punctuation rules. Yes, this is a broad
generalization and there are many tiny variances; drop me
a note if you know of something really wacky.
Devanagari, the script used to write Hindi and many other Indian
languages, is a left-to-right script whose letters join up much as
in Arabic, except that the letters in a word are always joined by a
bar and the rules for ligatures are very, very complex. Words are
separated by white space, sentences by a character called danda ( । )
and verses with a double danda ( ॥ ). Modern usage often substitutes
the full set of Western punctuation.
Thai script and its close cousins such as Lao are relatives of
Devanagari (all descend from the Brahmi script), but they do not
separate words at all!
White space is only used to separate sentences. Other Western
punctuation like the exclamation point and quotation marks are
used in modern Thai.
China, on the other hand, had a complete system of writing at
the time the Egyptians were still doodling hieroglyphs on
pyramid walls. In an ideographic writing system like Chinese,
each character essentially represents one concept or "word" --
yes, this is a simplification, but it will have to do --
so words are already separate from each other with no need for
additional white space.
And indeed, for a very long time Chinese was written with
no white space or punctuation to speak of: text went from top to bottom
in rows marching from right to left, leaving the sentences for the
reader to figure out:

    i k o k     9 5 1     海 森
    s e f i       6 2     之 林
      t l n       7 3     恋 是
      h i d       8 4     人 大

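The layout rule is mechanical enough to sketch in a few lines of Python: chop the text into columns of a fixed height, then put the first column on the far right. The function name vertical_rows and the choice of a plain space for empty cells are assumptions of this sketch, not anything a classical scribe would recognize.

```python
def vertical_rows(text: str, height: int, blank: str = " ") -> list[str]:
    """Lay `text` out top to bottom in columns that march right to left."""
    # Cut the text into columns of `height` characters each.
    columns = [text[i : i + height] for i in range(0, len(text), height)]
    # Pad the last, possibly short, column so every column has `height` cells.
    columns = [col.ljust(height, blank) for col in columns]
    # The first column is the rightmost one, so reverse before reading rows.
    columns.reverse()
    return ["".join(col[row] for col in columns) for row in range(height)]

for line in vertical_rows("kindoflikethis", 4):
    print(line)
```

Feeding it "kindoflikethis" with a height of 4 gives back the four rows of the Latin example, and "123456789" gives the numbered one.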
For short poems and lists, though, line breaks were often inserted
at the end of each verse or item, improving readability somewhat.
This classical style is still used for things like Buddhist sutras,
which can thus be a royal pain
to read since the characters used and their meanings have also tended
to change over the millennia... but I digress.
Eventually, in China too the Western punctuation system crept in,
once again with a few changes. To prevent confusion with the dots
and curlies of the characters themselves, the period became a little
circle "。" and the comma shifted direction and became
a lot longer, "、".
Japanese went the Chinese route and, despite the adoption of its
own two kana syllabaries for phonetic writing, never saw the need
to adopt white space. In a sentence like 俺が猫を食った,
"I ate the cat", the content of the sentence is in the Chinese
characters -- 俺 猫 食 -- and the kana -- が を った -- sort out
their relationship. This is considerably clearer than Chinese,
where you have to rely on word order to figure out whether
a particular character is acting as a noun, adjective or verb,
and this is in fact one of the rare upsides of the otherwise
hideously convoluted Japanese writing system.
After World War II and Japan's almost-wholesale embrace of
things and ways Western, the Education Ministry decided to
start writing Japanese in horizontal rows from left to right.
(This had of course been practiced earlier as well on short
texts like signs, but there had been no consensus about the right
direction!) However, while Japanese school textbooks are to
this day written Western-style, nearly all newspapers, magazines
and books retain the old top-to-bottom formatting.
One last quirk: due to the similarity of the Western quote " and
the voiced-sound indicator dakuten ゛,
Japanese uses its own quotation marks, 「like this」, instead
of the Western ones. These are not found in Chinese.
Korean outweirds everybody with its Hangul system of writing,
which involves packing little kana-like phonetic signs into
square boxes. Each Hangul composed character is one syllable and
consists of an optional initial consonant, a medial vowel and
an optional final consonant (or two). If there is no final
consonant, it can be simply omitted, but a missing initial must be
indicated by drawing a circle ᄋ, which thus acts as
white space -- "Hey! There's nothing here!".
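Unicode encodes every such box as a single precomposed syllable, and the position of each one follows directly from its initial, medial and final parts. The Python sketch below uses the arithmetic from the Unicode Standard's Hangul syllable rules. The function names compose and decompose are made up here, and the source text itself says nothing about encoding, so treat this as an illustration rather than part of the original description.

```python
S_BASE = 0xAC00   # first precomposed syllable
L_BASE = 0x1100   # first initial consonant (conjoining jamo)
V_BASE = 0x1161   # first medial vowel
T_BASE = 0x11A7   # one before the first final consonant
L_COUNT, V_COUNT, T_COUNT = 19, 21, 28  # T_COUNT includes "no final"

def decompose(syllable: str) -> tuple[str, str, str]:
    """Split one precomposed Hangul syllable into initial, medial, final."""
    index = ord(syllable) - S_BASE
    if not 0 <= index < L_COUNT * V_COUNT * T_COUNT:
        raise ValueError("not a precomposed Hangul syllable")
    initial, rest = divmod(index, V_COUNT * T_COUNT)
    medial, final = divmod(rest, T_COUNT)
    return (
        chr(L_BASE + initial),
        chr(V_BASE + medial),
        chr(T_BASE + final) if final else "",  # final index 0 means "none"
    )

def compose(initial: str, medial: str, final: str = "") -> str:
    """Pack conjoining jamo back into one precomposed syllable."""
    i = ord(initial) - L_BASE
    m = ord(medial) - V_BASE
    f = ord(final) - T_BASE if final else 0
    return chr(S_BASE + (i * V_COUNT + m) * T_COUNT + f)
```

Note how the encoding mirrors the writing rule: the final slot has a real "nothing" value (index 0), while the initial slot does not, so a syllable that starts with a vowel still decomposes to the circle ᄋ (U+110B) as its initial.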
There is much more to Hangul than this, but this probably isn't the
right place to get into it...
Korean was formerly written in Chinese characters with Chinese
white space rules (or lack thereof). In modern Hangul, white space is used
to separate both words and sentences, and once again Western punctuation
is in common use.
So in all, while the majority of the world appears less than convinced
about the merits of the Roman alphabet, nearly the entire planet
has adopted Western rules of white space.
The exclamation mark and question mark have caught on almost
everywhere, and the comma and quotation mark
slightly less so. Every modern script that I know of uses
white space to separate its sentences, and many -- albeit far from all --
also use it between their words.
And thanks fly out to the Unicode Consortium for
making this writeup possible.