The programmer's Point of View:
Text is stored and transmitted digitally as numerical codes (typically 8 bits per character in older encodings; Unicode text is commonly stored as UTF-8, which uses one to four bytes per character, or UTF-16, which uses 16-bit units). These codes are interpreted graphically using a character set, which is a bit like a typeface but defines only which number means which character, not how that character appears on screen or in print. For this purpose new characters had to be defined in addition to the symbols of the Latin alphabet, Hindu-Arabic numerals, punctuation etc. For example, the space between words had to be treated as a character for the first time (well, it may have been treated as a character in early systems of cryptography).
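To make the "numbers interpreted through a character set" idea concrete, here's a small Python sketch. The sample string is arbitrary. It shows that the space is just another code (32) and that the number of bytes the same text takes up depends on the encoding you pick.

```python
# Text is just a sequence of numeric codes; the character set says which
# number means which character (not what it looks like).
sentence = "Hi there."

# Every character, including the space between words, has its own code.
codes = [ord(ch) for ch in sentence]
print(codes)  # [72, 105, 32, 116, 104, 101, 114, 101, 46]

# The same text costs a different number of bytes depending on the encoding.
print(len(sentence.encode("ascii")))      # 9  (one 8-bit code per character)
print(len(sentence.encode("utf-16-le")))  # 18 (one 16-bit unit per character)
print(len("é".encode("utf-8")))           # 2  (UTF-8 needs two bytes here)
```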
In the old days, when typing was done on electric typewriters and teleprinters, a pair of characters was incorporated into the common character sets for what happens at the end of a line, namely Line Feed and Carriage Return. These two actions were the scrolling of the paper up one line (or, on a computer, the movement of the cursor down one line), and the whooshy dinging bit where the part that did the writing moved back to the left edge of the page (or where the cursor moves back to the start of the line). Even now, to this very day, these two characters, CR (ASCII/ANSI 13) and LF (10), persist, and many text editors (notably on Windows) still write both characters at the end of a line. Here's the thing though: text is a one-dimensional information medium. There is no need for the cursor to move down a line because in reality there is no line. The 'line' is just a convention of text editing, brought about because text has always been printed in lines stacked above one another rather than in long one-line strips. The line is a human requirement, so that people can read the text more comfortably, and also a stylistic one, so that the writer can control where paragraphs stop and new ones start. Many systems therefore settled on a single break character. Some (classic Mac OS, for instance) dropped the Line Feed and wrote only a Carriage Return when you pressed the Enter key, while Unix and Linux systems went the other way and dropped the CR in favour of the LF. (Read more about this sordid subject here.)
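Here's a minimal Python sketch of coping with all three conventions. The function name `normalize_newlines` is my own invention, not a library call. It rewrites every CRLF, lone CR or lone LF to one chosen break. CRLF is handled first so that it isn't counted as two separate breaks.

```python
# The three historical end-of-line conventions.
CRLF = "\r\n"  # CR (13) followed by LF (10): DOS/Windows
LF = "\n"      # LF only: Unix/Linux
CR = "\r"      # CR only: classic Mac OS

def normalize_newlines(text: str, newline: str = LF) -> str:
    """Convert any mix of CRLF, CR and LF line breaks to a single convention."""
    # Replace CRLF first so it isn't turned into two breaks by the CR step.
    unified = text.replace(CRLF, LF).replace(CR, LF)
    return unified.replace(LF, newline)

sample = "one\r\ntwo\rthree\nfour"
print(repr(normalize_newlines(sample)))        # 'one\ntwo\nthree\nfour'
print(repr(normalize_newlines(sample, CRLF)))  # 'one\r\ntwo\r\nthree\r\nfour'

# The two-character convention costs one extra byte per line break.
print(len(normalize_newlines(sample, CRLF)) - len(normalize_newlines(sample)))  # 3
```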
In the same way, the double space is a stylistic element to make text easier for humans to read. Technically, only a single character need be used to mark the end of a sentence. A word processor might render this character as a full stop followed by a slightly longer space. However, no such character exists (although there are many extra space characters defined in the ever-growing Unicode standard), and people are going to continue double spacing when writing for print. But from the point of view of data storage and transmission, it's wasted bytes. It's like when people who don't understand how to set tab stops in Word (or whatever word processor they use) type thirty spaces to get the text where they want it, only to find (and usually accept anyway) that the line doesn't quite line up with the one above because of the proportional spacing of the obscure typeface they insist on using. A tab character takes up a single byte, and the information defining where a tab stop is positioned takes up only a few more (although I'm tactfully avoiding the subject of the massive bloatiness of MS Word documents). Over the course of a document that can save quite a few bytes. (This probably sounds silly considering that we measure storage space in gigabytes nowadays, but I aspire to be old school (however that works).)
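To put a rough number on the "wasted bytes" argument, here's a Python sketch. The function name `single_space_sentences` is my own, and its regular expression is a simplifying assumption: it treats any full stop, question mark or exclamation mark followed by two or more spaces as a sentence end. The byte counts hold for plain ASCII text, where one character is one byte.

```python
import re

def single_space_sentences(text: str) -> str:
    """Collapse two or more spaces after '.', '!' or '?' into a single space."""
    return re.sub(r"([.!?]) {2,}", r"\1 ", text)

doc = "It was late.  The lamp was out.  Nobody moved."
squeezed = single_space_sentences(doc)
print(squeezed)                  # It was late. The lamp was out. Nobody moved.
print(len(doc) - len(squeezed))  # 2 (one byte saved per sentence break)

# Thirty spaces used as a makeshift tab versus one tab character.
print(len(" " * 30) - len("\t"))  # 29 bytes saved per occurrence
```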
So I think there's a schism between hackers (in the 'does-a-lot-of-typing' sense) on this issue. There are those who see the final layout of a text as part of the art form, and those who believe only the written content is important. The former group are more inclined to want precise control over layout and will play with different typefaces, styles (bold, italic etc.) and (eek gads) perhaps even colours. The latter will shun such things as irrelevant novelties and concentrate purely on the actual information. Of course, most writers fall somewhere between the two, perhaps not caring too much about layout but wanting to use bold and italics to affect the way the text is read. Those closer to the former group are more likely to use double sentence spacing than others.
Also, I think this is one of those rather silly issues that seems to cause irrational irritation (alliteration, yuch). Secretaries etc. are irritated by the lowly IT tech guy who tells them that they're using the system of digital text representation wrong, and the techs are irritated by lowly secretaries telling them how to type because they were taught how to do it properly.
The HTML problem (browsers collapse any run of spaces into a single space) can be solved with careful application of the one-pixel transparent GIF method. One replaces all but the last of a string of spaces with a single <IMG> of a transparent pixel (1x1 in size), with its width set to the size of the gap required (minus the last space). The invisible GIF should be bound to the word directly preceding it using <NOBR> and </NOBR> tags, to prevent the image from being wrapped to the start of the next line and seemingly indenting the text that follows it. Sadly, this method won't work on E2, because E2 allows neither the <IMG> nor the <NOBR> tag. But feel free to use it in your own webpages. I do. (And I, being the authority on all things, am to be trusted on stuff like this. Ahem.)
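If you'd rather not place the spacers by hand, here's a rough Python sketch that applies the transparent GIF method to a string. Several things in it are assumptions for illustration: the function name `spacer_gif_html`, the image path `spacer.gif` and the 4-pixel width per space (the real width depends on the font). It only handles runs of spaces that directly follow a word. Leading spaces and tabs are left alone. The input is HTML-escaped first so that the only markup in the output is the spacer markup itself.

```python
import html
import re

SPACER_SRC = "spacer.gif"  # hypothetical path to a 1x1 transparent GIF
SPACE_WIDTH_PX = 4         # assumed width of one space in the page's font

def spacer_gif_html(text: str) -> str:
    """Keep runs of 2+ spaces visible in HTML using a one-pixel spacer GIF.

    All but the last space in each run become one <IMG> whose width covers
    the gap. The image is wrapped in <NOBR> together with the preceding word,
    so it cannot wrap to the start of the next line.
    """
    escaped = html.escape(text)  # entities like &lt; contain no spaces

    def replace_run(match: re.Match) -> str:
        word, spaces = match.group(1), match.group(2)
        width = (len(spaces) - 1) * SPACE_WIDTH_PX  # gap minus the last space
        img = f'<IMG SRC="{SPACER_SRC}" WIDTH="{width}" HEIGHT="1" ALT="">'
        return f"<NOBR>{word}{img}</NOBR> "  # keep one real space for wrapping

    return re.sub(r"(\S+)( {2,})", replace_run, escaped)

print(spacer_gif_html("End of sentence.  Next one."))
# End of <NOBR>sentence.<IMG SRC="spacer.gif" WIDTH="4" HEIGHT="1" ALT=""></NOBR> Next one.
```

Keeping one ordinary space after the </NOBR> lets the browser still break the line there. The word and its spacer stay glued together.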