A line ending malfeature from that obnoxious monster DOS, and aptly referred to as a bug by kaatunut.

DOS line terminators are represented in ASCII hexadecimal as 0D 0A, or symbolically as CR LF (carriage return, line feed). Every line break is represented by these two characters. This is probably because old line printers required these two characters to terminate a line: one to "return the carriage" (like old typewriters), and one to "feed a line."

This has leached over to the whole Microsoft series of operating systems.

Sane operating systems like UNIX use a single LF (0A) character to end lines. The discrepancy leads to much kludging and portability agony. For instance, in C (the language), "\n" is an escape sequence for newline in strings. This maps to an ASCII LF or 0A, except that under DOS it actually has to expand to CR LF when it is written out as text. This is bad because \n also has to serve as a single character, and the expansion can break file format compatibility.

Then there are the Macs, with their single CR (0D) line terminators... The moral of the story is you have to jump through many portability hoops to reach beyond the *NIX world.

Note: Many Internet RFC standards have declared the DOS-style CR LF as the standard line terminator, but implementations are required to accept all styles.
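To make "accept all styles" concrete, here is a minimal sketch in C of a reader-side normalizer. The function name normalize_newlines is made up for illustration and is not taken from any RFC or library; it simply treats CR LF, a lone CR, and a lone LF as one line break each.

```c
#include <stddef.h>

/* Hypothetical helper (illustrative name): rewrite buf in place so that
 * CR LF, a lone CR, and a lone LF each become a single LF.
 * Returns the new length, which is never larger than len. */
size_t normalize_newlines(char *buf, size_t len)
{
    size_t out = 0;
    for (size_t in = 0; in < len; in++) {
        if (buf[in] == '\r') {
            /* CR LF is one terminator: skip the LF that follows the CR. */
            if (in + 1 < len && buf[in + 1] == '\n')
                in++;
            buf[out++] = '\n';
        } else {
            buf[out++] = buf[in];
        }
    }
    return out;
}
```

The result is not NUL-terminated by the function itself; the caller works with the returned length. A real reader would also have to decide what to do with a CR that arrives at the very end of one buffer when the matching LF is in the next, which this sketch ignores.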

Ahhum. Let me clarify:

The 'bug' I referred to shows up in C compilers that aren't prepared for Microsoft stupidity. You see, normally, when you want to continue a literal (#define, "") over a linefeed, you escape it, because normally a linefeed terminates a #define and gives a syntax error inside "". So, you put '\' at the end of the line. Now, when the compiler reads it, the bytes it sees are '\' 0x0a, that is, backslash and '\n'. Since escaping means that backslash-anycharacter does something interesting, backslash-linefeed was defined as "nothing" so as to make this line continuing easy.

Now, in the light of the above writeup... what do you get when you escape? Right. You get "backslash-0x0d-0x0a" (or '\' '\r' '\n'). Now the compiler will escape the \r, leaving the \n there. And since the DOS idiots just had to force linefeeds to work that way, there is no way to insert another \ between \r and \n ... bye-bye line continuing.
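A small sketch of the strict rule described here, in C. This is not how any particular compiler is implemented; splice_lines is a made-up name, and the function only models "backslash immediately followed by LF means nothing" so you can see why a CR in between defeats it.

```c
#include <stddef.h>

/* Illustrative model of strict line splicing: delete every backslash that
 * is immediately followed by LF, together with that LF. Every other byte
 * is copied through unchanged. Rewrites buf in place, returns new length. */
size_t splice_lines(char *buf, size_t len)
{
    size_t out = 0;
    for (size_t in = 0; in < len; in++) {
        if (buf[in] == '\\' && in + 1 < len && buf[in + 1] == '\n') {
            in++;               /* skip the LF as well */
            continue;
        }
        buf[out++] = buf[in];
    }
    return out;
}
```

Fed the bytes '\' 0x0a, the pair vanishes and the two source lines join. Fed '\' 0x0d 0x0a, the backslash is followed by CR rather than LF, so nothing is removed and the LF still ends the line.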

I don't know, but I'd guess that most DOS compilers work around this in a nonstandard way that really makes a three-character escape of '\' '\r' '\n', which surely broke something and added to the bloat. Decent Unix compilers like the Cygwin port of gcc do just what is logical: complain about a syntax error as if you had not put that '\' there.

...

Now, as for the '0x0d 0x0a' behaviour itself being a bug... no comments. This whole DoS thing is a boot sector virus, so why not...

This isn't really DOS' or Microsoft's fault. IBM should probably be blamed. The problem originates with the video BIOS (which, unless I'm mistaken, was written by IBM).

The standard method for writing text to the screen via BIOS is to use interrupt 0x10, subfunction 0x0e. This is the "teletype" command: it writes a character to the screen and updates the character position.

This function interprets "line feed" as a line feed; it drops to the next line without changing the cursor's column. It interprets "carriage return" as carriage return; it returns the cursor to the first column of the current row.
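The cursor behaviour described above can be modelled in a few lines of C. This is only a toy model, not BIOS code: struct cursor and teletype_put are hypothetical names, and the sketch leaves out scrolling, wrapping at the right edge, and the other control characters.

```c
/* Hypothetical cursor model; the names are illustrative, not a BIOS API. */
struct cursor {
    int row;
    int col;
};

/* Update the cursor the way the writeup describes the teletype call. */
void teletype_put(struct cursor *c, unsigned char ch)
{
    if (ch == 0x0A) {
        c->row++;       /* line feed: next row, same column */
    } else if (ch == 0x0D) {
        c->col = 0;     /* carriage return: first column, same row */
    } else {
        c->col++;       /* any other character: advance one column */
    }
}
```

Sending "ab" followed by a bare 0x0A leaves the cursor on the next row but still in column 2, so the following text starts partway across the screen. Only the pair 0x0D 0x0A puts it at the start of a fresh line.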

Now it's possible to just write directly to the video memory if the OS keeps track of the cursor position itself, and it's certainly possible for the OS to send 0x0d 0x0a to the video card whenever 0x0a is printed. But the simple thing to do is just pass strings on to BIOS one character at a time and let it worry about everything.

So this 'bug' isn't Microsoft's fault; they just didn't fix it.

...then by the time you get to Windows, Microsoft was already trapped by backwards compatibility.

Using CRLF as the line separator is not DOS-specific. Up until Unix, this was the standard for ASCII-using systems that treated text files as strings of bytes (as opposed to, say, record-based systems). For most ARPANET and Internet protocols, the line separator is and always has been the two-character CR-LF sequence. When Unix and C were introduced, they used a single character for newline, for simplicity.

When the IBM PC and MS-DOS were introduced in the very early 80s, C and Unix were not yet the industry-dominating forces they would later become. It only made sense to have text files compatible with the major commercial operating systems. No one was writing programs for the PC in C: it was usually assembly, BASIC, or Pascal.

When C started being ported to non-Unix OSes, the incompatibility had to be bridged somehow. Thus C compilers for DOS and other OSes have two modes for files: binary and text. In text mode, "\n" is converted to "\r\n" before being written, and "\r\n" is converted to "\n" when read; thus a single byte can represent the newline in memory, yet the OS standard can be followed on disk (and for terminal I/O).
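As a rough illustration of that text-mode translation, here is a sketch in C that works on memory buffers. The names text_write and text_read are invented for this example; real C libraries do the equivalent work inside their stdio layer, and this is not a copy of any of them.

```c
#include <stddef.h>

/* Writing side: expand each LF to CR LF.
 * dst must have room for up to 2 * len bytes. Returns bytes produced. */
size_t text_write(const char *src, size_t len, char *dst)
{
    size_t out = 0;
    for (size_t i = 0; i < len; i++) {
        if (src[i] == '\n')
            dst[out++] = '\r';
        dst[out++] = src[i];
    }
    return out;
}

/* Reading side: collapse each CR LF to LF. A CR not followed by LF is
 * kept. dst must have room for up to len bytes. Returns bytes produced. */
size_t text_read(const char *src, size_t len, char *dst)
{
    size_t out = 0;
    for (size_t i = 0; i < len; i++) {
        if (src[i] == '\r' && i + 1 < len && src[i + 1] == '\n')
            continue;   /* drop the CR; the LF is copied on the next pass */
        dst[out++] = src[i];
    }
    return out;
}
```

In standard C the program picks the mode through the string passed to fopen: "r" or "w" for text, "rb" or "wb" for binary. On Unix the two behave the same, which is why the difference tends to bite only when code is moved to DOS or Windows.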

There is no sense blaming either Microsoft or IBM for the mistakes of others. They have made enough of their own.
