If you use more than one of Unix, Windows and the Macintosh, you may have come across the "newline characters" problem: each system uses a different code for the end of a line. These codes are built from the ^M and ^J characters (that is, Ctrl-M and Ctrl-J).

Unix      ^J
Windows   ^M^J
Mac       ^M      (classic Mac OS; Mac OS X and later use ^J, like Unix)

These characters are also known as "carriage return" and "line feed" or by their octal ASCII codes:

\015   Ctrl-M   Carriage return
\012   Ctrl-J   Line feed
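
To find out which convention a file actually uses, you can count the three kinds of line ending in its raw bytes. Here is a minimal Python sketch of that idea; the function name count_newlines and the labels "Unix", "Windows" and "Mac" are made up for this example, not part of any standard tool.

```python
from collections import Counter

CR = b"\015"  # carriage return, Ctrl-M (the same byte as b"\r")
LF = b"\012"  # line feed, Ctrl-J (the same byte as b"\n")


def count_newlines(data: bytes) -> Counter:
    """Count the line endings of each convention found in data."""
    counts = Counter()
    i = 0
    while i < len(data):
        byte = data[i:i + 1]
        if byte == CR:
            if data[i + 1:i + 2] == LF:
                counts["Windows"] += 1  # a ^M^J pair counts as one ending
                i += 1                  # skip the ^J we just consumed
            else:
                counts["Mac"] += 1      # a ^M on its own
        elif byte == LF:
            counts["Unix"] += 1         # a ^J on its own
        i += 1
    return counts
```

Read the file in binary mode (open(name, "rb")) before passing its contents in, so that Python does not translate the line endings first. Whichever count dominates is most likely the file's convention; more than one non-zero count suggests a file with mixed line endings.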

When you have a file which uses the wrong newline convention, what can you do? Transferring it using ftp in ASCII mode should sort it out. Or if you use Emacs 20, you can edit a file which uses any of these conventions, and you can tell which one it uses from the symbol in the mode line at the bottom of the window:

:   Unix
\   Windows
/   Mac

Finally, here's how to use the Unix tr command to convert Windows or Mac text files to Unix text files:

Windows -> Unix    tr -d '\015' < windowsfile > unixfile
Mac -> Unix        tr '\015' '\012' < macfile > unixfile
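
If tr is not to hand, the same conversions can be sketched in Python. The function names to_unix and convert_file are invented for this example. Note one difference from tr -d '\015': that command deletes every ^M, whereas this sketch turns a lone ^M into ^J, so it handles Windows and Mac files in one go.

```python
def to_unix(data: bytes) -> bytes:
    """Return data with Windows (^M^J) and Mac (^M) line endings as ^J."""
    # Replace the ^M^J pairs first, so each becomes one ^J rather than two.
    return data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")


def convert_file(src: str, dst: str) -> None:
    """Read src as raw bytes and write a Unix-newline copy to dst."""
    with open(src, "rb") as infile:
        data = infile.read()
    with open(dst, "wb") as outfile:
        outfile.write(to_unix(data))
```

For example, convert_file("windowsfile", "unixfile") does roughly what the first tr command above does. The whole file is read into memory, which is fine for ordinary text files but not for very large ones.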

Unicode further complicates the issue of newlines. Any of the legacy newline conventions can also mark a newline in Unicode text. That is, Carriage Return (CR, U+000D), Line Feed (LF, U+000A), the CRLF pair, or the EBCDIC convention of Next Line (NEL, U+0085) can all be newlines in Unicode.

To further muddy the waters, Unicode adds two more characters: Line Separator (LS, U+2028) and Paragraph Separator (PS, U+2029). Line Separator signals the end of a line, and is roughly equivalent to <br> in HTML. Paragraph Separator signals the end of a paragraph, and is roughly equivalent to <p> in HTML. See Unicode Standard Annex (= Technical Report) #13 at www.unicode.org for more details.
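
As a rough illustration, here is a Python sketch that folds all of the newline characters mentioned above into a single convention. The names UNICODE_NEWLINE and normalize_newlines are invented for this example, and mapping Paragraph Separator to an ordinary newline is a simplification that throws away the line/paragraph distinction.

```python
import re

# CRLF comes first in the alternation so the pair is treated as one newline.
UNICODE_NEWLINE = re.compile("\r\n|[\r\n\u0085\u2028\u2029]")


def normalize_newlines(text: str, newline: str = "\n") -> str:
    """Replace CR, LF, CRLF, NEL, LS and PS in text with newline."""
    return UNICODE_NEWLINE.sub(newline, text)
```

Python's built-in str.splitlines() already recognises all of these characters when splitting text into lines, along with a few others such as form feed and vertical tab, so it may be enough on its own if you only need the list of lines.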
