Base64 is a method of encoding binary data as plain text that can be (for example) safely e-mailed. It is the mechanism MIME uses to encode file attachments in e-mail, and is generally thought of (coupled with MIME) as a newer alternative to uuencode. Its internal workings are described in detail in section 5.2 of RFC 1521.

Most modern mail clients will transparently encode and decode base64, so you rarely need to be aware that it exists.

Base64 is a content-transfer-encoding method that represents sequences of octets using a 65-character subset of the US-ASCII character set, initially developed to allow binaries to slip through mailers unharmed.

Unlike the characters used by uuencode or base85 encoding (used by Level 2 PostScript), the subset of characters used by base64 is represented identically in all versions of ISO 646 (including US-ASCII) and all versions of EBCDIC. These characters (and their indices, referenced below) are:

 0 A     17 R     34 i     51 z 
 1 B     18 S     35 j     52 0 
 2 C     19 T     36 k     53 1 
 3 D     20 U     37 l     54 2 
 4 E     21 V     38 m     55 3 
 5 F     22 W     39 n     56 4 
 6 G     23 X     40 o     57 5 
 7 H     24 Y     41 p     58 6 
 8 I     25 Z     42 q     59 7 
 9 J     26 a     43 r     60 8 
10 K     27 b     44 s     61 9 
11 L     28 c     45 t     62 + 
12 M     29 d     46 u     63 / 
13 N     30 e     47 v  
14 O     31 f     48 w  (pad) = 
15 P     32 g     49 x   
16 Q     33 h     50 y   
Table lifted from RFC 1521.

Note the 65th character, =, is used for padding at the end of the data.
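
The table above can be rebuilt in a couple of lines, which is handy for checking an index by hand. This is only a sketch; the names ALPHABET and PAD are my own and do not come from the RFC:

```python
import string

# The 64 characters in index order: A-Z (0-25), a-z (26-51), 0-9 (52-61), '+' (62), '/' (63).
ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"
PAD = "="  # the 65th character; it never encodes data, only marks padding

print(len(ALPHABET))                                          # 64
print(ALPHABET[11], ALPHABET[29], ALPHABET[8], ALPHABET[34])  # L d I i
```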

Using 64 characters allows us to represent 6 bits per printable character. To encode our stream of data, 24 bits (3 octets) are pulled off at a time and split into four 6-bit chunks (I'll use wharfinger's example from his uuencode writeup in order to illustrate the parallel):

   00101101 11010010 00100010 
Becomes
   001011 011101 001000 100010 
In decimal:
     11     29      8     34
Which (using the above table) would be represented as the characters:
      L      d      I      i     
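
The same split can be sketched in Python. The function name encode_quantum is hypothetical, chosen for this illustration, and it handles only a full 3-octet group:

```python
ALPHABET = ("ABCDEFGHIJKLMNOPQRSTUVWXYZ"
            "abcdefghijklmnopqrstuvwxyz"
            "0123456789+/")

def encode_quantum(three_octets: bytes) -> str:
    """Encode exactly 3 octets (24 bits) as 4 base64 characters."""
    if len(three_octets) != 3:
        raise ValueError("expected exactly 3 octets")
    # Pack the three octets into a single 24-bit integer.
    n = int.from_bytes(three_octets, "big")
    # Peel off four 6-bit groups, most significant group first.
    indices = [(n >> shift) & 0x3F for shift in (18, 12, 6, 0)]
    return "".join(ALPHABET[i] for i in indices)

# The three octets from the example above.
print(encode_quantum(bytes([0b00101101, 0b11010010, 0b00100010])))  # LdIi
```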
An astute reader will wonder what happens when fewer than 24 bits are available at the end of the data. Well, Astute Reader, remember that the input must be an integral number of octets, so there are only three possibilities:
  • The final quantum is made up of 3 octets (24 bits) -- no padding is necessary.
  • The final quantum is made up of 2 octets (16 bits) -- the encoded output will consist of three characters followed by one '=' to indicate the amount of padding added. This is done as follows:
       00101101 11010010
    
    Is split into:
       001011 011101 001000
                         ^^
    
    Note that the last grouping must be padded with zero bits to six bits. The three groups are 11, 29 and 8 in decimal, so in base64 output the above 16 bits would be represented by 'LdI='.
  • The final quantum is made up of 1 octet (8 bits) -- the encoded output will consist of two characters followed by two '=' characters:
       00101101
    
    Becomes:
       001011 010000
                ^^^^
    
    Again, the last grouping must be padded with zero bits to six bits. The two groups are 11 and 16 in decimal, so in base64 output the above 8 bits would be represented by 'LQ=='.
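
The three cases above fit into one short encoder. This is a minimal sketch, not a MIME-ready implementation: the name b64encode_sketch is my own, it does no line wrapping, and the standard library's base64.b64encode is used only as a cross-check. For the two-octet example the third group is index 8, i.e. the character 'I'.

```python
import base64

ALPHABET = ("ABCDEFGHIJKLMNOPQRSTUVWXYZ"
            "abcdefghijklmnopqrstuvwxyz"
            "0123456789+/")

def b64encode_sketch(data: bytes) -> str:
    """Encode bytes as base64 text, padding the final quantum with '='."""
    out = []
    for i in range(0, len(data), 3):
        chunk = data[i:i + 3]
        missing = 3 - len(chunk)  # 0, 1 or 2 octets short of a full quantum
        # Zero-fill on the right so there are always 24 bits to split.
        n = int.from_bytes(chunk + b"\x00" * missing, "big")
        chars = [ALPHABET[(n >> shift) & 0x3F] for shift in (18, 12, 6, 0)]
        # Groups made up entirely of filler bits are replaced by '='.
        if missing:
            chars[-missing:] = ["="] * missing
        out.append("".join(chars))
    return "".join(out)

print(b64encode_sketch(bytes([0b00101101, 0b11010010])))  # LdI=
print(b64encode_sketch(bytes([0b00101101])))              # LQ==

# Cross-check against the standard library.
sample = bytes(range(256))
print(b64encode_sketch(sample) == base64.b64encode(sample).decode("ascii"))  # True
```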
Note that 24 bits are represented by 32 bits in plain text (4 characters * 8 bits/character) -- this translates to approximately a 33% increase in file size.
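
The arithmetic can be checked directly. The helper name encoded_length below is hypothetical; it counts padding but not the line breaks that MIME adds to keep encoded lines to at most 76 characters, so real attachments grow slightly more than this:

```python
def encoded_length(n_octets: int) -> int:
    """Number of base64 characters (padding included) for n_octets of input."""
    # Every started group of 3 octets yields exactly 4 output characters.
    return 4 * ((n_octets + 2) // 3)

for n in (1, 2, 3, 1000, 3_000_000):
    size = encoded_length(n)
    print(n, size, round(size / n, 3))
```

For large inputs the ratio settles at 4/3; tiny inputs fare worse, since even a single octet costs four characters.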
