In [cryptography], an "[all-or-nothing] [transform]" (also called a "package transform") is a [randomized] unkeyed [reversible] [transformation] (P'i = f( Pi )) with the following properties:
• f( Pi ) is easy to calculate;
• f^-1( P'i ) is easy to calculate if all bits of P'i are known;
• f^-1( P'i ) is difficult to calculate (or even estimate/approximate) if not every bit of P'i is known.
Notice that f() is an unkeyed transformation, which means that (although it can use a [block cipher] in its construction) it is not "[encryption]" [per se]. It is, nevertheless, usually used as a [pre-processing] step prior to the use of a keyed [encryption] step.

Why?

[Cryptanalysis] often exploits [known plaintext] or [redundancy|redundancies] in the [plaintext] (i.e. statistical structure) to deduce information about a cipher's [keystream] or key, in order to break this [cipher]. Applying an [all-or-nothing transform] before encryption effectively randomizes the data (at the cost of a small [expansion] in the message size and some computational [overhead]), spreading the information you need to recover the original message across the whole message. This generally turns "[partial] information about the plaintext" into "no information about the plaintext".

An example would be: imagine someone encrypts two different plaintexts with a [stream cipher] using exactly the same key (or key+[Initialization vector|IV] pair). Classically, an attacker can just [XOR] together the two [ciphertext|ciphertexts], effectively removing the [keystream] and obtaining the two plaintexts [XOR]ed together; this enables the attacker to eventually obtain the two separate plaintexts, due to assumptions that can be made regarding the statistical structure of the plaintext (e.g. it mostly consists of [whitespace|spaces] and letters). On the other hand, if someone first applies an [all-or-nothing transform] to the two plaintexts before encryption with the stream cipher (again using the same key), the attacker can no longer recover either of the plaintexts: the two transformed plaintexts are effectively random, so having them XORed together gives no statistical structure to exploit, and it's not possible to invert two all-or-nothing transforms from their XOR alone. This means that an [all-or-nothing transform] increases the [resistance] of a cipher against [statistics|statistical] and [known plaintext attack|known plaintext attacks]: even a cipher with [certificational weakness|certificational weaknesses] can be considered pretty safe, if you only use it to encrypt data that has been processed with an all-or-nothing transform.
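To make the key-reuse problem concrete, here is a minimal sketch. The toy keystream generator (SHA-256 run over a counter), the key and the two messages are all made up for illustration; it is not a real stream cipher, but any stream cipher shows the same cancellation.

```python
import hashlib

def keystream(key: bytes, length: int) -> bytes:
    """Toy keystream (ASSUMPTION, illustration only): SHA-256 over key || counter."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:length]

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def stream_encrypt(key: bytes, data: bytes) -> bytes:
    return xor(data, keystream(key, len(data)))

key = b"the same key, used twice"
p1 = b"attack at dawn, bring ladders"
p2 = b"retreat at dusk, leave ropes!"
c1 = stream_encrypt(key, p1)
c2 = stream_encrypt(key, p2)

# The keystream cancels out: the attacker gets p1 XOR p2 without knowing the key.
leak = xor(c1, c2)
print(leak == xor(p1, p2))
```

If p1 and p2 had first been run through an all-or-nothing transform, `leak` would be the XOR of two random-looking strings instead of the XOR of two pieces of English text.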

It should be noted, for example, that some [public key cryptography] protocols are only secure under the assumption that you only encrypt random data (e.g. a randomly-generated [AES] [session key]) with them, as direct encryption of a (redundant/non-random) plaintext might leak information about the [private key]. In these types of contexts, it is also useful to apply an all-or-nothing transform prior to the encryption (like [optimal asymmetric encryption padding|OAEP], which is usually used with [RSA]), to prevent leaking information about the private key.

A final example of where an all-or-nothing transform would be useful is a situation where you want to encrypt a plaintext with two different [stream cipher|stream ciphers] (Cipher1 and Cipher2), using two distinct keys:

Cipher1( Cipher2( plaintext ) ) = (plaintext [XOR|⊕] Keystream2) [XOR|⊕] Keystream1 = (plaintext [XOR|⊕] Keystream1) [XOR|⊕] Keystream2 = plaintext [XOR|⊕] (Keystream1 [XOR|⊕] Keystream2)

As you can see, the problem is that, since [XOR] is [Commutativity|commutative], the attacker doesn't have to decrypt the ciphertext in the same order you encrypted it. In fact, the attacker doesn't even have to attack the two ciphers: (s)he can attack an equivalent cipher that outputs Keystream1 [XOR|⊕] Keystream2 as its [keystream]. On the other hand, if you apply an all-or-nothing transform between the two encryptions, they are no longer [Commutativity|commutative], and the attacker now has to "peel off" Cipher1 before (s)he can undo the all-or-nothing transform and "peel off" Cipher2.
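The identity above is easy to check numerically. This is only a sketch: the keystream function is a made-up stand-in (SHA-256 over a counter), and the two keys and the message are arbitrary.

```python
import hashlib

def keystream(key: bytes, length: int) -> bytes:
    """Toy keystream (ASSUMPTION, illustration only): SHA-256 over key || counter."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:length]

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

plaintext = b"stacking two stream ciphers"
ks1 = keystream(b"key one", len(plaintext))
ks2 = keystream(b"key two", len(plaintext))

# Cipher1( Cipher2( plaintext ) )
double = xor(xor(plaintext, ks2), ks1)

# Same result as one cipher whose keystream is Keystream1 XOR Keystream2.
combined = xor(ks1, ks2)
print(double == xor(plaintext, combined))

# The layers can also be removed in either order.
print(xor(xor(double, ks2), ks1) == plaintext)
print(xor(xor(double, ks1), ks2) == plaintext)
```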

How?

An example of an all-or-nothing transform (using [AES]-256 as block cipher and [SHA-256] as hash function) would be something like:
1. Take your message and apply an appropriate [padding] scheme ([PKCS7] is cool, but anything decent is ok), so that its length is an [integer] [multiple] of 256 [bit|bits];

2. Choose a [random] 256-bit key (K) and encrypt each of the N 256-bit blocks (Pi) with that key using a slightly modified [ECB] mode:

P'i = AES-256-encrypt( K, Pi [XOR|⊕] i )   (for 0 ≤ i ≤ N-1)

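Here, the [counter] prevents equal blocks from encrypting to the same thing, by making the output of the encryption function depend on block position. Encrypting the message with a random key effectively [whitening|whitens] (i.e. randomizes) the message. Note that this would probably work just as well with any other encryption mode, such as plain [ECB] or [CBC]. One caveat: [AES] proper has a 128-bit block regardless of key size, so a literal implementation needs a cipher with a native 256-bit block (Rijndael, from which AES was derived, also supports one) or some adjustment of the block size.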

3. Now, perform the following calculations:

H0 = SHA-256( P'0 )
Hi = SHA-256( P'i || Hi-1 || i )   (for 1 ≤ i ≤ N-1)
(note: || means [concatenation])
4. Then calculate this single 256-bit word:

J = K [XOR|⊕] H0 [XOR|⊕] H1 [XOR|⊕] H2 [XOR|⊕] ... [XOR|⊕] HN-1

5. And, finally, the "packaged" message is just the randomly encrypted blocks [concatenation|concatenated] with J:

P'0 || P'1 || P'2 || ... || P'N-1 || J

Notice that each possible [plaintext] maps to [absurdly large number|2^256] different (expanded) messages, depending on the key you choose. Also, note that the message is effectively randomized, as every element of P'i has been encrypted under a [random] key and element J results from the [XOR] of a randomly chosen number (K) with several hashes obtained from the (randomized) P'i blocks. By "randomized", I mean that, even if someone knows P0 (or any other block) with 100% [certainty], they still cannot [predict] P'0, J or any P'i. Also, even if they know P'0, they cannot recover P0 unless they know J and every other element of P'i.
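The packaging steps above can be sketched in a few lines of Python. Two things here are my assumptions rather than part of the recipe: the standard library has no AES, so the per-block encryption is replaced by XOR with a pad derived as SHA-256( K || i ), and the counter i is encoded as a 32-byte big-endian integer. All function names are made up for this sketch.

```python
import hashlib
import secrets

BLOCK = 32  # bytes, i.e. the 256-bit blocks used in the text

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def pad(message: bytes) -> bytes:
    """Step 1: PKCS#7-style padding up to a multiple of BLOCK bytes."""
    n = BLOCK - len(message) % BLOCK
    return message + bytes([n]) * n

def whiten(key: bytes, index: int, block: bytes) -> bytes:
    """Step 2 stand-in (ASSUMPTION): XOR with SHA-256(K || i) instead of
    AES-256-encrypt(K, Pi XOR i), so only the standard library is needed."""
    return xor(block, hashlib.sha256(key + index.to_bytes(BLOCK, "big")).digest())

def hash_chain(blocks):
    """Step 3: H0 = SHA-256(P'0); Hi = SHA-256(P'i || H(i-1) || i)."""
    hashes = []
    for i, blk in enumerate(blocks):
        if i == 0:
            h = hashlib.sha256(blk).digest()
        else:
            h = hashlib.sha256(blk + hashes[-1] + i.to_bytes(BLOCK, "big")).digest()
        hashes.append(h)
    return hashes

def package(message: bytes) -> bytes:
    padded = pad(message)
    plain_blocks = [padded[i:i + BLOCK] for i in range(0, len(padded), BLOCK)]
    key = secrets.token_bytes(BLOCK)                 # random 256-bit K
    blocks = [whiten(key, i, b) for i, b in enumerate(plain_blocks)]
    j = key
    for h in hash_chain(blocks):                     # step 4: J = K xor H0 xor ...
        j = xor(j, h)
    return b"".join(blocks) + j                      # step 5: P'0 || ... || J

msg = b"attack at dawn"
print(len(package(msg)))                 # padded message plus one block for J
print(package(msg) != package(msg))      # a fresh K gives a different package
```

With a real block cipher in place of `whiten`, the structure stays the same; only that one function changes.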

To "unpackage" an [incoming] transformed message (Zi, with a length of M blocks), the recipient only has to perform the following steps:
1. Calculate the H [array]:

H0 = SHA-256( Z0 )
Hi = SHA-256( Zi || Hi-1 || i )   (for 1 ≤ i ≤ M-2)

2. Calculate K from J (i.e. the last block in Zi):

J = ZM-1
K = J [XOR|⊕] H0 [XOR|⊕] H1 [XOR|⊕] H2 [XOR|⊕] ... [XOR|⊕] HM-2

3. Now just decrypt the blocks using K, to get the original plaintext:

Pi = AES-256-decrypt( K, Zi ) [XOR|⊕] i   (for 0 ≤ i ≤ M-2)
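Here is a round-trip sketch of the unpackaging steps. As assumptions of mine: the AES-256 step is replaced by XOR with a pad derived as SHA-256( K || i ) (the standard library has no AES), the counter is a 32-byte big-endian integer, and padding is PKCS#7-style. The compact `package` helper is there only so the block runs on its own; `unpackage` is the part that mirrors the three steps.

```python
import hashlib
import secrets

BLOCK = 32  # bytes, i.e. 256 bits

def xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

def pad_block(key, i):
    """Stand-in for the AES-256 step (ASSUMPTION): pad = SHA-256(K || i)."""
    return hashlib.sha256(key + i.to_bytes(BLOCK, "big")).digest()

def hash_chain(blocks):
    """H0 = SHA-256(Z0); Hi = SHA-256(Zi || H(i-1) || i)."""
    hashes = []
    for i, blk in enumerate(blocks):
        suffix = hashes[-1] + i.to_bytes(BLOCK, "big") if i else b""
        hashes.append(hashlib.sha256(blk + suffix).digest())
    return hashes

def split(data):
    return [data[i:i + BLOCK] for i in range(0, len(data), BLOCK)]

def package(message):
    """Compact packager, included only to produce test input."""
    n = BLOCK - len(message) % BLOCK
    key = secrets.token_bytes(BLOCK)
    padded = message + bytes([n]) * n
    blocks = [xor(b, pad_block(key, i)) for i, b in enumerate(split(padded))]
    j = key
    for h in hash_chain(blocks):
        j = xor(j, h)
    return b"".join(blocks) + j

def unpackage(data):
    z = split(data)
    blocks, j = z[:-1], z[-1]          # J is the last block, Z(M-1)
    key = j
    for h in hash_chain(blocks):       # steps 1 and 2: K = J xor H0 xor ... xor H(M-2)
        key = xor(key, h)
    padded = b"".join(xor(b, pad_block(key, i)) for i, b in enumerate(blocks))  # step 3
    return padded[:-padded[-1]]        # strip the padding

message = b"attack at dawn"
print(unpackage(package(message)) == message)
```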

As you can see, as long as someone knows all the bits of the "package", it's trivial to [reverse] this transform. On the other hand, to show how this transformation is "all-or-nothing", let's see what happens when only [partial] information is available:
• Imagine you encrypt the first 256 bits of Zi (corresponding to P'0) with some key (L): someone who doesn't know L cannot calculate or estimate H0 (or any of the elements of Hi, actually, due to [chaining]), so (s)he cannot obtain the random key (K) used for the all-or-nothing transform. Also, note that, since only one block was encrypted with L and that block was randomized to begin with, mounting any type of [cryptanalysis|cryptanalytic attack] to recover L becomes [computationally infeasible|virtually impossible] (you can't really attack a cipher after having looked at only one output);

• Imagine you encrypt the last 256 bits of Zi (corresponding to J) with some key (L): an attacker can now calculate the whole H array, but it's still [rather difficult|impossible] to obtain K, since you need to know J. Again, given that you're only encrypting one block with L and that the block is randomized to begin with, it becomes [computationally infeasible|virtually impossible] to mount a cryptanalytic attack to recover L.

This extends to any block: if you [encryption|encrypt] or [obfuscation|obfuscate] any single block of the "package", it becomes [computationally infeasible] to recover K and undo the all-or-nothing transform. Of course, in a real application (unless there are specific [constraints]), you would probably apply a keyed encryption step to the whole expanded message (rather than to a single block), effectively increasing the [robust|robustness] of the keyed encryption step against cryptanalysis (particularly against "[known plaintext attacks]").
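A small experiment shows the effect on K. In this sketch the whitened blocks are simply random 256-bit values (which is what they look like to an attacker anyway), the counter encoding is my assumption, and the attacker's "guess" for a hidden block is all zeros; the helper name `fold` is made up.

```python
import hashlib
import secrets

BLOCK = 32  # bytes, i.e. 256 bits

def xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

def fold(start, blocks):
    """XOR `start` with every element of the hash chain H0..H(n-1)."""
    acc, prev = start, b""
    for i, blk in enumerate(blocks):
        suffix = prev + i.to_bytes(BLOCK, "big") if i else b""
        prev = hashlib.sha256(blk + suffix).digest()
        acc = xor(acc, prev)
    return acc

blocks = [secrets.token_bytes(BLOCK) for _ in range(4)]   # stand-ins for P'0..P'3
key = secrets.token_bytes(BLOCK)                          # K
j = fold(key, blocks)                                     # J = K xor H0 xor ... xor H3

# With the complete package, K comes straight back out.
print(fold(j, blocks) == key)

# Hide any single block (the attacker substitutes zeros): K is no longer recoverable.
wrong_keys = [
    fold(j, blocks[:idx] + [bytes(BLOCK)] + blocks[idx + 1:])
    for idx in range(len(blocks))
]
print(all(k != key for k in wrong_keys))
```

Because of the chaining, a wrong early block changes every later hash as well, so the recovered value is unrelated to K rather than "almost right".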

Finally, it should be said that, from a purely [academia|academic] point-of-view, a type of transform like this is not very [efficiency|efficient], as it requires an additional encryption and [hash function|hashing] step per block and results in an expanded message. On the other hand, the expansion rate is small for large ciphertexts (and tends to 0% as the ciphertext size grows to [infinity]), so it's [negligible] when you're encrypting [bulk] data, while the computational [overhead] is mostly negligible for small ciphertexts (particularly in a world where people use stuff like [scrypt] and [bcrypt]). Besides, if it's data you're not likely to decrypt every day, spending a handful of seconds more while decrypting is not the end of the world.
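To put numbers on the expansion, here is a quick calculation. It assumes 256-bit (32-byte) blocks, PKCS#7-style padding (which always adds between 1 and 32 bytes) and one extra block for J; the message sizes are arbitrary.

```python
BLOCK = 32  # bytes, i.e. 256 bits

def packaged_size(n: int) -> int:
    """Size in bytes of the package for an n-byte message."""
    padded = (n // BLOCK + 1) * BLOCK   # padding adds 1..BLOCK bytes
    return padded + BLOCK               # plus one block for J

for n in (16, 1_000, 1_000_000, 1_000_000_000):
    size = packaged_size(n)
    print(f"{n:>13} bytes -> {size:>13} bytes  (+{100 * (size - n) / n:.4f}%)")
```

The absolute overhead never exceeds 64 bytes under these assumptions, which is why the percentage collapses as the message grows.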

So... yeah... if you're thinking of encrypting data you're not going to touch for a while, the correct order of steps should be:
1. [compression];
2. [all-or-nothing transform];
3. [encryption];
4. [forward error correction].

Just saying...

References
• [Ron Rivest|Ronald L. Rivest] (1997). "[http://theory.lcs.mit.edu/~cis/pubs/rivest/fusion.ps|All-or-nothing encryption and the package transform]". Fast Software Encryption proceedings, 1267: 210–218.
• Victor Boyko (1999). "[http://theory.lcs.mit.edu/~cis/pubs/boyko/aont-oaep.ps.gz|On the Security Properties of OAEP as an All-or-nothing Transform]". Crypto '99 proceedings, 1666: 503–518.