It can be proven that under any lossless compression method some strings (those which are nearly completely random) zero- or negative compression must occur.

Under a lossless compression method any one valid pattern must extract to only one output pattern. Without this, the algorithm would not be unique and would therefore be lossy. There are (28)8 (1.84x1016) different combinations of eight character uncompressed strings. Assume that one of these strings is compressed from an eight character code to a seven character code, and all others remain identical. There is now not a one-to-one relationship between compressed and uncompressed codes, as one seven letter code has to stand for it's original code and the longer eight letter code. This means the compression algorithm is lossy. If the seven- and eight character codes swap position, this problem does not arise. However, the seven character code now takes eight characters to write - in one case, a longer output pattern occurs.
However, most data is not completely random. This text has a high percentage of the phrases 'compress', 'character' as well as other English language features. Only seven-bit characters are used. Compression programs can exploit these factors to reduce the size of nearly all likely combinations. However, in some cases, the file size will grow, or it will be stored uncompressed. However, storing it uncompressed requires additional metadata to say that it is uncompressed and should not be extracted. Hence file size will rise.

Example: see compression for compressing aaaaa and aaa to 5a and 3a respectively. However, how would both 5a and 3a be represented? By 151a and 131a? It may be stated that numbers will not exist in the input data, but this means that instead of increasing the size of some information, you increase the domain of the output.