A pseudo-random sequence of numbers (as opposed to a random sequence) has some properties to it that are not random. Randomness can be looked at in the following ways:
  • Chi Squared: This test is the test used in looking at randomness of data. It measures the deviation of a sample from the expectation. In a purely random sample of data, it would be expected that the numbers be evenly random. If you roll a fair six sided dice 60 times, you would expect to get 10 of each number. If one gets extremely high values or extremely low values on the chi-squared test it means that it is skewed somewhere.

    For the data run that I did, it was 204,800 bytes. One would expect to see approximately 800 of each valued byte. This was not the case. The lower 128 values had a mean of 999. The upper 128 had a mean of 600. This is clearly not random.

    Chi square distribution for 204800 samples is 12954.37, and randomly would exceed this value 0.01 percent of the times.

    The low 8 bits of rand() is very non-random:
    Chi square distribution for 500000 samples is 0.01, and randomly would exceed this value 99.99 percent of the times.

    A truly random source of numbers looks like (this example is from radioactive decay):
    Chi square distribution for 32768 samples is 237.05, and randomly would exceed this value 75.00 percent of the times.

  • Arithmetic mean: Take all the bytes in the file, sum them up, and divide it by the file length. Each byte can have a value between 0 and 255, thus the expected mean is '127.5'.

    As you can probably guess, the mean of this entire value is on the low side:
    Arithmetic mean value of data bytes is 111.5296 (127.5 = random).

  • Monte Carlo value for Pi: This test takes every 24 bit as an X and Y location inside a square. The distance is then calculated from the center. If it is within the radius then it is counted as a "hit". The percentage of hits can then be used to calculate pi.
    Monte Carlo value for Pi is 3.488002813 (error 11.03 percent).

    For a truly random source generated by radioactive delay, a 32768 byte file approximates:
    Monte Carlo value for Pi is 3.139648438 (error 0.06 percent).

  • Serial Correlation Coefficient: This test measures the nature of each byte depending on the previous byte. For a random sequence, this value will be close to zero. A non-random sequence such as a text file will provide a number close to '0.5'. Bitmaps will approach 1. Serial correlation coefficient is -0.052083 (totally uncorrelated = 0.0).
Running these tests for yourself can show that rand() is not a true random number generator, especially the count of each byte and seeing that while the numbers look random, they are not perfectly random, thus pseudo-random.
Source code used for the above tests follows:
#include <stdlib.h>
#include <stdio.h>

main()
{
  int i,r;

  srand(1);
  for(i = 0; i < 512000; i++)
  {
    r = rand();
    printf("%c%c%c%c",
    (r & 0xff000000) >> 24,
    (r & 0x00ff0000) >> 16,
    (r & 0x0000ff00) >> 8,
     r & 0x000000ff);

  }
}
The program to generate these values and tests:
http://www.fourmilab.ch/random/
A set of data that looks like it is random, but isn't. This is the sort of data you get from the RAND function in BASIC, and most programming languages have an equivilent. Usually the generator is started with a seed value (the BASIC function RANDOMIZE does this) from which it generates a number - this number is then used as the seed to generate the next number.

This is fine for anything that only has to look random - like games - but useless for serious work such as cryptography because the encyrption can be broken by attacking the random number generator. While there are some very advanced random generators out there, it is preferable to use a source of true random data whenever possible - no matter how high the period of a generator is, it will exhibit weaknesses that could potentially be exploited.

Log in or registerto write something here or to contact authors.