A regex that can be used in perl to test for primality. The full code is

perl -e 'print "PRIME" if (1 x shift) !~ /^(11+)\1+$/' xxxx
where xxxx is the number who's primality you wish to test. This one liner first appeared on comp.lang.perl.misc, posted by Abigail.

Now if you're not a perl monk (and I'm certainly not), you're probably quite confused as to how on earth this works. Seeing a few perl one-liners has probably convinced some people that perl scripts are written by monkeys running along people's keyboard. This particular one isn't that bad however.

The first part, perl -e is just the incantation necessary to get perl to run a script passed to it on the command line. The next part is equally simple, it says print 'PRIME' if a certain condition is met. The magic starts here.

In perl, shift takes the first value off an array and returns it. If shift is not given an argument, it operates on the array @_, which is the array of arguments passed to the current function, so in this case, shift will return the parameter we passed, i.e. the number who's primality we wish to test.

The binary x operator, returns the string obtained by taking its left argument and repeating it the number of times specified by the right argument. So (1 x shift) evaluates to a string of ones, of length the number passed as an argument.

!~ /^(11+)\1+$/ is the heart of the script. !~ means that the result of this is true if what is on the left (our long string of ones) does not match the regex that follows. Lets break down the regex.

/^(11+)\1+$/

The 2 slashes are just there to mark the beginning and end of the regular expression. ^ and $ have special meanings, respectively the start and the end of the string being matched. This means for a string to match this pattern, the entire string must match the pattern (11+)\1+.

The pattern 1 is just the character 1. The + means "one or more times". So 11+ is 2 or more 1's

The brackets around the 11+ create a capture buffer. \1 refers to the 1st capture buffer, i.e. (11+) (it's known as a backreference). And lastly, the + means that we want \1 to be repeated one or more times.

Why this works should now become apparent. If a string matches this pattern, it means that we can take a string of 2 or more characters, repeat it 2 or more times and get the whole string. What we have done is found a divisor of the length of the string, i.e. a divisor of the number passed to the perl script. And so if the string does not match this pattern, it means that there are no divisors greater or equal to 2 and such that the quotient is greater or equal to 2, i.e. the number is prime.

There are two problems with this script. The first, admittedly minor one, is that it declares 1 to be prime, which only matters because I'm a pedantic maths student. The second is that it is slow.

On my machine it could deal with 5 digit numbers in a few seconds at most. One of the 6 digit numbers i tried took almost 2 minutes, but another took only a fraction of a second. The reason this wide range of run times occur is that the + quantifier is greedy: it will try an match the longest string possible. This in turn means that the script will try and find the greatest factor of the number. The end result is that it is relatively fast with numbers with lots of small factors, but the presence of even one big factor will drive up the run time. For example:

time perl -e 'print "PRIME" if (1 x shift) !~ /^(11+)\1+$/' 243235

real    1m43.126s
user    1m33.220s
sys     0m0.910s

time perl -e 'print "PRIME" if (1 x shift) !~ /^(11+)\1+$/' 294824

real    0m0.223s
user    0m0.080s
sys     0m0.010s
If you look at the prime decompositions of these numbers, you will find that 243235=5 * 48647 and 294824=23*137*269. As expected the number with the large factor is slower, by a factor of over a 1000.

Despite it being impractical in real world use, this does show that Perl regexes are more expressive than finite state machine: It is possible to match things that are not regular languages.(Thanks to pfft for pointing this out)

I do believe that I can one-up this. The easiest way to speed up primality testing is to only check factors up to the square root of the number in question. This is a little messy in Perl, because evaluating a square root inside a regex can only be done with an (?{}), or eval block, which are only semi-implemented and aren't even in most versions of Perl. So, we get around this:

perl -e 'print "PRIME" if (1 x (($A = int sqrt($B = shift))+$B-$A)) !~ /^(11{1,$A})\1+$/' xxx

This is ugly, but works like a dream. The quick explanation: Inside the most nested parentheses, $B is set equal to the number we're testing. The return value of that operation is, simply, $B; therefore, we can set $A equal to the closest integer (rounding down) to the square root of $B. We then take THAT result, add $B to it, and subtract it from itself, before feeding it to the x operator.

With me so far? If X is our number,
$A = int(sqrt(X))
$B = X (note: we no longer need B, but it was useful in setting up the x operator)

The regular expression then evaluates on the original number; but it only checks factors up to its square root. The results are the same, the speed is better. Comparative times to Brontosaurus's algorithm:

NOTE: For some reason, a segmentation fault resulted when using very large numbers. That may be due to precompiled limits killing Perl on my system; running it with -w fixes the problem.

SECOND NOTE: Brontosaurus' computer and mine are not the exact same speed; however, my results using his algorithm were comparable to his, within 20% or so. Close enough, considering the difference between them; my system is slower.

time perl -ew { bla bla bla } 243235

real	0m0.019s
user	0m0.010s
sys	0m0.010s

time perl -ew { bla bla bla } 294824

real	0m0.019s
user	0m0.010s
sys	0m0.010s

Infinitely faster. The delay due to the the + quantifier is killed by the upper limit; it's 493 for the first number, instead of 243235. This is important because Perl starts at the highest possible factor and moves toward the lowest. By eliminating about 242,000 potential factors, the script flies right along.

While both algorithms are still impractical, this one is impractical faster.

Log in or register to write something here or to contact authors.