Primality testing with Perl regexs

A regex that can be used in perl to test for primality. The full code is

perl -e 'print "PRIME" if (1 x shift) !~ /^(11+)\1+$/' xxxx

where xxxx is the number who's primality you wish to test. This one liner first appeared on comp.lang.perl.misc, posted by Abigail.

Now if you're not a perl monk (and I'm certainly not), you're probably quite confused as to how on earth this works. Seeing a few perl one-liners has probably convinced some people that perl scripts are written by monkeys running along people's keyboard. This particular one isn't that bad however.

The first part, perl -e is just the incantation necessary to get perl to run a script passed to it on the command line. The next part is equally simple, it says print 'PRIME' if a certain condition is met. The magic starts here.

In perl, shift takes the first value off an array and returns it. If shift is not given an argument, it operates on the array @_, which is the array of arguments passed to the current function, so in this case, shift will return the parameter we passed, i.e. the number who's primality we wish to test.

The binary x operator, returns the string obtained by taking its left argument and repeating it the number of times specified by the right argument. So (1 x shift) evaluates to a string of ones, of length the number passed as an argument.

!~ /^(11+)\1+$/ is the heart of the script. !~ means that the result of this is true if what is on the left (our long string of ones) does not match the regex that follows. Lets break down the regex.

/^(11+)\1+$/

The 2 slashes are just there to mark the beginning and end of the regular expression. ^ and $ have special meanings, respectively the start and the end of the string being matched. This means for a string to match this pattern, the entire string must match the pattern (11+)\1+.

The pattern 1 is just the character 1. The + means "one or more times". So 11+ is 2 or more 1's

The brackets around the 11+ create a capture buffer. \1 refers to the 1st capture buffer, i.e. (11+) (it's known as a backreference). And lastly, the + means that we want \1 to be repeated one or more times.

Why this works should now become apparent. If a string matches this pattern, it means that we can take a string of 2 or more characters, repeat it 2 or more times and get the whole string. What we have done is found a divisor of the length of the string, i.e. a divisor of the number passed to the perl script. And so if the string does not match this pattern, it means that there are no divisors greater or equal to 2 and such that the quotient is greater or equal to 2, i.e. the number is prime.

There are two problems with this script. The first, admittedly minor one, is that it declares 1 to be prime, which only matters because I'm a pedantic maths student. The second is that it is slow.

On my machine it could deal with 5 digit numbers in a few seconds at most. One of the 6 digit numbers i tried took almost 2 minutes, but another took only a fraction of a second. The reason this wide range of run times occur is that the + quantifier is greedy: it will try an match the longest string possible. This in turn means that the script will try and find the greatest factor of the number. The end result is that it is relatively fast with numbers with lots of small factors, but the presence of even one big factor will drive up the run time. For example:

time perl -e 'print "PRIME" if (1 x shift) !~ /^(11+)\1+$/' 243235

real    1m43.126s
user    1m33.220s
sys     0m0.910s

time perl -e 'print "PRIME" if (1 x shift) !~ /^(11+)\1+$/' 294824

real    0m0.223s
user    0m0.080s
sys     0m0.010s

If you look at the prime decompositions of these numbers, you will find that 243235=5 * 48647 and 294824=2³*137*269. As expected the number with the large factor is slower, by a factor of over a 1000.

Despite it being impractical in real world use, this does show that Perl regexes are more expressive than finite state machine: It is possible to match things that are not regular languages.(Thanks to pfft for pointing this out)

"1^p" p prime language	The Teach Yourself to be a Dummy in 24 Hours Bible	Finite state machine	Primality Testing
regular language	regular expression	How many primes are there?	Perl monk
@_	Dem Bones	"a^n b^n" language	Prime number language
Mastering Regular Expressions	regex	You're WRONG and you're a GROTESQUELY UGLY FREAK	Prime decomposition
pigeon hole	command line	SAT	The House of Merfyn Frych
Apostrophe Protection Society	Is it less painful to drown in salt water or fresh water?	Education is evil. Knowledge is evil. Be a moron. Forget how to spell.	quantifier