Genetic Programming is process of evolving computer programs rather than manually coding them. Programs are expressed as virtual "genes" and, using a genetic algorithm, compete with each other in program space - i.e. the space of all possible programs that can be derived the from the gene representation. The most efficient, accurate program naturally rises to the top of this population of programs.

The idea is that in the future programs will be developed in this fashion. The science is still in its infancy, however, and programs produced in this fashion tend to have large quantities of redundant code.

What they said, except that I wouldn't argue so much with the code bloat — my own body's code has same. Rather, I'd be concerned about the need for better understanding of fitness functions. That is, at least in my own limited experience, the hard part.

But, from a conceptual point of view, it seems to me that GP has over simplified the entire equation in an effort to run before learning to walk. Now that is a hugely oblique and biased statement and based on no serious understanding of the issues, but look at it this way — GP is imitating non-determinanistic systems to achieve specific practical results. Nothing wrong with that, but it would be neat to hear about work that more closely adheres to the model.

Basically, I'd love to see a real code-eat-code world...

What is it?

Answer this question: What do you get if you cross genetic algorithms with programming? That's genetic programming. Why go to all that bother to write a program to do something for you, when you can simply evolve it? Well, probably because it'll take millenia to get anywhere near good, and then it'll be full of crap that doesn't do anything. For some problems, though, genetic programming is well placed to come up with solutions to complex problems with very little in the way of human input.

What does a genetic program look like?

Not like C or perl, that's for sure. Genetic programs tend to be represented as binary trees. If you want to represent 3+2 as a tree, you'd do it like this:

    +
   / \
  3   2 

The leaf nodes, (like 2) just return their values upwards. The function nodes (like +) run some function on their child nodes and return that upwards. In this example, + has the children 2 and 3. It adds them and returns the result further up the tree (except there isn't any more tree in this case).

To solve a problem using genetic programming, you simply make a load of randomly generated programs and then mutate and breed them depending on how good they are. Just like real-world evolution.

I don't get it. Give me an example

For this example, we're going to evolve a program which will model the quadratic x2+x+1 using the function nodes +, -, * (for times) and % (for divide). As leaf nodes we'll have the numbers 1-10, and, of course, our input: x

A selection of random programs is generated. In this case we will generate four, but in the real world many hundreds or thousands would probably be used. There will also be a specified maximum depth. In this case we have a max depth of 3 but, again, with a real-world problem a much larger tree would probably be used.

So, let's see our random programs. I've shown them here as trees, then translated that into usual maths notation, and then shown that in a simplified form.

   (a)        (b)         (c)        (d)

    -          +           +          *
   / \        / \         / \        / \
  +   0      1   *       2   0      x   -
 / \            / \                    / \
x   1          x   x                 -1  -2

 x+1-0        1+x*x       2+0      x*(-1--2)

 x+1          1+x2         2           x

These would then be measured for fitness. How fitness is measured depends on the individual application. In this case it would probably involve summing the differences between the results produced and the ideal results. It's very unlikely that a random program would be anywhere close, but in this case, program (a) is not a long way off over the range -1 < x < 1.

Programs will then be selected to survive into the next generation. The selection will be partly random, but with a higher probability of selecting a program with a higher fitness. Tournament selection and fitness-proportionate selection are two common methods.

Once the surviving programs are selected, there are three methods of copying them on. I will use our first generation examples to explain each one.

Reproduction

This is the simple one. Generally only used for high-fitness programs, reproduction simply copies the program, completely unchanged. We will do this with 1(a).

   (2a)

    -
   / \
  +   0
 / \   
x   1  

 x+1-0

 x+1  

Mutation

In this method, one of the nodes is picked at random (although, with large trees often only nodes below a certain depth are picked) and randomly changed. In our example, we pick program (c), randomly pick the leafnode containing just the number 2, and randomly generate a new sub-tree for that position. In our eample we rplace it with an x.

   (2b)

    +
   / \
  x   0 

 x+0

 x  

Crossover

This is the most interesting operation. In our example, we pick the (a) and (b) for sexual reproduction. Note that we've now picked (a) twice, and have never picked (d). This is because (a) was the fittest (d) was rubbish. This sort of selection will often happen and is what speads good bits of programs and eliminates bad ones.

One point of the first parent (1a), namely the + function, is randomly picked as the crossover point for the first parent. One point of the second parent (1b), namely its leftmost terminal x, is randomly picked as the crossover point for the second parent. The crossover operation is then performed on the two parents producing two offspring.

   (2c)        (2d)

    +          +   
   / \        / \    
  %   0      1   *   
 / \            / \  
x   x          +   x 
              / \
             x   1

 0+x/x        1+(x+1)*x  

 1            x2+x+1

When measuring the fitness of these again, we find that program (2d) is perfect. Of course, in the real world, we would go through hundreds, thousands or even millions of iterations before finding such a good result.


Sources:
Some of this from my head and the rest from my nature-inspired computing lecture at Exeter University.

Log in or registerto write something here or to contact authors.