The genetic algorithm is a method for making intelligent guesses (scientific term, optimization) at the answer to a problem which really can't be solved. Created at the University of Michigan by John Holland, this approach uses computers to simulate evolution through natural selection. In essence, a correct (or close) answer to the problem gets "evolved."

Genetic Algorithms, or GAs implement efficient mechanisms to search what in mathematical terms are called a solution space. This family of algorithms adopt several terms from biology, and hence the use of the term Genetic. They are quite easy to understand and explain.

A typical GA will employ several distinct modules to achieve it's goal of searching a large - and possibly highly disparate - solution space. They are presented in roughly the sequential order of execution during a GA simulation.

An evaluator. The purpose of this function is to consider all existing solutions, to determine if an optimal solution has been found within a precision programmed by the GA creator. It also will determine if the function has failed to converge, and abort further processing if necessary.

If this is the case, a GA will present the best solution found to date in the simulation. Sometimes referred to in the literature as the God function.

The Grim Reaper: this function is responsbile for examining the existing solutions, and removing those which manifest the lowest score. The biological equivalent of survival of the fittest.

A Breeder. This function takes a subset of the existing solutions, and intermingles portions of their values into new solutions. This is probably best explained with an example.

Consider we have a function of three variables, f(x,y,z). We also have two candidate solutions, f(1,2,9) and f(-3,6,3). The breeder would construct a new candiate solution, for example, f(1,6,9), employing the DNA (continuing the use biological metaphors) of each. This clearly a new solution that is a descendant of the two parent solutions. Keep in mind that both parent solutions were considered optimal, and therefore any child solutions should be at least as good, and possbily better, than them (another biological metaphor).

The real strength of GA's lies in the next function: the mutant or mutation function. This module takes a small subset of existing solutions, and randomly changes their values (once again, DNA to use the biological metaphor)). The purpose of the mutation function is to insure that the GA does NOT get locked in a local region of the solution space.

In other words, the mutation function insures that a degree of randomness is introduced into the searching - and hence the possible solutions - evaluated by the GA.

Processing now continues at the first processing step, the evaluator.
A genetic algorithm is created by running an intelligent trial-and-error system which rewards efficiency. You take a model of the problem, create random algorithms, then try each of them. The best few algorithms are then slightly modified by either random mutation or by combining to create "child" algorithms. This process is repeated for a while; eventually you should get a highly efficient algorithm (if you wait long enough). These algorithms may find logical quirks that we may not think of, or find strange "hot spots" in efficiency that are part of a system that is too chaotic for us to find ourselves.

A good example of a problem which I think could be solved by genetic algorithms is the elevator algorithm.

Genetic algorithms are a fantastic way to quickly find a near-optimal solution to a given problem. Rather than expand on the excellent descriptive writeups above, I thought I would give a simple example of a genetic algorithm at work.

Let's say we have a pool of random letters and "space" characters, which we can string together into series of length 44. We are attempting to assemble the following sentence out of this primordial soup of letters and spaces:

```the quick brown dog jumped over the lazy fox
```

All we know about this sentence is that it is 44 characters in length, is composed up of the letters a-z and spaces, and that we have a function called fitness(string), which accepts our test strings and returns a fitness value; in other words, merely tells us how many of the characters are correct.

We then make a number of random strings of letters of length 44. For this example, we'll just make two of them; in a real environment, we would make hundreds or thousands of them. Here are the two random strings, matched up against the actual sequence, as well as their fitness values:

```tzag uinapd oxnavng asrqmdalv evxaempzno yce
|    ||     | |   |      |  |     |     |     fitness = 10
the quick brown dog jumped over the lazy fox

hgfboaiey afdoanvsdar ayrebnonas od ahdsflga
|  |                         |          fitness = 3
the quick brown dog jumped over the lazy fox
```

We can clearly see that the first one has the best fitness, so we toss out the other string and focus on the first one. Real algorithms would save several of the best ones. For the next round, we make a number of mutations to the saved string, meaning we choose a number of characters and substitute in, completely randomly, another letter. If we had several to choose from, we might also take a prefix of one of the strings and combine it with a suffix of another string to result in a third string of the same length; this is called crossing over and is another way to generate good matches.

Let's make a couple of mutants...

```original: tzag uinapd oxnavng asrqmdalv evxaempzno yce
|    |   |   |     |    |    |    |   |  |
mutant: tzeg uioapdxoxnzvng amrqmdxlv ekxaemlznolycx

original: tzag uinapd oxnavng asrqmdalv evxaempzno yce
|    |   |   |     |    |    |    |   |  |
mutant: thag ubnape oxxavng lsrqmpalv zvxaekpzne yve
```

... and then see how fit these mutants are compared to the best of the previous batch!

```tzag uinapd oxnavng asrqmdalv evxaempzno yce
|    ||     | |   |      |  |     |     |     fitness = 10
the quick brown dog jumped over the lazy fox

tzeg uioapdxoxnzvng amrqmdxlv ekxaemlznolycx
| |  ||     | |   |      |  |       |      |  fitness = 11
the quick brown dog jumped over the lazy fox

thag ubnape oxxavng lsrqmpalv zvxaekpzne yve
||   |      |     ||        |     |     |     fitness = 9
the quick brown dog jumped over the lazy fox
```

The first mutant is better than the original, so we base the next round of mutations on this one. Eventually, it improves and gradually becomes an exact match for the given string.

Given the simplicity of the algorithm and example here, it's easy to imagine a better way of solving this particular problem, but for extremely complex situations where the string is merely a model of something else, you generate and conserve a larger number of mutants, and you are using additional techniques such as crossing over, it's an effective way of solving the problem.

It should be noted that Genetic Algorithms (GAs) are not used for finding the optimal solution (if such a thing exists in your problem), but rather a solution that is good enough. Also, a GA is almost always beaten by problem-specific algorithms; if you know something about your problem that you can exploit, you are very often better off constructing an application-specific way of solving it.

So why use a GA? Because GAs are versatile and simple. Simple enough for a GA to be tried in quite short time. Versatile enough to be applied to nearly any problem, as long as you can quantify how good a suggested solution to the problem is. Therefore, GAs excel when you have a complex problem you don't know how to solve, but you know how a good solution would look. NP complete problems are typically those kind of problems, and indeed a field where GAs are very useful.

A research program recently used a genetic algorithm to design a swimming robot. A computer simulation was used to test the possible designs; the robot that could swim a mile in the fastest time won. All sorts of designs evolved; flappy-arm robots, propellor-driven ones, and so on. Some were quite fast.

The robot that won, however, was amazingly quick; it was a pillar a mile tall that fell over, thus crossing the finishing line in a matter of seconds.

Unsurprisingly, the people running the experiment didn't publish their results. They did, however, spread by word of mouth among the AI community.

# Everything you need to code a genetic algorithm

A genetic algorithm aims to solve a problem by using a process analogous to biological evolution, but it is important to note that it is not the same. Also, be careful not to confuse genetic algorithms with genetic programming.

The key difference between genetic algorithms and genetic programming is that genetic programming involves actual random generation of programs to solve a problem, and then breeding the ones that are best at solving the problem. With a genetic algorithm, rather than randomly generating a program, instances of a genetic mapping are randomly generated and bred.

A genetic mapping is relationship, determined by the programmer, between a gene sequence, usually a string of 1s and 0s, and a phenotype, some sort of process for solving the problem determined by the programmer. For example, say we are trying to use a genetic algorithm to learn how to navigate through a specific maze, given an entrance point and an exit point.

The "genes" making up our agent will be a long string of numbers, 0-4, each representing the direction the agent should take given a certain combination of walls around it (and 0 meaning stand still). The first number in the sequence will say what the agent should do if it is surrounded by walls, the next will say what it should do if it is surrounded by walls except an opening to the east, the next representing if there is an opening to the north, etc. until the agent has a reaction for every possible combination of walls.

If we represent each wall as a 1 or a 0, and we have four directions, that means we have 24-1=15 genes for our agent, each representing the direction it will try to go for a given combination of walls. First, we'll randomly generate a population of agents, say 100. This means we'll randomly generate 100 strings, 15 characters long, of the numbers 0-4. Our first individuals will probably look something like this:

```043214201214303
444112340123042
...
```

Now, we need a fitness function, a way of rating each agent on how well it performs. Again, there is no formal process for doing this and it is largely up to the programmer and dependent on the specific problem you are trying to solve. First, we want rate higher agents that make it through the maze. Second, we should also award more points to the agents that make it through the maze in the quickest amount of time. Our fitness function would look something like this, where M is the number of moves the agent made and S is a 1 or a 0 indicating if the agent succeeded in going through the maze:

```F(M, S) = 1/M * S
```

However, we have one signficant problem. What do we do if no agents in our initial population make it through the maze at all, so they all score 0? We could wait for mutation to result in some agents that make it through the maze, but this will likely mean burning a lot of cycles waiting to randomly generate a solution. We could change our function so that agents are given higher fitness based on how close to the finish they got, but this could result in giving high fitness to agents that are close to the finish, but not on the right path to reaching it.

This problem emphasizes why choosing a good fitness function is extremely important if your genetic algorithm is ever going to solve the problem. A better solution would be to reward the agent for the number of unique cells in the maze they visit. This way the agent gets higher fitness for exploring, and agents that just traverse the same area of the maze over and over will quickly die off. New fitness function, given U, the number of unique cells visited:

```F(M, U, S) = 1/M * S + (1-1/U)
```

Notice that rather than simply multiplying by U, we add (1-1/U). Why? First, we don't want just multiply by U, because then agents that didn't make it through the maze but did a lot of exploring would still get a fitness score of 0. Second, we don't want to simply add U, because this could result in agents that explored but didn't succeed having a higher fitness score than agents who actually made it through the maze.

Now we can run all of our agents through the maze and score them. Then we have to figure out how to handle natural selection -- how do we determine which individuals are fit to breed? The most obvious solution is to just choose the top X individuals, say the better half of the population (as it turns out, if you want to have each generation have the same population, you should breed half of the individuals of the population -- why is explained further on). It turns out this is a really bad idea.

Why? When coding a GA you're goal is to reach convergence -- a point at which the population has become mostly stable, and is mostly homogenous, in which most agents can solve the problem. If you only select X top individuals, you will encounter premature convergence -- the population will become homogenous very quickly, probably just copies of the first agent to get a decent fitness score, often not the actual best solution the GA can come up with.

A better solution (and a more realistic one) is to use Roullete Selection in which each agent has a probability of breeding. How probable depends on the agents fitness score. This way the most succesfull agents still breed, but there is still a chance for the lower end of the population. This works better because it avoids prematurely giving the first working solution total control, so that instead the most optimal solution can be found. Additionally, one may employ Rank Selection if there is a large disparity between fitness scores.

So, with Roullete Selection we figure we're in the clear and ready to get breeding. Wrong. We most likely will still fail. It has been mathematically proven that when working with binary genotypes (granted, our genotype is not binary, containing 2s, 3s, 4s, but empirically it has been shown to be true in general) convergence will never happen unless we also employ Elitism into our selection process1.

The problem with straight Roullete selection is that since it works on random chance, there is always the possibility we'll throw away the most fit members of the population. Elitism simply gives a guaranteed seat in the next population to the top X individuals, usually a small number proportional to the population, say 4 in a population of 100.

OK, now we can begin breeding. The most common form of breeding used in GAs is genetic crossover. This method has the peculiar property that it always generates more than 1 child. To keep things simple, we will not have omnidirectional sex, and only let children be the result of 2 parents. Once we've used the selection process to determine which individuals we want to breed, we choose a random number, in our case 1-15, to choose as our cutoff point. This is where we will "chop" the gene sequence for each individual, like so:

```Random cut off = 4
A. 0432 B. 14201214303
C. 4441 D. 12340123042
```

We have now chopped the original 15 character strings each into 2 smaller strings of size 4 and 11. To perform crossover, we combine A and D to make our first new child, and combine C and B to make the second. Tada! We now have offspring.

But the children aren't fully formed yet. If we just used the children we got from crossover, we would once again run into the nasty problem of premature convergence. In order for us to find an optimal solution using just the children resulting from crossover, we're making the assumption that our initial population was big enough and varied enough for it to be possible for crossover to find the optimal solution. However, since the initial population is generated randomly, this is not a given.

To overcome this problem, we introduce another biological analogy, mutation. Mutation can be performed in several ways. The most simple, when dealing with binary genes, is to simply pick a random bit and flip it. If you're dealing with float values for genes, then you may use bias mutation, in which you add a random number in a small range to the gene. In our case, we're dealing with just integer values, so we need to come up with our own method of mutation. We'll simply pick one random character in the string and assign it to be a new random number from 0-4.

Not all children should be mutated -- mutation hinders more often than it helps an offspring. Usually the programmer will pick a constant chance to use for the offspring being mutated, say 10%. This way mutation still occurs, but not so often that it destroys good solutions.

Now we simply repeat the process over and over until we observe that the fitness function has converged. Typically the programmer defines a value for the fitness function that is considered satisfactory and the program quits when this has been reached, or adds a line of code to print out the fitness of the best individual and stops the program by hand once this value has peaked.

# Mathematical foundations, and how to improve your code

GAs are typically looked down upon in the machine learning community because unlike other methods such as support vectory machines and hidden markov models, GAs don't have a rigorous mathematical basis. Convergence is not guaranteed. Many academics look at GAs as, "what you try when you haven't studied the problem enough." Mathematically proving when GAs will or will not work is difficult because a lot of the work is arbitrary -- there is no set in stone algorithm for choosing the mapping for a specific problem, and there are several different types of selection, breeding, and mutation.

That said, there have been some attempts to give GAs a mathematical foundation with Schema Theory. From my lecture notes, "A schema is simply a string similar to our gene sequence, like, ***10**. It has been proven that schema's with short fixed regions tend to increase exponentially, if average instances of those schema have above average fitness."2

Um, okay, you say, but what does that mean for me? One implication that this has is that you should construct a mapping that maximizes the adjacency property. This means that your GA will perform better if you have genes that are related next to each other. This is not really applicable to the maze example, but in a more complicated environment a sequence of genes may represent several different behaviors. In this case it is better to have genes controlling a specific behavior next to each other. Your chances of reaching convergence and an optimal solution increase.

# Where to go from here

Genetic algorithms can be mixed with other forms of machine learning. For example, the genes could represent the weights in a neural net. Or the number of hidden states in a series of hidden markov models. Coming up with good mappings is considered somewhat of a black art, so experiment!

There is also a wealth of literature on genetic algorithms and genetic programming. A quick trip the library or a google search can have you drowning in useful information. Also, Machine Learning journals frequently have examples of researchers who are applying GAs to different problems.

If you're mathematically gifted, take a crack at expanding on Schema Theory or coming up with a different justification entirely. The more the situations in which genetic algorithms work and don't work are firmly understood, the less time programmers will waste and the more problems will be solved.

1http://www.quantlet.com/mdstat/scripts/csa/html/node53.html
2Lecture Notes for my Machine Learning Course at Kalamazoo College, taught by Nathan Sprague. (Also used throughout this article).
Other Sources: The textbook Aritificial Intelligence: A Modern Approach by Russel and Norvig.
Website: http://aima.cs.berkeley.edu/

Log in or register to write something here or to contact authors.