Genetic Programming 101 (Ruby)

I took a machine learning course this quarter and had the chance to do some genetic programming. Given Ruby’s readability, it struck me as a bit odd that there weren’t more genetic programming tutorials written using it. I had a blast playing with the programs, so I set out to write a short post highlighting the basic concepts. I hope to do another in the future using Haskell as well.

There are really only 4 simple steps to genetic programming. You can choose to make them as simple or as complex as you like, but they are as follows:

  1. create population of random (or known) solutions
  2. calculate the fitness of your solutions
  3. breed top performing solutions
  4. goto 2.

Genetic programming sounds very intimidating so I wanted this to be as simple as possible. For this post I wrote a short program that would evolve an 32 member array of 1’s. Sounds silly, but I think it’s quite expressive.

The Code

First we need to be able to generate random 32 member arrays for our seed population.


def random_chromosome
  Array.new(32).collect {|e| rand 2}
end

Next we need the ability to calculate the fitness of a chromosome. I chose to use the frequency of 1’s as the fitness.

NOTE: I should state explicitly that I use “member” and “chromosome” interchangeably – they are both 32 member arrays. I admit that at times my choice was rather arbitrary, but it made more sense grammatically to pass mutate a “chromo” and pass calc_fitness a “member”.


def calc_fitness(member)
  member.collect {|e| e if e == 1}.compact.size
end

Next up? Mating. This function takes as parameters two chromosomes/members. Again, I went with simple, so it concatenates the first half of “dad” with the second half of “mom”. Thrown in there is a mutate method that randomly flips two bits in the chromosome. This isn’t really required, but helps illustrate how to do so if you ever need to and adds an element of randomness.


def mate(chromo1, chromo2)
  mutate(chromo1.first(16) + chromo2.last(16))
end

def mutate(chromo)
  2.times { chromo[rand 32] = rand 2}
  chromo
end

Lastly, we need to collect the top 10 fittest chromosomes – i’m calling them alphas – so they can create the next generation. Forgive me ahead of time as this function is a bit awkward.


def alphas(arrs)
  tmp = []
  arrs.each_index {|i| tmp 

Before we move on, I opened up the array class (ruby awesomeness!) to add several functions for convenience. Like the mutate function, these aren’t necessary, but as you’ll see in the main loop, it makes things easier to read.


class Array
  def random
    self[rand self.size]
  end
 
  def is_spartan?
    self == (Array.new(32).fill 1)
  end
end

Alright, so we’ve got everything we need. This could be cleaned up a bit more, but the main loop is as follows…


require 'evo_helpers'
include EvoHelpers
 
start = Time.now
NUM_OFFSPRING = ARGV[0].to_i
SPARTAN = Array.new(32).fill 1
SEEDS = Array.new(NUM_OFFSPRING).collect {random_chromosome}
top10 = alphas(SEEDS)
 
i = 0
loop do
  case top10.include? SPARTAN
  when true
    winners = top10.collect {|a| a if a.is_spartan?}.compact
    puts "#{winners.size} member(s) reached spartan status!"
    puts "completed in #{Time.now - start} seconds"
    break
  else
    puts "*** ALPHAS FROM GENERATION: #{i} ***"
    offspring = []
    NUM_OFFSPRING.times do |j|
      offspring 

So, define the thing we’re striving towards – a 32 member array of 1’s, I call a spartan. Generate a bunch of random chromosomes as defined by a single command line argument. And start looping. We check the current batch to see if any are spartans, if so we print some stats and break, if not, we calculate the fitness of them, randomly mate them, and loop again!

You can run this code with the following command: “ruby evolve.rb num”, where num is the number of offspring generated at each generation. Play around with these values. Since we chose the 10 fittest at each generation, 10 offspring results in a long running experiment. I mashed ctrl-c at 4k+ generations. Anything greater than 20 offspring and the program completes after only 8-10 generations.

Here’s a snippet of the output using 15 offspring.


*** ALPHAS FROM GENERATION: 110 ***
[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1], fitness: 31
[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 0, 1], fitness: 30
[1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1], fitness: 30
[1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1], fitness: 30
[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 0, 1, 1], fitness: 29
[1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1], fitness: 29
[1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 0], fitness: 29
[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 0, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1], fitness: 29
[1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 0, 1, 1, 1, 1, 1, 0, 1, 1, 1], fitness: 28
[1, 1, 1, 1, 0, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 0, 1, 1], fitness: 28

*** ALPHAS FROM GENERATION: 111 ***
[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1], fitness: 31
[1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1], fitness: 31
[1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1], fitness: 31
[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1], fitness: 29
[1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 0, 1, 1, 1, 1, 1, 1, 0], fitness: 28
[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 0, 1, 1, 1, 1, 1, 0, 0, 1, 1], fitness: 28
[1, 1, 1, 1, 0, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1], fitness: 28
[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 0, 1, 1, 1, 1, 1, 0, 0, 1, 1], fitness: 28
[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 0, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 0, 1], fitness: 28
[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 0, 1, 0, 1, 1, 1, 1, 1, 0, 1, 1, 1], fitness: 28

*** ALPHAS FROM GENERATION: 112 ***
[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1], fitness: 32
[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1], fitness: 31
[1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1], fitness: 31
[1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1], fitness: 30
[1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1], fitness: 29
[1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1], fitness: 29
[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 0, 1, 1, 1, 1, 1, 0, 0, 1, 1], fitness: 28
[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 0, 1, 1, 1, 1, 1, 0, 0, 1, 1], fitness: 28
[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 0, 1, 0, 1, 0, 1, 1, 1, 1, 1, 0, 1, 1, 1], fitness: 27
[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 0, 0, 1, 1, 1, 1, 0, 0, 1, 1], fitness: 27

1 member(s) reached spartan status!
completed in 2.197729 seconds

Reality Check

Of course this isn’t all that realistic. Very rarely do you know the solution you need look for (like we did with the spartan) before you start. But you can test your solution on the problem and calculate the error. For instance, Boeing has used genetic programming to find optimal blade angles for it’s turbine engines. I’m not totally familiar with their methods, but it seems reasonable that they would generate a bunch of solutions, test them in a simulator and mate solutions that created the most thrust/lowest fuel consumption/etc. And instead of breaking on a specific solution, you might define some epsilon constant (insignificant difference) and test that the difference between your top 10 solutions is greater than the epsilon. That is, test the current batch and confirm that the program is still making forward progress (evolving ever-fitter solutions).

Conclusion

Hope you enjoyed the short lesson. Believe it or not, genetic programming doesn’t stray too far from these basic concepts. Representations of problems/solutions will differ, fitness functions will grow more complex, testing methods will change, but the concepts stay the same.

github repo