For my university thesis I explored the possibilities that neuroevolution algorithms offer. I started by understanding genetic algorithms and then moved on to artificial neural networks. Combining them wasn't the easiest task, but some good books and papers helped me a lot.
Evolutionary techniques play an important role in modern life and are very widely used. These algorithms are powerful in real-world applications: they can design and control physical devices. When designing, the evolution acts on the phenotype of the device; given some constraints, it will produce a near-optimal design for the given task. Control of devices, on the other hand, involves evolving controllers that are capable of driving mobile robots, automobiles and even rockets.
Furthermore, evolutionary algorithms are well suited to simulating artificial life, or any other kind of simulation. They can simulate everything from simple hunting and mating behaviours to more complex ones, such as investigating what conditions are necessary for a certain behaviour to evolve. A lot of work has been done in this domain, and many experiments have shown how behaviours like foraging, pursuit and evasion, hunting and herding, collaboration and even communication may emerge in response to environmental changes that put pressure on the individuals.
Last but not least, evolutionary algorithms have a vast domain of application in games. Beyond the most obvious application, adaptable and evolving enemy behaviours that enhance the player's experience, they can also be used as game testers while the game is still in development. Such a tester can evolve a plan of play to beat the game in the best available way. The result of such a plan will show whether something in the game is completely out of balance, since the evolutionary algorithm will exploit it. Moreover, neuroevolution has been used to evolve the player's weapons. This way, content is created by the player rather than the developer, and the player usually enjoys the uniqueness of his intelligent weapon.
My first approach
Genetic algorithm for movement
The first genetic algorithm tries to move some agents from a starting point to a finish line. The agents learn to travel a small distance over some amount of time. It is a very simple but fundamental exercise for getting a basic feeling of how a genetic algorithm works.
The genome is represented by a list of smaller nodes, each of which contains the acceleration of the agent for a specific amount of time. Time is measured in frames. The given acceleration is in [-8,8] units of force.
Because of the physics engine, it must be noted that each agent has a mass. Thus any force in [-4,4] results in no movement at all. Forces in (4,8] accelerate the agent's body to the right, while forces in [-8,-4) accelerate it to the left of the screen.
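The dead-zone behaviour can be sketched as a small helper, assuming a simple threshold model (the function and return values here are my own illustration, not the thesis code):

```python
def resulting_motion(force):
    """Map a force in [-8, 8] to the direction the agent's body moves.

    Forces with magnitude up to 4 are absorbed by the agent's mass and
    produce no movement; stronger forces push the body left or right.
    """
    if force > 4:
        return "right"
    if force < -4:
        return "left"
    return "none"  # |force| <= 4: the mass cancels it out
```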
Each agent has a fixed number of genes, and each gene accelerates the agent's body for a specific number of frames.
To mutate a gene of the agent's genome, the acceleration value of the gene is replaced with a new random value in [-8,8].
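A minimal sketch of this genome and its mutation operator, assuming one acceleration value per gene (names and the per-gene mutation probability parameter are mine, not from the thesis):

```python
import random

def random_genome(num_genes):
    """One gene = one acceleration value in [-8, 8]."""
    return [random.uniform(-8, 8) for _ in range(num_genes)]

def mutate(genome, rate=0.05):
    """Replace each gene with probability `rate` by a fresh random value."""
    return [random.uniform(-8, 8) if random.random() < rate else gene
            for gene in genome]
```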
Crossover and Parent Selection
Single-point crossover, as described previously, is used in this simple problem. Selection is done with a simple roulette wheel selection algorithm.
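The two operators can be sketched as follows; these are textbook versions of single-point crossover and roulette wheel selection, with illustrative names rather than the thesis code:

```python
import random

def single_point_crossover(parent_a, parent_b):
    """Cut both genomes at one random point and join the two halves."""
    point = random.randint(1, len(parent_a) - 1)
    return parent_a[:point] + parent_b[point:]

def roulette_wheel_select(population, fitnesses):
    """Pick an individual with probability proportional to its fitness."""
    pick = random.uniform(0, sum(fitnesses))
    running = 0.0
    for individual, fit in zip(population, fitnesses):
        running += fit
        if running >= pick:
            return individual
    return population[-1]  # guard against floating-point round-off
```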
The fitness of each agent is calculated using three different positions: the starting position (SP), the current position (CP) where the agent stopped moving, and the desired position (DP) that the agent must reach. The fitness can be easily described with three different equations. The values 10000 and 100 are constants that help the algorithm converge faster.
After some testing, convergence typically happens between generations five and twelve using a mutation rate of 5%. The model will converge even with very small populations.
Even though the bots learn to travel the given distance within the given time, the nature of the problem and the fitness function model give rise to two undesired behaviours. The first is that the agent sometimes goes out of the borders, before the starting point or after the desired point, which means the agent might not be rendered for a short period of time. The second is that the agent sometimes reaches the desired position early and then stays there without moving for a significant period of time.
My second approach
Improving and enhancing the simulation
The second approach to the problem of travelling from one point to another involves two things. First, the problems that emerged from the first approach need to be fixed. Secondly, agents that can jump or even fly are added. The first requires changes to both the genome and the fitness functions; the second requires adding more functionality to the genes of each agent.
The genome is again represented as a list of nodes, which now consist of two parts for the simple moving agents, the walkers, and three for the ones that jump or fly. The first part is still the horizontal acceleration. For the walkers the constraints are the same as in the first approach. The jumpers get a random number in [-1.75,1.75] and the fliers in [-1,1]. Note that since there is no friction while they are in the air, there is no dead zone in which they cannot accelerate.
The second parameter of the gene is the time for which the acceleration is applied. This helps the agents evolve to reach their target in a specific amount of time. For the walkers the time constraints are [1,30], for the jumpers [50,120] and for the fliers [6,9]. Time is measured in frames.
The third and final parameter of the gene is the vertical force that is applied to the body to make it take off the ground. This force is applied once, in the first frame of the gene. For the jumpers the vertical force lies in [0,450] and for the fliers in [65,70].
Mutation can happen separately on each of the parameters of each gene. In most cases only one of the parameters will change, but there is a chance that the others will change as well. The change consists of choosing a random number within the constraints of the parameter, as given in the genome representation section. The mutation rate for the walker is 5%, for the jumper 2% and for the flier 5%.
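Per-parameter mutation for a three-part gene could look like this; the constraint ranges are the jumper's values from the text, while the dictionary layout and names are my own illustration:

```python
import random

# Jumper constraints taken from the text; the gene layout is assumed.
JUMPER_RANGES = {
    "acceleration": (-1.75, 1.75),  # horizontal acceleration
    "frames": (50, 120),            # how long it is applied
    "vertical_force": (0, 450),     # one-off take-off impulse
}

def mutate_gene(gene, rate=0.02, ranges=JUMPER_RANGES):
    """Each parameter mutates independently with probability `rate`."""
    out = dict(gene)
    for name, (lo, hi) in ranges.items():
        if random.random() < rate:
            out[name] = random.uniform(lo, hi)
    return out
```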
Crossover and Parent Selection
Single point crossover is used and the selection is done with the implementation of a simple roulette wheel selection algorithm.
The fitness function can be divided into three separate parts for the walker and the jumper, and into four for the flier. The first part is quite similar to the initial fitness function used in the first approach, for all three kinds of agents. Some adjustments are made to get better results. (equations 1, 2 & 3)
The second part helps the agents learn to stay between the borders. This is simply done by rewarding the agent for each frame it spends after the starting point and before the desired destination point. (equation 4)
The third part rewards the time of arrival. This fitness is awarded only the first time the agent crosses the desired destination point. (equation 5)
Here Fs stands for frames per second, which is 50, DT is the desired time of arrival at the destination, and t is the actual time of arrival.
Lastly, the fliers have a fourth part that alters the fitness, dealing with the height constraints. It works like the second part, but for the height instead of the horizontal distance. (equation 6)
The walkers start to converge to a solution somewhere between the fifteenth and the twentieth generation. They begin to arrive at the desired point at the desired time of arrival, with a small deviation, usually after the thirtieth generation.
The jumpers usually perform very well. They typically arrive at the desired destination within the first ten to twelve generations, but they need longer to find the sweet spot of timed arrival, which they do after the thirty-fifth generation.
As for the fliers, they don't perform as well as the others. Even though they try to float around in the air, it is very difficult for them to stay within the height limits. Other than that, they usually find their way to the desired position, one way or another, at around the tenth generation, and they arrive with a small time deviation at about the thirty-fifth generation. Lastly, they manage to float around at the desired height at around generation forty-five.
Brain out of Neurons
The third implementation is all about neuroevolution. The algorithm evolves neural networks with fixed topologies to create unique behaviours. The player that was introduced in the previous paragraph will be the cornerstone of this evolution. Each species has a different network topology to try to solve its problem as well as it can.
The genome in this implementation is rather simple: a list of all the weights of the artificial neural network is all it takes. However, this list is structured so it can be handled properly. The neural network consists of a list of neuron layers; a neuron layer embodies a list of neurons, and each neuron a list of weights that includes its bias.
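This nested weight-list genome can be sketched as plain Python lists, assuming the bias is stored as the last weight of each neuron (the function name and layer-size argument are illustrative):

```python
import random

def random_network(layer_sizes):
    """Build a random genome for a fixed topology.

    layer_sizes, e.g. [8, 10, 5, 2]: input size, hidden layers, outputs.
    Network = list of layers; layer = list of neurons; neuron = list of
    weights, with the bias stored as the last entry.
    """
    network = []
    for n_inputs, n_neurons in zip(layer_sizes, layer_sizes[1:]):
        layer = [[random.uniform(-1, 1) for _ in range(n_inputs + 1)]
                 for _ in range(n_neurons)]
        network.append(layer)
    return network
```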
All the neural networks can be examined in three stages: the input layer, the hidden layers and the output layer. The input layer is the same for all the different species of the simulation's agents. It consists of the eight points in their two-dimensional world space, as described before.
The walker has two hidden layers of neurons; the first has ten neurons and the second five. The jumper also has two hidden layers, with twelve and eight neurons. After a lot of testing, the flier ended up with three hidden layers of eighteen, twelve and six neurons. These topologies changed a lot during the simulations.
Lastly, the output layer of the network represents the type of movement the agent must make. Because each neuron applies the sigmoid function to the sum of its inputs, the output values lie in [0,1]. Thus the movement of the walker must be divided into two separate movements, going forward and going backwards, which means two neurons are necessary in the output layer of the walker. The jumper and the flier need a third one that signals the vertical force. Again, since the outputs are in [0,1], adjustments must be made so that each agent gets the right amount of force.
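A forward pass consistent with this description could look as follows, assuming the nested weight-list layout described earlier (each neuron's bias stored as its last weight); this is a sketch, not the thesis implementation:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def feed_forward(network, inputs):
    """Propagate the inputs through every layer.

    Each neuron outputs sigmoid(weighted sum + bias), so every
    activation, including the final outputs, lies in [0, 1].
    """
    activations = inputs
    for layer in network:
        activations = [
            sigmoid(sum(w * a for w, a in zip(neuron, activations))
                    + neuron[-1])  # last weight is the bias
            for neuron in layer
        ]
    return activations
```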
The walker's two network outputs are multiplied by eight and minus eight respectively; after multiplication, the sum of the two outputs is the horizontal acceleration applied to the agent's body. For the jumper the horizontal coefficients are five and minus five, and the horizontal acceleration is calculated in the same way; the vertical coefficient is four hundred and twenty-five. For the flier the horizontal coefficients are one and a half and minus one and a half, and the vertical one is twenty.
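My reading of the walker's output scaling is that the two sigmoid outputs are scaled by the opposite coefficients and summed; a hedged sketch (the function name and argument names are mine):

```python
def walker_acceleration(forward_out, backward_out):
    """Combine two sigmoid outputs in [0, 1] into one acceleration.

    Coefficients 8 and -8 come from the text; the net effect is an
    acceleration in [-8, 8] depending on which output dominates.
    """
    return forward_out * 8 + backward_out * (-8)
```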
Each species has its own population pool. This pool gradually fills up with random agents that represent the initial population. An agent must have its fitness calculated before it can enter this pool; in other words, it has to die. Once the pool's maximum capacity is reached and new agents must be created, selection, crossover and mutation take place as usual, but they create only one new agent. This new agent is sent out to try its strategy. Once it is dead and its fitness is calculated, it is compared with the worst individual in the pool. If it performed better, the worst fit is removed from the pool and the new agent takes its place.
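This steady-state replacement scheme can be sketched as a single function, assuming agents are represented as (fitness, genome) pairs (that representation is my own choice):

```python
def try_enter_pool(pool, agent, capacity):
    """Admit an evaluated agent into the pool, steady-state style.

    pool: list of (fitness, genome) pairs; agent: one such pair.
    While filling up, everyone gets in; afterwards a newcomer must
    beat the worst member to replace it.
    """
    if len(pool) < capacity:        # still building the initial population
        pool.append(agent)
        return True
    worst = min(pool, key=lambda a: a[0])
    if agent[0] > worst[0]:         # better than the worst: swap in
        pool.remove(worst)
        pool.append(agent)
        return True
    return False                    # discarded
```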
For mutation, every single weight of the genome is tested and changed if necessary. The mutation rate for all three species is rather high, standing at twenty percent. If a weight is mutated, it takes a new value in [-1,1].
Crossover and Parent Selection
Single-point crossover is used in this implementation as well, but with a small twist. The point selected for the crossover cannot lie in the middle of a neuron's weights; only whole neurons are transferred to the new agent. The structure of the genome helps us divide it at the right point. As for selection, the classical roulette wheel is used.
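The neuron-aligned cut can be sketched like this, assuming both parents share the same fixed topology (as the text states for each species) and the nested network-layer-neuron layout described earlier; the names are illustrative:

```python
import random

def neuron_crossover(net_a, net_b):
    """Single-point crossover that only cuts between whole neurons.

    Both networks are flattened to lists of neurons, cut at one point,
    and the child is rebuilt using net_a's layer sizes, so no neuron's
    weight list is ever split in the middle.
    """
    flat_a = [n for layer in net_a for n in layer]
    flat_b = [n for layer in net_b for n in layer]
    point = random.randint(1, len(flat_a) - 1)
    child_flat = flat_a[:point] + flat_b[point:]
    child, i = [], 0
    for layer in net_a:
        child.append(child_flat[i:i + len(layer)])
        i += len(layer)
    return child
```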
The fitness functions are quite similar to those of the second implementation. The goals of each agent have not changed.
For the walkers, mainly two behaviours appear. The first is to wait as far away as they can and, when they see the player jumping, run as fast as they can to cross the finish line. The second is that sometimes they run towards the player regardless of whether he is shooting at them. The first behaviour is quite expected, given that they recognise the bullets and the player as a threat. They usually evolve fast enough to challenge the player.
For the jumpers the results were amazing. They converge to their behaviour extremely fast. In most cases they synchronize their jumps with the bullet fire rate and jump between the bullets. They have even been observed to learn to jump backwards when the player jumps, so they can avoid the next incoming bullet. This general behaviour, together with the walkers, makes it really hard for the player not to lose after a few minutes of gameplay.
Sadly, the fliers do not perform that well. They can't really find a way to float around in the air of their world. In the rare cases where they somewhat succeed, they show strange behaviours such as trying to pass over the bullets, which makes them fly out of the screen. This behaviour needs a lot of time to develop, and they pose no challenge to the player. The main issue of the fliers is that they can't come up with a plan to beat gravity.
Get a digital copy of my Thesis.
Check the whole list of videos.