CS 278: Evolution of Neurally Controlled Robots.
GENDALC setup Documentation
GENDALC stands for Genetic Evolution of Neuron Distribution and Layer Connectivity.
-1. A quick description of GENDALC can be found in this paper.
0. Getting the source code.
The source code is in my home directory on hex, /home/jdavila. ssh into hex, and make a copy of it in your home directory. Then execute 'tar -xvf GENDALC.tar'. That will unpack all of the necessary files.
1. General Genitor setup.
GENDALC is a modification based on the Genitor code developed by Darrell Whitley, at Colorado State University. After installation, the code developed by Dr. Whitley is located at ~/GA/Genitor, where ~ is a reference to your home directory.
You will need to make almost no modifications to the code. Here's a list of the things you'll need to modify:
- In file ~/GA/Genitor/src/Makefile, change line 25 from 'GaRoot = /home/jdavila/GA/Genitor/' to 'GaRoot = your-home-directory/GA/Genitor/' .
That's it. Now change to the ~/GA/Genitor/src directory and execute 'make'.
Now the basic Genitor system is installed. Do bear in mind that you have to install the version I am providing, since there are some differences from the one originally created by Darrell Whitley. That is, do not download the original Genitor package from somewhere else on the web!
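The Makefile change in this section can also be scripted instead of made by hand. Here is a minimal sketch, assuming the file contains exactly one line that starts with 'GaRoot = '. The function name set_ga_root is made up for illustration and is not part of the archive. It matches the line by its content rather than by line number, and it keeps a backup copy.

```shell
# Hypothetical helper (not part of the GENDALC archive).
# Rewrites the GaRoot assignment in a Genitor Makefile so that it points
# into the given home directory.
# Usage: set_ga_root <makefile> <home-directory>
set_ga_root() {
    makefile=$1
    home_dir=$2
    # Keep a backup, then rewrite only the line that assigns GaRoot.
    # ASSUMPTION: the home directory contains no '|' or '&' characters,
    # which would confuse the sed expression.
    cp "$makefile" "$makefile.bak" &&
        sed "s|^GaRoot = .*|GaRoot = $home_dir/GA/Genitor/|" \
            "$makefile.bak" > "$makefile"
}
```

For example, 'set_ga_root ~/GA/Genitor/src/Makefile "$HOME"' would make the change described above and leave the original in Makefile.bak.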
2. Setting up GENDALC.
In the original Genitor package, the one thing you HAD to provide was a fitness evaluation function. In GENDALC, that "simple" function takes some 5000 lines of code, and is located in directory ~/GA/Genitor/Examples/Traditional . I access this directory so often that I created a symbolic link to it at ~/GAS . Make sure that you have that link too, or else the code will not work correctly.
The main addition to Genitor in GENDALC, which takes up most of those 5000 lines, is that it takes advantage of a network of connected cpus. This also means that the fitness function, instead of taking a single individual and evaluating its fitness, takes a whole population and evaluates the fitness of every individual in it.
Here are the things you need to modify in directory ~/GAS to get the code to work for you:
- line 18 of file constants.h, change '#include "/home/jdavila/GA/Genitor/include/ga/ga_random.h"' to '#include "your-home-directory/GA/Genitor/include/ga/ga_random.h"'
- line 9 of file eval_float.c, change '#include "/home/jdavila/GA/Genitor/include/ga/gene2.h"' to '#include "your-home-directory/GA/Genitor/include/ga/gene2.h"'
- line 1 of file home_directory_definition.h, change '#define USERS_HOME_DIRECTORY "/home/jdavila"' to '#define USERS_HOME_DIRECTORY "your-home-directory"'
- line 2 of file home_directory_definition.h, change '#define LENGTH_OF_USERS_HOME_DIRECTORY_STRING 13' to '#define LENGTH_OF_USERS_HOME_DIRECTORY_STRING whatever-the-number-of-characters-in-your-home-directory-is' (13 is the number of characters in "/home/jdavila")
- line 8 of Makefile, change 'GaRoot = /home/jdavila/GA/Genitor/' to 'GaRoot = your-home-directory/GA/Genitor/'
- line 33 of Makefile, change 'ProgName = Genitor$(DataType)' to 'ProgName = your-first-nameGenitor$(DataType)'. This allows several instances of GENDALC to run at the same time while letting us know whom each one belongs to. That makes it convenient for each of us to monitor and stop our own processes. You must do this. If I catch any process called GenitorFLOAT running under any account other than mine, I'll stop the process!
- In file GA_CALLER line 2: change 'GenitorFLOAT -c config.file 1>/dev/null 2>/dev/null' to 'your-first-nameGenitorFLOAT -c config.file 1>/dev/null 2>/dev/null'
That's it. Now execute 'make' from the ~/GAS directory .
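The length constant in home_directory_definition.h is easy to miscount by hand. Here is a small sketch that prints both #define lines for a given home directory, so you can paste them over lines 1 and 2 of that file. The function name make_home_definition is made up for illustration. It assumes a plain ASCII path, so that the character count equals the byte count.

```shell
# Hypothetical helper (not part of the GENDALC archive).
# Prints the two #define lines for home_directory_definition.h.
# Usage: make_home_definition <home-directory>
make_home_definition() {
    home_dir=$1
    printf '#define USERS_HOME_DIRECTORY "%s"\n' "$home_dir"
    # ${#home_dir} is the number of characters in the path; no terminating
    # NUL is counted, matching the 13 used for "/home/jdavila".
    printf '#define LENGTH_OF_USERS_HOME_DIRECTORY_STRING %d\n' "${#home_dir}"
}
```

Running 'make_home_definition /home/jdavila' reproduces the two original lines, which is a quick way to check the helper before using it with your own directory.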
Now you have a generic system ready. In reality, not generic; rather, it performs whatever task I was evolving NN for when I created the archive you've installed. Later sections of this document deal with how to set up different experiments.
3. Setting up inter-cpu communication
The advantage of using the cluster we have available is that we can evaluate the fitness of several NN at the same time (we can be evaluating one on each cpu of the cluster at any one time). But there is one central program (GenitorFLOAT) that prepares things for each of the cpus, then tells each cpu when to do its thing, and then checks to see which cpus have finished their work, etc.
The central program communicates with the other cpus via the Remote Procedure Call (RPC) protocol. The code we use to do this is in the ~/RPC directory. Although there's a lot of information about RPC that we could go into, here's what you need to know/do:
- You will need to set up both a server and a client. The server will run on each of the cpus. The client will run on the master cpu (which is where you run GenitorFLOAT from). GenitorFLOAT creates the files needed by each cpu, and then runs the client. The only thing the client does is tell the server "hey, go do your thing." That instructs the server to run the program that trains and evaluates the NN.
- All the complexities of the RPC protocol are taken care of by a program called rpcgen. rpcgen will generate almost all of the files we need (those that rpcgen does not generate, I've created for you already).
- Each of you will only need to make one set of code modifications. These modifications, once again, will help all of us know who's doing what at any one time.
- The first modification is to a file in the RPC directory called nn.rpc . The very last line of that file has a number like 0x20000099 in it. Each of us has to have a different number there. Because those numbers have to be unique, I'll give them out to you once you're ready. Otherwise our programs will start to conflict with each other.
- Once I give you that unique number and you modify the file, execute 'rpcgen -k nn.rpc' (note the -k; it generates 'Kernighan & Ritchie' C code, which is what I used when I created the system; back then there was nothing else ;-). That will create three files: nn.h, nn_clnt.c, and nn_svc.c. All the protocol-related details we will use are in those three files.
- Now we need to compile the client and server programs. The commands are:
- gcc nn_server.c nn_svc.c -lnsl -o your-first-nameserver and
- gcc nn_client.c nn_clnt.c -lnsl -o client
- Now we need to put a copy of the client program in a special directory for each cpu to find, and also start the server program in each of the cpus. Both of these can be achieved by executing scripts that I've previously created. Both of these scripts depend on your setting an environment variable to a value that indicates which cpus you are using. The command you need to execute is 'export CPU_LIST="master n01 n02 n03 n04 n05 n06 n07 n08 n09 n10 n11 n12 n13" '. I've found it's a good idea to add a line like that to your ~/.bashrc file. That way the command gets executed automatically whenever you log on.
- With variable CPU_LIST containing the correct value, you can put copies of the client programs by going to the ~/RPC directory and executing 'install_clients'. You'll see that there's now a copy of the client program in each of the subdirectories of ~/RPC .
- To start the server programs in each of the cpus, do the following:
- Edit the file ~/turn_server_on, by changing the line that reads '../server & exit' to '../your-first-nameserver & exit' .
- run the command 'turn_hex_on' .
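To show how CPU_LIST drives the scripts mentioned above, here is a guess at the kind of loop a script like install_clients might contain. The real script is in the archive and may well differ. The function name install_clients_sketch and its two arguments are made up for illustration. The only facts taken from this document are that CPU_LIST holds space-separated cpu names and that each cpu ends up with a copy of the client in its own subdirectory of ~/RPC.

```shell
# Hypothetical sketch, NOT the real install_clients script.
# Copies the client binary into one subdirectory per cpu named in CPU_LIST.
# Usage: install_clients_sketch <rpc-dir> <client-binary>
install_clients_sketch() {
    rpc_dir=$1
    client=$2
    if [ -z "${CPU_LIST:-}" ]; then
        echo "CPU_LIST is not set" >&2
        return 1
    fi
    # $CPU_LIST is left unquoted on purpose so the shell splits it on spaces.
    for cpu in $CPU_LIST; do
        mkdir -p "$rpc_dir/$cpu" || return 1
        cp "$client" "$rpc_dir/$cpu/" || return 1
    done
}
```

The early check on CPU_LIST is the useful part of the sketch: if you forget the export line, a loop like this would otherwise silently do nothing.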
3b. How do the cpus communicate back to the master program?
Once a cpu finishes evaluating a NN, it creates a file called music_ga.log in directory ~/RPC/cpu-name. So each cpu writes into its own directory (and gets its files from its own directory too). The master program simply cycles through these directories, waiting for cpus to write log files into them. When it finds such a file, it reads the fitness value from it. Because the master program assigned a specific genome to that cpu, it knows which genome the cpu just finished evaluating.
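The real polling loop lives in the C code of GenitorFLOAT. The shell sketch below only illustrates the idea of one pass over the per-cpu directories, and it can be handy for checking by hand which cpus have reported. The function name finished_cpus is made up for illustration. It makes no assumption about what is inside music_ga.log, only that the file exists once a cpu is done.

```shell
# Hypothetical sketch of one polling pass (the real loop is C code inside
# GenitorFLOAT). Prints the name of every cpu in CPU_LIST whose directory
# already contains a music_ga.log file.
# Usage: finished_cpus <rpc-dir>
finished_cpus() {
    rpc_dir=$1
    # $CPU_LIST is left unquoted on purpose so the shell splits it on spaces.
    for cpu in $CPU_LIST; do
        if [ -f "$rpc_dir/$cpu/music_ga.log" ]; then
            echo "$cpu"
        fi
    done
}
```

For example, 'finished_cpus ~/RPC' lists the cpus that have a finished log waiting at that moment.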
3c. What exactly does the server side of all of this do?
Other than write into a few files to let you know it's working, the server program simply calls '$HOME/LANGUAGE/batchman_caller_music_ga', which simply does this:
- /usr/local/snns/batchman -f batchman_music.bat -l temp_music_ga.log
- mv temp_music_ga.log music_ga.log
In other words, it runs the batch version of the SNNS NN simulator package, with the files created by GenitorFLOAT as input. When it's done with that, it moves the log file it has been writing so that GenitorFLOAT can find it.
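The two commands above follow a write-then-rename pattern: the log only appears under its final name after the run has finished. A plausible reason is that the master then never reads a half-written file, though that is an inference and not something the code comments are known to state. Here is a generic sketch of the pattern. The function name run_and_publish is made up for illustration, and the command's standard output stands in for the file that batchman writes through its -l option.

```shell
# Hypothetical sketch of the write-then-rename pattern.
# Runs a command with its output going to a temporary log, then moves the
# log to its final name only if the command succeeded.
# Usage: run_and_publish <final-log> <command> [args...]
run_and_publish() {
    final_log=$1
    shift
    # Build "temp_<name>" next to the final log, like temp_music_ga.log.
    temp_log="$(dirname "$final_log")/temp_$(basename "$final_log")"
    # ASSUMPTION: stdout stands in for the log file batchman writes itself.
    "$@" > "$temp_log" || return 1
    # Renaming within one directory makes the finished log appear all at once.
    mv "$temp_log" "$final_log"
}
```

For example, 'run_and_publish ./music_ga.log echo 0.75' leaves a complete music_ga.log and no temp_music_ga.log behind.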
Why is that file in a directory called LANGUAGE? Because the first application I made of the GENDALC algorithm evolved NN for language processing.
Why is that file called batchman_caller_music_ga? Because at the time of creating the archive, I was evolving NN for music processing.
Can you rename those directories/files to something that more closely reflects what YOU are doing? Sure. Just make sure that you edit the nn_server.c file accordingly, and recompile the program as before.
4. Changing the parameters of your experiments.
Many of the changes you might want to make to the type of network GENDALC evolves can be done by editing ~/GAS/constants.h. Bear in mind that the general sequence of nodes in each NN is: first the input nodes, then the hidden nodes, then the output nodes. That will be important in determining which is the first hidden node, and which nodes are output nodes.
- If you want to change the number of input nodes, simply change lines 48 and 3 as necessary.
- If you want to change the number of output nodes, simply change line 44.
- If you want to change the number of hidden nodes, change lines 2 & 3, and 44 as necessary.
- If you want to change the number of epochs the NN sees the training data, go to the ~/GAS directory, edit eval_float.c, and change line 438. Then execute 'make'.
- If you want GENDALC to evolve how many hidden layers the NN will have (as opposed to always using 30), comment out line 790 of eval_float.c, and then execute 'make'. Warning: make sure you read this paper and that you understand the effect this change would have.
Of course, you could also modify the way that GENDALC turns genotypes into phenotypes. That's a longer task, mostly dependent on what algorithm you come up with. But most of the changes will be in function put_header().
5. Monitoring your run while it's executing.
To make sure that your experiment is still running, execute 'ps -A | grep your-first-nameGenitorFLOAT'. If nothing comes back, it's not running.
Sometimes a server on one of the cpus will stop running. This will cause generations to take longer. One way of detecting this is by looking at the file ~/RPC/rollover_log. Each time a generation finishes, GenitorFLOAT writes into this file how many genomes were evaluated by each cpu. If you see a cpu that has evaluated no genomes for the last few generations, the server probably died. To confirm and restart it, execute 'rsh name-of-the-cpu', and then 'ps -A | grep your-first-name'. If nothing comes back, the server on that cpu died. You can restart it by executing '~/turn_server_on_batch'. In some cases I've found that it's worth killing and restarting a server, probably because it has gone "stale." To kill it, rsh into the cpu and execute 'killall your-first-nameserver', then restart it as explained above.
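Checking every cpu by hand gets tedious, so the per-cpu check can be wrapped in a loop over CPU_LIST. This is a minimal sketch. The function name find_dead_servers and the REMOTE_SHELL variable are made up for illustration. It assumes that 'rsh cpu-name command' runs a single command on that cpu without asking for a password, which may not hold on your setup.

```shell
# Hypothetical helper (not part of the GENDALC archive).
# Prints the name of every cpu in CPU_LIST on which no process whose name
# matches <server-name> shows up in 'ps -A'.
# Usage: find_dead_servers <server-name>
find_dead_servers() {
    server_name=$1
    # ASSUMPTION: rsh accepts "rsh <host> <command>"; set REMOTE_SHELL to
    # use a different remote-shell command instead.
    remote=${REMOTE_SHELL:-rsh}
    for cpu in $CPU_LIST; do
        # stdin comes from /dev/null so the remote shell cannot swallow input.
        if ! $remote "$cpu" 'ps -A' < /dev/null | grep -q "$server_name"; then
            echo "$cpu"
        fi
    done
}
```

For example, 'find_dead_servers your-first-nameserver' prints one cpu name per line for every server that needs restarting, and prints nothing when all of them are up.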
To follow the progress of the current generation, you can look at file ~/GAS/finallog .
Whenever GENDALC finds a genome worth keeping (that is, one with a fitness higher than the lowest fitness currently in the population), it stores its initial and final network files in the good_genes directory. They are listed by the number of the generation when they were first introduced into the population.
To stop a running experiment, execute 'killall your-first-nameGenitorFLOAT' . Before you restart another experiment, you need to clean the directory for each of the cpus. Otherwise the next experiment will find intermediate files from the last execution, and think they are current/valid. You can clean the directories by executing ~/RPC/clean_cpus .
The GENDALC system's code is full of diagnostic print/debugging commands, which typically have their output sent to /dev/null, causing the messages to be lost. That's typically the way you want to run things. If you want to do some debugging, you can get all the messages by running the system from the command line (as opposed to calling it in batch mode with the GA_CALLER command). To do so, execute 'your-first-nameGenitorFLOAT -c config.file' from the command line.
OH YEAH! A little detail :-)
To execute the experiment, go to the ~/GAS directory and execute 'at now < GA_CALLER'
You should never have two of your experiments running at the same time. They will clash with each other, since they'll be writing into the same directories.