An implementation of recurrent Cascade-Correlation (Cascor) in both NumPy and PyTorch. The goal of the project is to create a modern, user-friendly implementation of recurrent Cascor. Additionally, we extend the project to PyTorch in order to explore using GPUs to speed up recurrent Cascor.
In both `Cascor-NumPy/` and `Cascor-PyTorch/`, the logic is identical, so this overview briefly covers the structure of the code to facilitate usage of and experimentation with the project.
In CascorNetwork.py, I created a class for the recurrent Cascor network. The purpose is to separate the structure of the network from the code that trains and uses it. For Cascor, the key details are the network's hyperparameters:
- `weight_range` - the possible range for the weights in the network
- `ncandidates` - the number of candidates in the candidate pool
- `raw_error` - True if we do not want to scale the error by the derivative of the output's activation function
- `hyper_error` - whether to use the hyperbolic arctan error function (see the sketch after this list)
- `score_threshold` - how close an output needs to be to its target to count as correct
- `use_cache` - whether to cache forward-pass values instead of recomputing them each time
- `output_type` - the output unit type
- `ninputs` - the number of inputs
- `noutputs` - the number of outputs
- `dataloader` - contains the data for training and testing
- `max_units` - the maximum number of units permitted in the network
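To make `raw_error` and `hyper_error` concrete, here is a minimal sketch of how an output error could be computed under those flags; the function and the clipping constant are illustrative, not the project's actual code:

```python
import numpy as np

def output_error(desired, actual, actual_prime, raw_error=False, hyper_error=False):
    # Illustrative sketch of the error options described above.
    diff = actual - desired
    if hyper_error:
        # Hyperbolic arctan error: stretch large errors, clipping to keep arctanh finite.
        diff = np.arctanh(np.clip(diff, -0.9999999, 0.9999999))
    if raw_error:
        return diff                 # use the error as-is
    return diff * actual_prime      # scale by the derivative of the output activation
```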
It also contains the network itself, represented by two arrays (a forward-pass sketch follows):

- `weights` - the array of weights from unit to unit
- `outputs` - the stored outputs after a forward pass
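Because of the cascade architecture, each installed unit reads the network inputs plus the values of every unit installed before it. As a rough illustration, a non-recurrent forward pass over such arrays might look like the following; the array layout and function are assumptions for the sketch, not the project's code:

```python
import numpy as np

def forward(x, weights, nunits, unit_fn=np.tanh):
    # Sketch of a cascade forward pass: unit i reads the raw inputs and the
    # values of all previously installed units (recurrence and bias omitted).
    ninputs = len(x)
    values = np.zeros(nunits)
    values[:ninputs] = x                     # the first slots hold the inputs
    for i in range(ninputs, nunits):
        # weights[i, :i] connects everything before unit i to unit i
        values[i] = unit_fn(np.dot(weights[i, :i], values[:i]))
    return values
```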
The `CandidateUnitTrainer` class handles one of the two major parts of training Cascor: the input phase, where we train the candidate units and find the candidate whose score (its correlation with the current error signal) is strongest. It contains the hyperparameters for candidate pool training, listed below with a quickprop sketch after the list:
- `mu` - a parameter for quickprop that limits how large a step can grow
- `epsilon` - the amount of linear gradient descent used to update unit input weights
- `shrink_factor` - used to check whether a step size is too large
- `decay` - keeps weights from growing too big
- `patience` - the number of allowed consecutive epochs without significant change
- `change_threshold` - the amount of change required to count as a significant change
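For reference, a single-weight quickprop step in Fahlman's classic formulation looks roughly like this; it is a sketch of the standard algorithm, not necessarily this project's exact code:

```python
def quickprop_step(w, delta_prev, slope, slope_prev, epsilon, mu, shrink_factor, decay):
    # One quickprop update for a single weight. `slope` is the current error
    # derivative dE/dw; `shrink_factor` is typically mu / (1 + mu).
    slope += decay * w                 # decay keeps weights from growing too big
    step = 0.0
    if delta_prev < 0:                 # the last step moved the weight down
        if slope > 0:
            step -= epsilon * slope    # add a plain gradient-descent component
        if slope >= shrink_factor * slope_prev:
            step += mu * delta_prev    # quadratic step too large: cap it at mu * delta
        else:
            step += delta_prev * slope / (slope_prev - slope)  # parabola minimum
    elif delta_prev > 0:               # the last step moved the weight up
        if slope < 0:
            step -= epsilon * slope
        if slope <= shrink_factor * slope_prev:
            step += mu * delta_prev
        else:
            step += delta_prev * slope / (slope_prev - slope)
    else:                              # first step (or stalled): pure gradient descent
        step = -epsilon * slope
    return w + step, step              # the updated weight and the delta for next time
```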
The `CascorTrainer` class does most of the heavy lifting. It performs the outer loop of training, retraining the output weights after each new unit is added to the network, and it calls the candidate pool trainer to train the network to completion. Its hyperparameters `mu`, `epsilon`, `shrink_factor`, `decay`, `patience`, and `change_threshold` are defined as above. The additional hyperparameters are listed below, followed by a sketch of the outer loop:
- `stats` - keeps track of the epoch and other statistics for the network
- `outlimit` - the upper limit on the number of cycles in the output phase
- `inlimit` - the upper limit on the number of cycles in the input phase (candidate unit training)
- `rounds` - the upper limit on the number of unit-installation cycles
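Schematically, the outer loop looks like the following; the method names here are hypothetical stand-ins, not the project's actual API:

```python
def train(network, candidate_trainer, outlimit, inlimit, rounds):
    # Schematic Cascor outer loop (hypothetical method names).
    for _ in range(rounds):                        # each round installs one new unit
        # Output phase: train output weights with quickprop for up to outlimit
        # cycles, stopping early after `patience` epochs without significant change.
        if train_outputs(network, max_cycles=outlimit) == "win":
            return                                 # all outputs within score_threshold
        # Input phase: train the candidate pool for up to inlimit cycles to
        # maximize correlation with the residual error, then install the winner.
        candidate_trainer.train(max_cycles=inlimit)
        network.install_best_candidate()
```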
In tester.py, I put together a basic correctness check to make sure that the network runs the same as it did in the base code. Separating HiddenUnit and OutputUnit from the network and trainer definitions also allows a natural extension where mixed unit types are used in the candidate pool (see the sketch after the example below). Testing the code in tester.py looks like the following:
```python
unit_type = SigmoidHiddenUnit()
output_type = SigmoidOutputUnit()
dataloader = Dataloader(training_inputs, training_outputs, use_training_breaks,
                        training_breaks, test_inputs, test_outputs, use_test_breaks, test_breaks)

# Build the network itself.
network = CascorNetwork(ncandidates, unit_type, output_type, use_cache, score_threshold,
                        dataloader, raw_error, hyper_error, noutputs, ninputs, max_units,
                        distribution=np.random.uniform)
stats = CascorStats()

# Trainer for the candidate pool (input phase).
candidate_trainer = CandidateUnitTrainer(network, input_patience, input_change_threshold,
                                         input_shrink_factor, input_mu, input_decay,
                                         input_epsilon, stats)

# Upper limits on output cycles, input cycles, and unit-installation rounds.
outlimit = 100
inlimit = 100
rounds = 100

# Outer-loop trainer that ties everything together.
ctrainer = CascorTrainer(network, candidate_trainer, outlimit, inlimit, rounds,
                         output_patience, output_epsilon, output_mu, output_decay,
                         output_deltas, output_slopes, output_prev_slopes,
                         output_shrink_factor, output_change_threshold, stats,
                         weight_multiplier=1, test_function=None, test=False, restart=False)
```
In this case, we create our network by initializing the unit_type, output_type, and dataloader, then calling the constructor. tester.py evaluates the network on Morse code, as done in Fahlman's paper on recurrent Cascade-Correlation (RCC).
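As a hypothetical illustration of the mixed-pool extension mentioned above (a `GaussianHiddenUnit` class and per-candidate unit types are assumptions, not part of the current API):

```python
# Hypothetical: alternate activation types across the candidate pool.
unit_types = [SigmoidHiddenUnit() if i % 2 == 0 else GaussianHiddenUnit()
              for i in range(ncandidates)]
```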
- Ian Chiu - NumPy and PyTorch implementation
- Scott Fahlman and Christian Lebiere - development of the algorithm and architecture, as well as the initial implementation of Cascor in Common Lisp
For issues, concerns, or suggestions, please contact Ian Chiu.