Generate dataset of problems like the Regex Golf game. This allows training RNNs to play regex golf.
regldg, and torch-rnn for trying to learn
wget https://regldg.com/regldg-1.0.0.tar.gz
tar -zxvf regldg-1.0.0.tar.gz
cd regldg-1.0.0
make
g++ rand_text_generator.cpp -o rand_text_generator.bin
./rand_text_generator.bin 4 foo
./regldg '--max-length=8' '--readable-output' '--num-words-output=1000' "[a-z]{3}" | shuf | head
while read regex; do ./rand_text_generator.bin 4 $regex; echo SOLUTION: $regex; done< <(./regldg '--max-length=8' '--readable-output' '--num-words-output=1000000' "[a-z]{3}" | shuf | head -17576) > foo_dataset
train with torch-rnn
python scripts/preprocess.py --input_txt foo_dataset --output_h5 regex_data.h5 --output_json regex_data.json
th train.lua -checkpoint_every 10000 -max_epochs 25000 -dropout 0.5 -seq_length 40 -num_layers 2 -batch_size 1000 -input_h5 regex_data.h5 -input_json regex_data.json
Because the network doesn't learn, it shows that RNNs do not learn rules which are not character-specific