al_rnn_results

*The train file structure is: the first row gives the number of samples and the size of the alphabet; each following row gives the output, the path length, and the path.*
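A minimal sketch of reading this layout, assuming whitespace-separated fields and keeping the output column as a string (the files themselves may differ in detail). The function name `parse_train_file` and the sample data are hypothetical, used only for illustration.

```python
from typing import List, Tuple

# One sample: (output, path as a list of symbols)
Sample = Tuple[str, List[str]]


def parse_train_file(text: str) -> Tuple[int, int, List[Sample]]:
    """Parse a header row followed by one 'output path-len path' row per sample."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    # Header: number of samples, then alphabet size
    n_samples, alphabet_size = (int(tok) for tok in lines[0].split()[:2])
    samples: List[Sample] = []
    for ln in lines[1:]:
        tokens = ln.split()
        output, path_len = tokens[0], int(tokens[1])
        path = tokens[2:2 + path_len]
        if len(path) != path_len:
            raise ValueError(f"expected {path_len} symbols, got {len(path)}: {ln!r}")
        samples.append((output, path))
    if len(samples) != n_samples:
        raise ValueError(f"header says {n_samples} samples, found {len(samples)}")
    return n_samples, alphabet_size, samples


# Hypothetical file content: 3 samples over a 2-letter alphabet
example = """3 2
1 3 a b b
0 2 a b
0 0
"""
n_samples, alphabet_size, samples = parse_train_file(example)
```

The length check catches rows whose declared path length does not match the symbols that follow, which is the most likely way such a file gets corrupted.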

non-safety automata

for automaton blackbox.pdf

image

| n_samples | RNN | Alergia | EDSM | train_files |
|---|---|---|---|---|
| 100 | 100.00% | 64.52% | | https://github.com/pberko/al_rnn_results/blob/main/tracesaut112_100.dat |
| len x 5 | 97.35% | 55.63% | | |
| time | 4s | 2s | | |
| 2000 | 100.00% | 80.65% | 100.0% | https://github.com/pberko/al_rnn_results/blob/main/tracesaut112_2000.dat |
| len x 5 | 100.00% | 62.25% | 100.0% | |
| time | 34.8s | 8s | 0.1s | |
| 5000 | 100.00% | 83.87% | | https://github.com/pberko/al_rnn_results/blob/main/tracesaut112_5000.dat |
| len x 5 | 100.00% | 60.26% | | |
| time | 1.4m | 18s | | |

https://github.com/pberko/al_rnn_results/blob/main/tracesaut112_100.dat.ff.final.dot.pdf

https://github.com/pberko/al_rnn_results/blob/main/tracesaut112_2000.dat.ff.final.dot.pdf

HEATMAP for rnn (2000 samples)

heatmap

xls file: values are the RNN distance (Euclidean distance between the RNN tensors); the x and y axes are states

https://docs.google.com/spreadsheets/d/14pIPymjSUhsbE9W4aEZr1oJjyDuhv8J5tOsHEhbg62w/edit?usp=sharing

when using all hidden layers: https://docs.google.com/spreadsheets/d/1H3IQ-oGA6voaClurlHi0i3Y5fnDgNVeDVorLyVRRVEU/edit?usp=sharing
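For readers who want to reproduce a table like the ones linked above, here is a minimal sketch of a pairwise Euclidean distance matrix. It assumes one vector per state and uses plain Python lists in place of the actual RNN tensors; the state names, the vectors, and the function `distance_matrix` are hypothetical.

```python
import math
from typing import Dict, Sequence


def distance_matrix(vectors: Dict[str, Sequence[float]]) -> Dict[str, Dict[str, float]]:
    """Return matrix[a][b] = Euclidean distance between the vectors of states a and b."""
    states = list(vectors)
    return {
        a: {b: math.dist(vectors[a], vectors[b]) for b in states}
        for a in states
    }


# Hypothetical stand-ins for the hidden-state vectors of three states
hidden = {"q0": [0.0, 0.0], "q1": [3.0, 4.0], "q2": [3.0, 0.0]}
dist = distance_matrix(hidden)
```

The matrix is symmetric with a zero diagonal, so only the upper triangle carries information when it is drawn as a heatmap.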

Dividing the paths into clusters (states) with K-means using the distances: image
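The clustering step can be sketched with a plain implementation of Lloyd's K-means algorithm. This is an illustration only: the K-means implementation and the features actually used for these results are not stated here, so the Euclidean metric, the farthest-point initialisation, and the sample points below are all assumptions.

```python
import math
from typing import List, Sequence


def kmeans(points: List[Sequence[float]], k: int, n_iter: int = 100) -> List[int]:
    """Lloyd's algorithm; returns one cluster index per point."""
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    # Deterministic farthest-point initialisation (an assumption, for reproducibility)
    centers = [list(points[0])]
    while len(centers) < k:
        farthest = max(points, key=lambda p: min(math.dist(p, c) for c in centers))
        centers.append(list(farthest))
    labels = [-1] * len(points)
    for _ in range(n_iter):
        # Assignment step: each point goes to its nearest center
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centers[j])) for p in points
        ]
        if new_labels == labels:
            break  # converged: assignments no longer change
        labels = new_labels
        # Update step: each center moves to the mean of its members
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Hypothetical path vectors forming two well-separated groups
points = [[0.0, 0.1], [0.2, 0.0], [5.0, 5.1], [5.2, 4.9], [0.1, 0.2]]
labels = kmeans(points, k=2)
```

Paths that receive the same label are treated as reaching the same state.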

xls file (Alergia): values are path lengths in the Alergia graph

https://docs.google.com/spreadsheets/d/1fEgc9t4e5TaRPQdKh_7mMmPduvI6EpvpC28SgDLEkUY/edit?usp=sharing


for automaton blackbox.pdf image

| n_samples | RNN | Alergia | EDSM | train_files |
|---|---|---|---|---|
| 100 | 68.63% | 68.63% | | https://github.com/pberko/al_rnn_results/blob/main/tracesaut113_100.dat |
| len x 5 | 68.16% | 70.52% | | |
| time | 4.35s | 2s | | |
| 2000 | 100.00% | 76.47% | 100.0% | https://github.com/pberko/al_rnn_results/blob/main/tracesaut113_2000.dat |
| len x 5 | 100.00% | 69.72% | 100.0% | |
| time | 29.24s | 4s | 0.1s | |

https://github.com/pberko/al_rnn_results/blob/main/tracesaut113_100.dat.ff.final.dot.pdf

https://github.com/pberko/al_rnn_results/blob/main/tracesaut113_2000.dat.ff.final.dot.pdf

HEATMAP for rnn (2000 samples)

heatmap

xls file: values are the RNN distance (Euclidean distance between the RNN tensors); the x and y axes are states

https://docs.google.com/spreadsheets/d/1L5VXxH86Lpw69-I1ByWy1O-4Mf9WZBfy3aj_Q9p8cgw/edit?usp=sharing

when using all hidden layers: https://docs.google.com/spreadsheets/d/1uNYfhshW5FqtsiLGN6iSW_8BbBt-svzk5_3-lwTfNfY/edit?usp=sharing

Dividing the paths into clusters (states) with K-means using the distances: image

xls file (Alergia): values are path lengths in the Alergia graph

https://docs.google.com/spreadsheets/d/1zXrBxi0-NmJ8mh2TuE4BqN5e1IJuXDr2Hhfm9TkTr0o/edit?usp=sharing


for automaton blackbox.pdf image

(a+b)*abb

| n_samples | RNN | Alergia | EDSM | train_files |
|---|---|---|---|---|
| 100 | 64.52% | 48.39% | | https://github.com/pberko/al_rnn_results/blob/main/tracesaut114_100.dat |
| len x 5 | 60.26% | 55.63% | | |
| time | 4.68s | 1s | | |
| 2000 | 100.00% | 80.65% | 100.00% | https://github.com/pberko/al_rnn_results/blob/main/tracesaut114_2000.dat |
| len x 5 | 100.00% | 62.25% | 100.00% | |
| time | 34.27s | 7.26m | 0.1s | |

https://github.com/pberko/al_rnn_results/blob/main/tracesaut114_100.dat.ff.final.dot.pdf

https://github.com/pberko/al_rnn_results/blob/main/tracesaut114_2000.dat.ff.final.dot.pdf

HEATMAP for rnn (2000 samples)

heatmap

xls file: values are the RNN distance (Euclidean distance between the RNN tensors); the x and y axes are states

https://docs.google.com/spreadsheets/d/1KbJ0giggu_DuRWy3Gsva-plMp30cMoqTgwSGt6hM8Sc/edit?usp=sharing

when using all hidden layers: https://docs.google.com/spreadsheets/d/1pc7QO67boBpQzlp93AOoQz1POZrm_3VLUU-ppwYkVLk/edit?usp=sharing

Dividing the paths into clusters (states) with K-means using the distances: image

xls file (Alergia): values are path lengths in the Alergia graph

https://docs.google.com/spreadsheets/d/13ylblGndrQcNgo215Fs_UhLTlqSUPy_2OB19dOn9ul4/edit?usp=sharing


for automaton blackbox.pdf image

ϵ|a*|b*a|(a|b)*abb(a|b)*abb

| n_samples | RNN | Alergia | EDSM | train_files |
|---|---|---|---|---|
| 100 | 91.00% | 90.00% | | https://github.com/pberko/al_rnn_results/blob/main/tracesaut115_100.dat |
| len x 5 | 83.00% | 83.00% | | |
| time | 6s | 2.35m | | |
| 2000 | 100.00% | 93.00% | 100.00% | https://github.com/pberko/al_rnn_results/blob/main/tracesaut115_2000.dat |
| len x 5 | 100.00% | 84.00% | 100.00% | |
| time | 1m | 9.8h | 0.8s | |

https://github.com/pberko/al_rnn_results/blob/main/tracesaut115_100.dat.ff.final.dot.pdf

https://github.com/pberko/al_rnn_results/blob/main/tracesaut115_2000.dat.ff.final.dot.pdf

HEATMAP for rnn (2000 samples)

heatmap


safety automata

for automaton

blackbox.pdf

image

specification.pdf

image

| n_samples | RNN | Alergia | train_files |
|---|---|---|---|
| 100 | 54.84% | 63.2% | https://github.com/pberko/al_rnn_results/blob/main/tracesaut108_100.dat |
| time | 7s | 37s | |
| 2000 | 91.29% | 67.73% | https://github.com/pberko/al_rnn_results/blob/main/tracesaut108_2000.dat |
| time | 22s | 1m | |

HEATMAP for rnn (2000 samples)

heatmap


for automaton

blackbox.pdf

image

specification.pdf

image

| n_samples | RNN | Alergia | train_files |
|---|---|---|---|
| 100 | 53.23% | 65.65% | https://github.com/pberko/al_rnn_results/blob/main/tracesaut109_100.dat |
| time | 3s | 1.2m | |
| 2000 | 97.10% | 71.36% | https://github.com/pberko/al_rnn_results/blob/main/tracesaut109_2000.dat |
| time | 22s | 1m | |

HEATMAP for rnn (2000 samples)

heatmap


for automaton

blackbox.pdf

image

specification.pdf

image

| n_samples | RNN | Alergia | train_files |
|---|---|---|---|
| 100 | 36.45% | 52.19% | https://github.com/pberko/al_rnn_results/blob/main/tracesaut110_100.dat |
| time | 3s | 5.4m | |
| 2000 | 72.90% | 75.04% | https://github.com/pberko/al_rnn_results/blob/main/tracesaut110_2000.dat |
| time | 23s | 1m | |

HEATMAP for rnn (2000 samples)

heatmap


for automaton

blackbox.pdf

image

specification.pdf

image

| n_samples | RNN | Alergia | train_files |
|---|---|---|---|
| 100 | 40.65% | 48.58% | https://github.com/pberko/al_rnn_results/blob/main/tracesaut111_100.dat |
| time | 4s | 5.3m | |
| 2000 | 74.84% | 77.49% | https://github.com/pberko/al_rnn_results/blob/main/tracesaut111_2000.dat |
| time | 23s | 16s | |

HEATMAP for rnn (2000 samples)

heatmap


+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

RNN

The network is written in PyTorch (reference I used: https://github.com/bentrevett/pytorch-sentiment-analysis)

a. The RNN is an LSTM with the following parameters:

    EMBEDDING_DIM = 100
    HIDDEN_DIM = 256
    OUTPUT_DIM = len(LABEL.vocab)
    N_LAYERS = 2
    BIDIRECTIONAL = True
    DROPOUT = 0.5

b. The loss function is cross-entropy loss, since the task is multi-class.

From the PyTorch documentation for `CrossEntropyLoss`: "This criterion combines `torch.nn.LogSoftmax` and `torch.nn.NLLLoss` in one single class. It is useful when training a classification problem with `C` classes. If provided, the optional argument `weight` should be a 1D `Tensor` assigning weight to each of the classes. This is particularly useful when you have an unbalanced training set."
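To make the "LogSoftmax plus NLLLoss" combination concrete, here is a small worked example in plain Python for a single sample, without class weights. It is not the PyTorch implementation; the helper names and the logits are hypothetical and only illustrate the arithmetic.

```python
import math
from typing import List


def log_softmax(logits: List[float]) -> List[float]:
    """Log of the softmax probabilities, computed stably by subtracting the max."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_sum for x in logits]


def nll_loss(log_probs: List[float], target: int) -> float:
    """Negative log-likelihood of the target class."""
    return -log_probs[target]


def cross_entropy(logits: List[float], target: int) -> float:
    """Cross-entropy for one sample = NLL applied to the log-softmax output."""
    return nll_loss(log_softmax(logits), target)


# Hypothetical scores for three classes; the true class is index 0
logits = [2.0, 0.5, -1.0]
loss = cross_entropy(logits, target=0)
```

The loss shrinks toward zero as the score of the true class grows relative to the others, and equals `log(C)` when all `C` scores are equal.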

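c. The accuracy function gives partial credit according to the difference between the predicted y and the true y:

    import torch

    def categorical_accuracy_test(preds, y):
        # Predicted class: index of the highest score in each row
        top_pred = preds.argmax(1, keepdim=True)
        # Credit per sample: 1 for an exact match, minus 0.1 per class of distance
        correct = (1 - torch.abs(top_pred - y.view_as(top_pred)).float() / 10).sum()
        acc = correct / y.shape[0]
        return acc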
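As a worked example of this scoring rule, the same arithmetic can be written in plain Python on class indices. The function name `partial_credit_accuracy` and the inputs are hypothetical, and the sketch assumes class indices run from 0 to 10 so that the largest possible difference is 10.

```python
from typing import List


def partial_credit_accuracy(pred_classes: List[int], true_classes: List[int]) -> float:
    """Mean of 1 - |predicted - true| / 10 over all samples."""
    credits = [1 - abs(p - t) / 10 for p, t in zip(pred_classes, true_classes)]
    return sum(credits) / len(true_classes)


# Exact match (credit 1.0), off by two classes (0.8), off by ten classes (0.0)
acc = partial_credit_accuracy([3, 7, 0], [3, 5, 10])
```

A prediction one class away from the truth therefore still earns 0.9, so this measure is more lenient than exact-match accuracy.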

d. Multiclass prediction: I chose to divide the training set into 10 classes: 0.0, 0.1, ..., 1.0. The network is supposed to predict the true class.

e.