al_rnn_results

*Train file structure: the first row holds the number of samples and the size of the alphabet; each following row holds the output, the path length, and the path.*
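The sketch below reads this layout. It assumes whitespace-separated tokens, a numeric output value, and that each row lists exactly `path-len` symbols after the length. This looks like the Abbadingo-style format that flexfringe (the `.ff.final.dot` files) reads, but that is an assumption. `read_traces` is a hypothetical helper name.

```python
def read_traces(lines):
    """Parse a traces file given as an iterable of text lines.

    Returns (n_samples, alphabet_size, rows), where each row is
    (output, path) and path is a list of symbol strings.
    """
    it = (line for line in lines if line.strip())  # skip blank lines
    # header: number of samples and alphabet size
    n_samples, alphabet_size = (int(tok) for tok in next(it).split()[:2])
    rows = []
    for line in it:
        tokens = line.split()
        output = float(tokens[0])      # assumed numeric output value
        path_len = int(tokens[1])
        path = tokens[2:2 + path_len]  # the path symbols themselves
        if len(path) != path_len:
            raise ValueError(f"expected {path_len} symbols, got {len(path)}: {line!r}")
        rows.append((output, path))
    if len(rows) != n_samples:
        raise ValueError(f"header says {n_samples} samples, found {len(rows)}")
    return n_samples, alphabet_size, rows
```

Usage: `with open("tracesaut112_100.dat") as f: n, k, rows = read_traces(f)`.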

Non-safety automata

for automaton blackbox.pdf

| n_samples | RNN | Alergia | EDSM | train_files |
|---|---|---|---|---|
| 100 | 100.00% | 64.52% | | https://github.com/pberko/al_rnn_results/blob/main/tracesaut112_100.dat |
| len x 5 | 97.35% | 55.63% | | |
| time | 4s | 2s | | |
| 2000 | 100.00% | 80.65% | 100.0% | https://github.com/pberko/al_rnn_results/blob/main/tracesaut112_2000.dat |
| len x 5 | 100.00% | 62.25% | 100.0% | |
| time | 34.8s | 8s | 0.1s | |
| 5000 | 100.00% | 83.87% | | https://github.com/pberko/al_rnn_results/blob/main/tracesaut112_5000.dat |
| len x 5 | 100.00% | 60.26% | | |
| time | 1.4m | 18s | | |

https://github.com/pberko/al_rnn_results/blob/main/tracesaut112_100.dat.ff.final.dot.pdf

https://github.com/pberko/al_rnn_results/blob/main/tracesaut112_2000.dat.ff.final.dot.pdf

HEATMAP for rnn (2000 samples)

Spreadsheet: the values are RNN distances (Euclidean distance between RNN tensors), and the x and y axes are states.

https://docs.google.com/spreadsheets/d/14pIPymjSUhsbE9W4aEZr1oJjyDuhv8J5tOsHEhbg62w/edit?usp=sharing
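A sketch of how such a state-by-state distance matrix could be computed. It assumes each state is represented by the mean RNN hidden vector of the paths that end in it; that choice and the helper name `state_distance_matrix` are assumptions, not necessarily how the spreadsheet was built.

```python
import torch

def state_distance_matrix(hidden_by_state):
    """hidden_by_state: dict mapping a state id to an (n_paths, hidden_dim)
    tensor of RNN hidden vectors for paths ending in that state (hypothetical layout)."""
    states = sorted(hidden_by_state)
    # one representative vector per state: the mean hidden vector (an assumption)
    centroids = torch.stack([hidden_by_state[s].mean(dim=0) for s in states])
    # pairwise Euclidean distances; entry [i, j] is the heatmap cell for states i and j
    dist = torch.cdist(centroids, centroids, p=2)
    return states, dist
```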

when using all hidden layers: https://docs.google.com/spreadsheets/d/1H3IQ-oGA6voaClurlHi0i3Y5fnDgNVeDVorLyVRRVEU/edit?usp=sharing
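One way to build a per-path vector from all hidden layers, assuming the LSTM's final hidden state `h_n` is used: flatten every layer and direction into one vector per path. The order follows PyTorch's `h_n` layout (layer 0 forward, layer 0 backward, layer 1 forward, ...). The helper name and the toy sizes are hypothetical.

```python
import torch
import torch.nn as nn

def all_layers_vector(h_n):
    # h_n: (num_layers * num_directions, batch, hidden_dim), as returned by nn.LSTM
    batch = h_n.size(1)
    # move batch first, then concatenate every layer/direction into one vector per path
    return h_n.permute(1, 0, 2).reshape(batch, -1)

lstm = nn.LSTM(input_size=5, hidden_size=8, num_layers=2,
               bidirectional=True, batch_first=True)
x = torch.randn(3, 6, 5)          # 3 paths of length 6, 5 features per step (toy sizes)
_, (h_n, _) = lstm(x)
vectors = all_layers_vector(h_n)  # shape (3, 2 * 2 * 8) = (3, 32)
```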

Dividing the paths into clusters (states) with k-means, using the distances:

Spreadsheet for Alergia: the values are path lengths in the Alergia graph.

https://docs.google.com/spreadsheets/d/1fEgc9t4e5TaRPQdKh_7mMmPduvI6EpvpC28SgDLEkUY/edit?usp=sharing
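For the k-means step above: standard k-means works on vectors rather than on a precomputed distance matrix, so this sketch clusters per-path vectors (for example RNN hidden vectors) directly with Euclidean distance. It is plain Lloyd iteration in pure Python with a deterministic initialisation; `kmeans` is a hypothetical helper, not the code used for these results.

```python
import math

def kmeans(points, k, iters=100):
    """Lloyd's k-means on a list of equal-length numeric vectors."""
    # deterministic initialisation with the first k points (k-means++ is the usual choice)
    centroids = [list(p) for p in points[:k]]
    labels = None
    for _ in range(iters):
        # assignment step: each point joins its nearest centroid
        new_labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                      for p in points]
        if new_labels == labels:
            break  # converged: assignments did not change
        labels = new_labels
        # update step: move each centroid to the mean of its members
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties
                centroids[c] = [sum(xs) / len(members) for xs in zip(*members)]
    return labels, centroids
```

Each cluster id can then be read as a candidate state and compared with the states of the black-box automaton.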

for automaton blackbox.pdf

| n_samples | RNN | Alergia | EDSM | train_files |
|---|---|---|---|---|
| 100 | 68.63% | 68.63% | | https://github.com/pberko/al_rnn_results/blob/main/tracesaut113_100.dat |
| len x 5 | 68.16% | 70.52% | | |
| time | 4.35s | 2s | | |
| 2000 | 100.00% | 76.47% | 100.0% | https://github.com/pberko/al_rnn_results/blob/main/tracesaut113_2000.dat |
| len x 5 | 100.00% | 69.72% | 100.0% | |
| time | 29.24s | 4s | 0.1s | |

https://github.com/pberko/al_rnn_results/blob/main/tracesaut113_100.dat.ff.final.dot.pdf

https://github.com/pberko/al_rnn_results/blob/main/tracesaut113_2000.dat.ff.final.dot.pdf

HEATMAP for rnn (2000 samples)

Spreadsheet: the values are RNN distances (Euclidean distance between RNN tensors), and the x and y axes are states.

https://docs.google.com/spreadsheets/d/1L5VXxH86Lpw69-I1ByWy1O-4Mf9WZBfy3aj_Q9p8cgw/edit?usp=sharing

when using all hidden layers: https://docs.google.com/spreadsheets/d/1uNYfhshW5FqtsiLGN6iSW_8BbBt-svzk5_3-lwTfNfY/edit?usp=sharing

Dividing the paths into clusters (states) with k-means, using the distances:

Spreadsheet for Alergia: the values are path lengths in the Alergia graph.

https://docs.google.com/spreadsheets/d/1zXrBxi0-NmJ8mh2TuE4BqN5e1IJuXDr2Hhfm9TkTr0o/edit?usp=sharing
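The Alergia spreadsheets hold path lengths between states of the learned graph. Assuming "path length" means the number of transitions on a shortest directed path, it can be computed with a breadth-first search from each state. The adjacency dict (which would be built from the learned automaton) and both helper names are hypothetical.

```python
from collections import deque

def shortest_path_lengths(edges, source):
    """edges: dict state -> iterable of successor states.
    Returns {state: number of transitions from source} for reachable states."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        s = queue.popleft()
        for t in edges.get(s, ()):
            if t not in dist:  # first visit in BFS order is a shortest path
                dist[t] = dist[s] + 1
                queue.append(t)
    return dist

def path_length_matrix(edges, states):
    # matrix[i][j] = shortest path length from states[i] to states[j], None if unreachable
    rows = []
    for s in states:
        d = shortest_path_lengths(edges, s)
        rows.append([d.get(t) for t in states])
    return rows
```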

for automaton blackbox.pdf

(a+b)*abb

| n_samples | RNN | Alergia | EDSM | train_files |
|---|---|---|---|---|
| 100 | 64.52% | 48.39% | | https://github.com/pberko/al_rnn_results/blob/main/tracesaut114_100.dat |
| len x 5 | 60.26% | 55.63% | | |
| time | 4.68s | 1s | | |
| 2000 | 100.00% | 80.65% | 100.00% | https://github.com/pberko/al_rnn_results/blob/main/tracesaut114_2000.dat |
| len x 5 | 100.00% | 62.25% | 100.00% | |
| time | 34.27s | 7.26m | 0.1s | |

https://github.com/pberko/al_rnn_results/blob/main/tracesaut114_100.dat.ff.final.dot.pdf

https://github.com/pberko/al_rnn_results/blob/main/tracesaut114_2000.dat.ff.final.dot.pdf

HEATMAP for rnn (2000 samples)

Spreadsheet: the values are RNN distances (Euclidean distance between RNN tensors), and the x and y axes are states.

https://docs.google.com/spreadsheets/d/1KbJ0giggu_DuRWy3Gsva-plMp30cMoqTgwSGt6hM8Sc/edit?usp=sharing
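To get a state-by-state matrix into a spreadsheet like the ones linked here, a CSV export is one option, since Google Sheets can import CSV. This is a hypothetical helper, not necessarily how these sheets were produced; a torch distance matrix can be passed in via `.tolist()`.

```python
import csv
import io

def matrix_to_csv(states, matrix):
    """Render a state-by-state matrix as CSV text: a header row of state ids,
    then one row per state whose first cell is that state's id."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow([""] + list(states))
    for s, row in zip(states, matrix):
        # None (for example an unreachable state) becomes an empty cell
        writer.writerow([s] + ["" if v is None else v for v in row])
    return buf.getvalue()
```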

when using all hidden layers: https://docs.google.com/spreadsheets/d/1pc7QO67boBpQzlp93AOoQz1POZrm_3VLUU-ppwYkVLk/edit?usp=sharing

Dividing the paths into clusters (states) with k-means, using the distances:

Spreadsheet for Alergia: the values are path lengths in the Alergia graph.

https://docs.google.com/spreadsheets/d/13ylblGndrQcNgo215Fs_UhLTlqSUPy_2OB19dOn9ul4/edit?usp=sharing

for automaton blackbox.pdf

ϵ|a|ba|(a|b)abb(a|b)abb

| n_samples | RNN | Alergia | EDSM | train_files |
|---|---|---|---|---|
| 100 | 91.00% | 90.00% | | https://github.com/pberko/al_rnn_results/blob/main/tracesaut115_100.dat |
| len x 5 | 83.00% | 83.00% | | |
| time | 6s | 2.35m | | |
| 2000 | 100.00% | 93.00% | 100.00% | https://github.com/pberko/al_rnn_results/blob/main/tracesaut115_2000.dat |
| len x 5 | 100.00% | 84.00% | 100.00% | |
| time | 1m | 9.8h | 0.8s | |

https://github.com/pberko/al_rnn_results/blob/main/tracesaut115_100.dat.ff.final.dot.pdf

https://github.com/pberko/al_rnn_results/blob/main/tracesaut115_2000.dat.ff.final.dot.pdf

HEATMAP for rnn (2000 samples)

Safety automata

for automaton

blackbox.pdf

specification.pdf

| n_samples | RNN | Alergia | train_files |
|---|---|---|---|
| 100 | 54.84% | 63.2% | https://github.com/pberko/al_rnn_results/blob/main/tracesaut108_100.dat |
| time | 7s | 37s | |
| 2000 | 91.29% | 67.73% | https://github.com/pberko/al_rnn_results/blob/main/tracesaut108_2000.dat |
| time | 22s | 1m | |

HEATMAP for rnn (2000 samples)

for automaton

blackbox.pdf

specification.pdf

| n_samples | RNN | Alergia | train_files |
|---|---|---|---|
| 100 | 53.23% | 65.65% | https://github.com/pberko/al_rnn_results/blob/main/tracesaut109_100.dat |
| time | 3s | 1.2m | |
| 2000 | 97.10% | 71.36% | https://github.com/pberko/al_rnn_results/blob/main/tracesaut109_2000.dat |
| time | 22s | 1m | |

HEATMAP for rnn (2000 samples)

for automaton

blackbox.pdf

specification.pdf

| n_samples | RNN | Alergia | train_files |
|---|---|---|---|
| 100 | 36.45% | 52.19% | https://github.com/pberko/al_rnn_results/blob/main/tracesaut110_100.dat |
| time | 3s | 5.4m | |
| 2000 | 72.90% | 75.04% | https://github.com/pberko/al_rnn_results/blob/main/tracesaut110_2000.dat |
| time | 23s | 1m | |

HEATMAP for rnn (2000 samples)

for automaton

blackbox.pdf

specification.pdf

| n_samples | RNN | Alergia | train_files |
|---|---|---|---|
| 100 | 40.65% | 48.58% | https://github.com/pberko/al_rnn_results/blob/main/tracesaut111_100.dat |
| time | 4s | 5.3m | |
| 2000 | 74.84% | 77.49% | https://github.com/pberko/al_rnn_results/blob/main/tracesaut111_2000.dat |
| time | 23s | 16s | |

HEATMAP for rnn (2000 samples)

+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

RNN

The network is written in PyTorch, based on https://github.com/bentrevett/pytorch-sentiment-analysis.

a. The RNN is an LSTM with the following parameters:

    EMBEDDING_DIM = 100
    HIDDEN_DIM = 256
    OUTPUT_DIM = len(LABEL.vocab)
    N_LAYERS = 2
    BIDIRECTIONAL = True
    DROPOUT = 0.5
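A model with these hyperparameters, sketched along the lines of the referenced bentrevett tutorial: embedding, then a stacked bidirectional LSTM, then a linear layer on the last layer's final forward and backward hidden states. Padding and packed sequences are left out, and `INPUT_DIM` and `OUTPUT_DIM` are hypothetical stand-ins for the vocabulary sizes.

```python
import torch
import torch.nn as nn

class RNN(nn.Module):
    def __init__(self, input_dim, embedding_dim, hidden_dim, output_dim,
                 n_layers, bidirectional, dropout):
        super().__init__()
        self.embedding = nn.Embedding(input_dim, embedding_dim)
        self.rnn = nn.LSTM(embedding_dim, hidden_dim, num_layers=n_layers,
                           bidirectional=bidirectional, dropout=dropout,
                           batch_first=True)
        directions = 2 if bidirectional else 1
        self.fc = nn.Linear(hidden_dim * directions, output_dim)
        self.dropout = nn.Dropout(dropout)
        self.bidirectional = bidirectional

    def forward(self, paths):
        # paths: (batch, seq_len) tensor of symbol indices
        embedded = self.dropout(self.embedding(paths))
        _, (hidden, _) = self.rnn(embedded)
        # hidden: (n_layers * directions, batch, hidden_dim); keep the last layer only
        if self.bidirectional:
            last = torch.cat((hidden[-2], hidden[-1]), dim=1)
        else:
            last = hidden[-1]
        return self.fc(self.dropout(last))  # unnormalised class scores

INPUT_DIM = 4    # hypothetical: alphabet size plus special tokens
OUTPUT_DIM = 11  # hypothetical stand-in for len(LABEL.vocab)
model = RNN(INPUT_DIM, 100, 256, OUTPUT_DIM,
            n_layers=2, bidirectional=True, dropout=0.5)
```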

b. The loss function is cross-entropy loss, since this is a multiclass problem.

From the PyTorch documentation of `CrossEntropyLoss`: "This criterion combines `LogSoftmax` and `NLLLoss` in one single class. It is useful when training a classification problem with `C` classes. If provided, the optional argument `weight` should be a 1D `Tensor` assigning weight to each of the classes. This is particularly useful when you have an unbalanced training set."
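A small check of the quoted statement: `CrossEntropyLoss` on raw scores gives the same value as `NLLLoss` applied to `log_softmax` of those scores. The logits, targets, and class weights are illustrative values only.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

logits = torch.tensor([[2.0, 0.5, -1.0],
                       [0.1, 0.2, 3.0]])  # raw scores for 2 samples, 3 classes
targets = torch.tensor([0, 2])            # true class indices

ce = nn.CrossEntropyLoss()(logits, targets)
# the same value in two steps: log-softmax, then negative log-likelihood
two_step = nn.NLLLoss()(F.log_softmax(logits, dim=1), targets)

# optional per-class weights for an unbalanced training set (illustrative values)
class_weights = torch.tensor([1.0, 2.0, 0.5])
weighted = nn.CrossEntropyLoss(weight=class_weights)(logits, targets)
```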

c. The accuracy function scores each prediction by the difference between the predicted class and the true class: an exact match counts as 1, and each class of distance subtracts 0.1.

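    import torch

    def categorical_accuracy_test(preds, y):
        # predicted class index per sample (preds: batch x n_classes scores)
        top_pred = preds.argmax(1, keepdim=True)
        # partial credit: 1 - |index difference| / 10; assumes index i means value i/10
        correct = (1 - torch.abs(top_pred - y.view_as(top_pred)) / 10).sum()
        acc = correct.float() / y.shape[0]
        return acc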
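Written as a formula, with $N$ samples, predicted class index $\hat{c}_i$ and true class index $c_i$ (both in $0, \dots, 10$), the function above computes:

```math
\mathrm{acc} = \frac{1}{N}\sum_{i=1}^{N}\left(1 - \frac{|\hat{c}_i - c_i|}{10}\right)
```

For example, predicting class 0.3 when the true class is 0.5 scores $1 - 2/10 = 0.8$ for that sample.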

d. Multiclass prediction: I chose to divide the training set into 11 classes, 0.0, 0.1, ..., 1.0. The network is supposed to predict the true class.
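Assuming the outputs in the train files are values in [0, 1] (an assumption), the mapping between a value and its class index can be sketched as below. Both function names are hypothetical.

```python
def value_to_class(value):
    """Map an output value in [0.0, 1.0] to one of 11 class indices 0..10."""
    if not 0.0 <= value <= 1.0:
        raise ValueError(f"value out of range: {value}")
    # nearest tenth; round() also absorbs float noise such as 0.3 * 10 = 3.0000000000000004
    return round(value * 10)

def class_to_value(index):
    """Inverse mapping: class index 0..10 back to 0.0, 0.1, ..., 1.0."""
    return index / 10
```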


(a+b)*abb

ϵ|a*|b*a|(a|b)*abb(a|b)*abb

ϵ|a|ba|(a|b)abb(a|b)abb