Issues with TransformerBlock()
Do you have any example code showing the full model architecture ready for training?
I have been trying to replicate the model to compare against DeepLOB. Everything works until I try to add the TransformerBlock layers as in the code below, at which point I receive an "Invalid input" error. Is this the correct way of implementing the transformer blocks?
from tensorflow.keras.layers import Input, Dense
from tensorflow.keras.models import Model
from tensorflow.keras.optimizers import Adam
# lob_dilated, positional_encoding and TransformerBlock come from the repo's notebook cells;
# LayerNormalization is either the repo's own implementation or tf.keras.layers.LayerNormalization

def create_translob(T, NF):
    # T = sequence length, NF = number of features per LOB snapshot
    input_lmd = Input(shape=(T, NF))
    # Dilated conv block
    dilated_conv = lob_dilated(input_lmd)
    # Layer normalization
    layer_normalize = LayerNormalization()
    layer_norm = layer_normalize(dilated_conv)
    # Positional encoding
    pos_encode = positional_encoding(layer_norm)
    # Transformer blocks (the same block applied twice)
    transblock = TransformerBlock(name='transblock', num_heads=3)
    transblock_out1 = transblock(pos_encode)
    transblock_out2 = transblock(transblock_out1)
    # MLP, Dropout and Softmax layer (WIP)
    # out = Dense(3, activation='softmax')(y)
    out = transblock_out2
    model = Model(inputs=input_lmd, outputs=out)
    adam = Adam(lr=0.01, beta_1=0.9, beta_2=0.999, epsilon=1)
    model.compile(optimizer=adam, loss='categorical_crossentropy', metrics=['accuracy'])
    return model
translob = create_translob(100, 40)
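(For completeness, the WIP head commented out above would eventually be something along these lines, placed inside create_translob in place of out = transblock_out2; the hidden width and dropout rate below are just placeholders, not tuned values:)

from tensorflow.keras.layers import Flatten, Dropout

x = Flatten()(transblock_out2)
x = Dense(64, activation='relu')(x)   # placeholder width
x = Dropout(0.1)(x)                   # placeholder rate
out = Dense(3, activation='softmax')(x)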
Full error traceback
<ipython-input-78-ea5db9253711> in <module>
27 return model
28
---> 29 translob = create_translob(100, 40)
30
31 from IPython.display import Image
<ipython-input-78-ea5db9253711> in create_translob(T, NF)
14 # Transformer block
15 transblock = TransformerBlock(name='transblock', num_heads=3)
---> 16 transblock_out1 = transblock(pos_encode)
17 transblock2_out2 = transblock(transblock_out1)
18
<ipython-input-65-150bbe14e8c8> in __call__(self, _input)
226
227 def __call__(self, _input):
--> 228 output = self.attention_layer(_input)
229 post_residual1 = (
230 self.addition_layer([_input, output]))
~\Anaconda3\envs\tf-gpu-2\lib\site-packages\tensorflow_core\python\keras\engine\base_layer.py in __call__(self, inputs, *args, **kwargs)
815 # Build layer if applicable (if the `build` method has been
816 # overridden).
--> 817 self._maybe_build(inputs)
818 cast_inputs = self._maybe_cast_inputs(inputs)
819
~\Anaconda3\envs\tf-gpu-2\lib\site-packages\tensorflow_core\python\keras\engine\base_layer.py in _maybe_build(self, inputs)
2139 # operations.
2140 with tf_utils.maybe_init_scope(self):
-> 2141 self.build(input_shapes)
2142 # We must set self.built since user defined build functions are not
2143 # constrained to set self.built.
<ipython-input-65-150bbe14e8c8> in build(self, input_shape)
24 def build(self, input_shape):
25 if not isinstance(input_shape, tuple):
---> 26 raise ValueError('Invalid input')
27 d_model = input_shape[-1]
28
ValueError: Invalid input
Many thanks
@JayCooper95 hello, have you managed to fix it?
Hey, I don't know if you are still looking at this, but I managed to fix these errors and your code sample works fine. The lines that should be changed are in the LOB attention MultiHeadSelfAttention class:

def build(self, input_shape):
    try:
        d_model = input_shape[-1]
    except:
        raise ValueError('Invalid input')

As a quick fix, use the try/except instead of checking the shape's type. I think there was one other place that needed the same change.
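For context: in TF 2.x Keras passes a TensorShape object (not a plain Python tuple) into build(), which is why the isinstance(input_shape, tuple) check raises. If you prefer an explicit check over a bare except, a minimal sketch of an equivalent patch (assuming tensorflow is imported as tf and the rest of build stays as in the repo) would be:

def build(self, input_shape):
    # TF 2.x hands build() a TensorShape; normalise it to a plain tuple first
    input_shape = tuple(tf.TensorShape(input_shape).as_list())
    if len(input_shape) < 2:
        raise ValueError('Invalid input')
    d_model = input_shape[-1]
    # ... rest of the original build() unchanged ...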
@blutooth Does it work with the try/except? Seems strange, but I'll give it a try.
Yes, it worked for me. I'd also recommend this library: https://timeseriestransformer.readthedocs.io/en/latest/README.html
@blutooth did you manage to reproduce the results from the paper? The best I can get with "optimal" hyperparameters is ~0.5 accuracy. See my attempt here: https://github.com/vslaykovsky/translob
@vslaykovsky I ran your code and tuned the parameters. The best accuracy I could reach was about 0.6. I can't find any big problem in your implementation.