jwallbridge/translob

Issues with TransformerBlock()

Opened this issue · 6 comments

Do you have any example code showing the full model architecture ready for training?

I have been trying to replicate the model to compare against DeepLOB. Everything works until I try to add the TransformerBlock layers as in the code below, at which point I receive an "Invalid input" error. Is this the correct way of implementing the transformer blocks?

# lob_dilated, positional_encoding and TransformerBlock are the custom
# pieces defined earlier in the notebook.
from tensorflow.keras.layers import Input, Dense, LayerNormalization
from tensorflow.keras.models import Model
from tensorflow.keras.optimizers import Adam

def create_translob(T, NF):
    input_lmd = Input(shape=(T, NF))

    # Dilated conv block
    dilated_conv = lob_dilated(input_lmd)

    # Layer normalization
    layer_norm = LayerNormalization()(dilated_conv)

    # Positional encoding
    pos_encode = positional_encoding(layer_norm)

    # Transformer blocks (note: the same instance is applied twice)
    transblock = TransformerBlock(name='transblock', num_heads=3)
    transblock_out1 = transblock(pos_encode)
    transblock_out2 = transblock(transblock_out1)

    # MLP, dropout and softmax head (WIP, see the sketch below)
    # out = Dense(3, activation='softmax')(transblock_out2)
    out = transblock_out2

    model = Model(inputs=input_lmd, outputs=out)
    adam = Adam(lr=0.01, beta_1=0.9, beta_2=0.999, epsilon=1)
    model.compile(optimizer=adam, loss='categorical_crossentropy',
                  metrics=['accuracy'])

    return model


translob = create_translob(100, 40)
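For reference, once the blocks run I plan to finish the head roughly as follows. This is only a sketch: the Flatten, the 64-unit dense layer and the 0.1 dropout rate are my guesses from the paper, not code from this repo.

from tensorflow.keras.layers import Flatten, Dense, Dropout

# Hypothetical classification head (all sizes assumed)
x = Flatten()(transblock_out2)           # collapse (time, features) to a vector
x = Dense(64, activation='relu')(x)      # feed-forward width assumed
x = Dropout(0.1)(x)                      # dropout rate assumed
out = Dense(3, activation='softmax')(x)  # 3 classes: down / stationary / up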

Full error traceback

<ipython-input-78-ea5db9253711> in <module>
     27     return model
     28 
---> 29 translob = create_translob(100, 40)
     30 
     31 from IPython.display import Image

<ipython-input-78-ea5db9253711> in create_translob(T, NF)
     14     # Transformer block
     15     transblock = TransformerBlock(name='transblock', num_heads=3)
---> 16     transblock_out1 = transblock(pos_encode)
     17     transblock2_out2 = transblock(transblock_out1)
     18 

<ipython-input-65-150bbe14e8c8> in __call__(self, _input)
    226 
    227     def __call__(self, _input):
--> 228         output = self.attention_layer(_input)
    229         post_residual1 = (
    230             self.addition_layer([_input, output]))

~\Anaconda3\envs\tf-gpu-2\lib\site-packages\tensorflow_core\python\keras\engine\base_layer.py in __call__(self, inputs, *args, **kwargs)
    815           # Build layer if applicable (if the `build` method has been
    816           # overridden).
--> 817           self._maybe_build(inputs)
    818           cast_inputs = self._maybe_cast_inputs(inputs)
    819 

~\Anaconda3\envs\tf-gpu-2\lib\site-packages\tensorflow_core\python\keras\engine\base_layer.py in _maybe_build(self, inputs)
   2139         # operations.
   2140         with tf_utils.maybe_init_scope(self):
-> 2141           self.build(input_shapes)
   2142       # We must set self.built since user defined build functions are not
   2143       # constrained to set self.built.

<ipython-input-65-150bbe14e8c8> in build(self, input_shape)
     24     def build(self, input_shape):
     25         if not isinstance(input_shape, tuple):
---> 26             raise ValueError('Invalid input')
     27         d_model = input_shape[-1]
     28 

ValueError: Invalid input

Many thanks

@JayCooper95 Hello, have you managed to fix it?

Hey, I don't know if you are still looking at this. I managed to fix these errors and your code sample works fine. The lines that should be changed are in the LOB attention module, in MultiHeadSelfAttention:

def build(self, input_shape):
    try:
        d_model = input_shape[-1]
    except:
        raise ValueError('Invalid input')

As a quick fix, try the try/except rather than checking the shape's type. I think there may have been one other place.
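For context (my own reading of the traceback, not anything from the repo): in TF 2.x, Keras passes a tf.TensorShape to build rather than a plain Python tuple, so the isinstance(input_shape, tuple) check fails even for valid inputs. A slightly more explicit sketch of the same fix:

def build(self, input_shape):
    # TF 2.x hands build a tf.TensorShape, not a tuple, so index into it
    # directly instead of type-checking it.
    try:
        d_model = int(input_shape[-1])
    except (TypeError, IndexError):
        raise ValueError('Invalid input')
    # ... rest of build unchanged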

@blutooth Does it work with the try/except? Seems strange, but I'll give it a try.

Yes, it worked for me. I recommend this library: https://timeseriestransformer.readthedocs.io/en/latest/README.html

@blutooth did you manage to reproduce the results from the paper? The best I can get with "optimal" hyperparameters is ~0.5 accuracy. See my attempt here: https://github.com/vslaykovsky/translob

@vslaykovsky I ran your code and tuned the parameters. The best accuracy I could reach is about 0.6. I cannot find any big problem in your implementation.