Blockhouse Assignment Report

Author: Filip Michalsky

Preamble: Hi! This was a lot of fun. Thanks for giving me this task. My repo is filip-michalsky/transformer-trading.

Executive Summary

In this task, I implemented a Transformer-based architecture trained with Proximal Policy Optimization (PPO). I used Google Colab with an L4 GPU (23 GB of RAM). I compared my results with a vanilla Transformer, the Blockhouse-provided mean reversion bot (blotter), a PPO model from Stable Baselines, and a simple momentum strategy. On final portfolio value, my model beat the PPO benchmark provided by Blockhouse and the blotter mean reversion strategy, but underperformed a simple momentum strategy (implemented by me).
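The two rule-based baselines make opposite bets on the same price move: momentum follows it, mean reversion fades it. A minimal sketch of that contrast, with hypothetical lookback and band parameters; these are not necessarily the exact rules used by my momentum strategy or by the blotter, and the Hold/Buy/Sell integer encoding is an assumption:

```python
from typing import List

# Assumed action encoding: 0 = Hold, 1 = Buy, 2 = Sell
HOLD, BUY, SELL = 0, 1, 2


def momentum_signal(prices: List[float], lookback: int = 5) -> int:
    """Trend following: buy after a rise over the lookback window, sell after a fall."""
    if len(prices) <= lookback:
        return HOLD  # not enough history yet
    change = prices[-1] - prices[-1 - lookback]
    if change > 0:
        return BUY
    if change < 0:
        return SELL
    return HOLD


def mean_reversion_signal(prices: List[float], lookback: int = 5, band: float = 0.0) -> int:
    """Fade the move: sell when price is above its moving average, buy when below."""
    if len(prices) < lookback:
        return HOLD
    mean = sum(prices[-lookback:]) / lookback
    deviation = prices[-1] - mean
    if deviation > band:
        return SELL
    if deviation < -band:
        return BUY
    return HOLD


prices = [192.20, 192.22, 192.25, 192.27, 192.30, 192.34]
print(momentum_signal(prices), mean_reversion_signal(prices))  # the two rules disagree on a steady rise
```

On a steadily rising series like the one above, momentum says Buy while mean reversion says Sell, which is why the two baselines can end the day on opposite sides of the starting capital.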

Evaluation

I evaluated all of the algorithms on the held-out test set (the last 30% of the day's time sequence).

Vanilla Transformer: did not learn to trade
Transformer + PPO (my implementation):
- Cumulative reward: -3174
- Portfolio Value at Market Close: $10,006,779.095
PPO Agent from Stable Baselines:
- Cumulative reward: -2651
- Portfolio Value at Market Close: $10,000,321.125
Simple Mean Reversion (blotter):
- Cumulative reward: -3392
- Portfolio Value at Market Close: $9,941,926.795
Simple Momentum (my implementation):
- Portfolio Value at Market Close: $10,049,549.64
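The two metrics do not produce the same ranking: by cumulative reward the Stable Baselines PPO agent comes first, while by portfolio value at the close my Transformer + PPO model edges it out. A small script over the numbers above makes the comparison explicit (the momentum strategy has no reported cumulative reward, so it is ranked on portfolio value only; the dictionary layout and `rank_by` helper are just for illustration):

```python
# Figures copied from the evaluation results above; None = not reported.
results = {
    "Transformer + PPO": {"reward": -3174, "final_value": 10_006_779.095},
    "PPO (Stable Baselines)": {"reward": -2651, "final_value": 10_000_321.125},
    "Mean reversion (blotter)": {"reward": -3392, "final_value": 9_941_926.795},
    "Momentum": {"reward": None, "final_value": 10_049_549.64},
}


def rank_by(metric: str) -> list:
    """Return model names sorted best-first on `metric`, skipping unreported values."""
    scored = [(name, r[metric]) for name, r in results.items() if r[metric] is not None]
    return [name for name, _ in sorted(scored, key=lambda item: item[1], reverse=True)]


by_value = rank_by("final_value")
by_reward = rank_by("reward")
gap_to_momentum = results["Momentum"]["final_value"] - results["Transformer + PPO"]["final_value"]

print("By final value: ", by_value)
print("By reward:      ", by_reward)
print(f"Momentum lead over Transformer + PPO: ${gap_to_momentum:,.2f}")
```

Which metric counts as "beating" a benchmark therefore matters when reading these results.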

Potential Improvements

I only had ~3 days to implement this while also working full-time. Example directions I would iterate on:

Feature Engineering: Include additional data from the trading history and create lagging indicator features.
Do more hyperparameter and architecture search.
Increase the dataset size.
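Lagged features are the cheapest of these directions to try. One reading of the idea is to append past values of each indicator to every row, so the model sees recent history explicitly. A minimal pandas sketch, assuming the indicators live in a DataFrame sorted by time; `add_lagged_features` and the lag choices are hypothetical:

```python
import pandas as pd


def add_lagged_features(df: pd.DataFrame, columns, lags=(1, 2, 5)) -> pd.DataFrame:
    """Append shifted copies of `columns` so each row also carries past values."""
    out = df.copy()
    for col in columns:
        for lag in lags:
            # shift(lag) moves values down, so row t receives the value from row t - lag
            out[f"{col}_lag{lag}"] = out[col].shift(lag)
    # Drop only the warm-up rows that lack a full history; other NaNs (e.g. RSI) are kept
    return out.iloc[max(lags):].reset_index(drop=True)


df = pd.DataFrame({"Close": [192.20, 192.22, 192.25, 192.27, 192.30, 192.34]})
lagged = add_lagged_features(df, ["Close"], lags=(1, 2))
print(lagged)
```

Because `shift` only looks backward, these features cannot leak future prices into the current row.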

Some additional improvement directions suggested by Claude:

Attention Visualization: Implement attention visualization to understand what the model is focusing on when making decisions.
Multi-step Returns: Use multi-step returns instead of single-step returns for more stable learning.
Curiosity-driven Exploration: Implement intrinsic rewards based on prediction error to encourage exploration.
Prioritized Experience Replay: Implement prioritized experience replay to focus on important transitions.
Ensemble Methods: Use an ensemble of models to make more robust predictions.
Curriculum Learning: Start with simpler trading scenarios and gradually increase complexity.
Meta-learning: Implement meta-learning techniques to adapt quickly to market changes.
Risk-aware Objectives: Incorporate risk measures (e.g., Sharpe ratio) directly into the objective function.
Hierarchical RL: Implement a hierarchical structure with high-level strategy and low-level execution agents.
Multi-agent Learning: Extend to multi-agent scenarios to model complex market dynamics.
Adversarial Training: Use adversarial examples to make the model more robust to market manipulations.
Interpretability: Implement techniques like SHAP values to explain model decisions.
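Two of these directions are easy to make concrete. A minimal sketch of an n-step return target (multi-step returns) and a per-step Sharpe ratio that could feed a risk-aware reward; the function names, default gamma and n, treating the end of the rollout as terminal, and the population-variance choice are all assumptions, not part of the current code:

```python
import math
from typing import List


def n_step_returns(rewards: List[float], values: List[float],
                   gamma: float = 0.99, n: int = 3) -> List[float]:
    """Target G_t = r_t + gamma r_{t+1} + ... + gamma^(n-1) r_{t+n-1} + gamma^n V(s_{t+n})."""
    T = len(rewards)
    targets = []
    for t in range(T):
        g, discount = 0.0, 1.0
        for k in range(min(n, T - t)):
            g += discount * rewards[t + k]
            discount *= gamma
        # Bootstrap from the critic's value only if s_{t+n} lies inside the rollout;
        # reaching the end of the rollout is treated as terminal (assumption)
        if t + n < T:
            g += discount * values[t + n]
        targets.append(g)
    return targets


def sharpe_ratio(returns: List[float], eps: float = 1e-8) -> float:
    """Per-step Sharpe ratio: mean / std of returns (no risk-free rate, no annualization)."""
    mean = sum(returns) / len(returns)
    var = sum((r - mean) ** 2 for r in returns) / len(returns)
    return mean / (math.sqrt(var) + eps)


print(n_step_returns([1.0, 1.0, 1.0], [0.0, 0.0, 5.0], gamma=1.0, n=2))
print(sharpe_ratio([1.0, 3.0]))
```

A larger n lowers the bias from an imperfect critic at the cost of more variance; the Sharpe term could be added to the per-step reward over a rolling window of portfolio returns.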

Technical Approach

I started by reviewing the task and the dataset and doing some exploratory data analysis (EDA).

The market trades dataset was split chronologically into training and test sets to prevent information leakage. All algorithms benchmarked in this work were trained only on the training set (from market open until roughly 3 PM) and evaluated on trading data from 3 PM until the close (the test set).
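Because the data is a single day's time series, the split has to be by time rather than random. A minimal sketch of a 70/30 chronological split (matching the last-30% test set used in the evaluation), with normalization statistics computed on the training slice only so that no test-period information leaks into the inputs; both helper names are hypothetical:

```python
from typing import List, Sequence, Tuple


def chronological_split(rows: Sequence, train_frac: float = 0.7) -> Tuple[list, list]:
    """Split time-ordered rows without shuffling: the head trains, the tail tests."""
    if not 0.0 < train_frac < 1.0:
        raise ValueError("train_frac must be between 0 and 1")
    cut = int(len(rows) * train_frac)
    return list(rows[:cut]), list(rows[cut:])


def zscore_with_train_stats(train: List[float], test: List[float]) -> Tuple[List[float], List[float]]:
    """Normalize both splits using mean/std estimated on the training split only."""
    mean = sum(train) / len(train)
    std = (sum((x - mean) ** 2 for x in train) / len(train)) ** 0.5 or 1.0  # guard zero std
    return [(x - mean) / std for x in train], [(x - mean) / std for x in test]


train, test = chronological_split(list(range(10)))
train_z, test_z = zscore_with_train_stats([1.0, 2.0, 3.0], [4.0])
print(train, test)
```

Fitting the scaler on the full day would quietly leak the afternoon's price level into the morning's training inputs, which is the same kind of leakage the split is meant to prevent.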

I then started with a vanilla Transformer implementation, tweaked it to work better with continuous numerical inputs, and did a lot of debugging to prevent exploding gradients.

I then moved on to combining the Transformer architecture with PPO reinforcement learning, where the actor proposes trade actions and the critic estimates the value of the current state. Each has a separate loss function (this is not dissimilar from training GANs and looking for saddle points).
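The two separate objectives can be written down concretely. A minimal sketch of the standard clipped PPO surrogate loss for the actor plus a mean-squared-error value loss for the critic; the clip range of 0.2 and the `ppo_losses` name are assumptions, not necessarily the exact settings in my repo:

```python
import torch
import torch.nn.functional as F


def ppo_losses(new_log_probs, old_log_probs, advantages, values, returns, clip_eps=0.2):
    """Clipped PPO surrogate loss (actor) and MSE state-value loss (critic)."""
    # Probability ratio pi_new(a|s) / pi_old(a|s), computed in log space for stability
    ratio = torch.exp(new_log_probs - old_log_probs)
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    # PPO maximizes the pessimistic (element-wise minimum) objective; the loss is its negative mean
    actor_loss = -torch.min(unclipped, clipped).mean()
    # The critic regresses its estimate V(s) onto the observed returns
    critic_loss = F.mse_loss(values, returns)
    return actor_loss, critic_loss


old = torch.zeros(3)
adv = torch.tensor([1.0, -1.0, 2.0])
v = torch.tensor([0.5, 0.5, 0.5])
actor_loss, critic_loss = ppo_losses(old.clone(), old, adv, v, v.clone())
print(actor_loss.item(), critic_loss.item())
```

The clipping keeps a single update from moving the policy too far from the one that collected the data, and the two losses are typically stepped with separate optimizers when the actor and critic are separate networks, as they are here.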

Challenges I had to overcome:

Numerical instability: I had to dig deep to find an architecture that would not produce exploding gradients. I used layer normalization, gradient clipping, early stopping, learning rate adjustments and robust monitoring to overcome this issue.
Train/test split: this is basic, but the original assignment code overfit the PPO model, since its back-test included the first 10k training steps.
Model "underfitting": transformers are 'data-hungry' models and we are feeding in low-dimensional sequential data.
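Two of the stability measures above, gradient clipping and early stopping, look roughly like this in PyTorch. A minimal sketch; the max norm of 0.5, the patience value, the non-finite-loss guard and the helper names are assumptions, not necessarily what my repo uses:

```python
import torch
import torch.nn as nn


class EarlyStopping:
    """Signal a stop when the loss has not improved by `min_delta` for `patience` checks."""

    def __init__(self, patience: int = 5, min_delta: float = 0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best = float("inf")
        self.bad_checks = 0

    def step(self, loss: float) -> bool:
        if loss < self.best - self.min_delta:
            self.best = loss
            self.bad_checks = 0
        else:
            self.bad_checks += 1
        return self.bad_checks >= self.patience


def train_step(model: nn.Module, optimizer, loss: torch.Tensor, max_grad_norm: float = 0.5) -> float:
    """One update with a non-finite-loss guard and global gradient-norm clipping."""
    optimizer.zero_grad()
    if not torch.isfinite(loss):
        # Skip the update rather than propagating NaN/inf into the weights
        return float("nan")
    loss.backward()
    # Rescale all gradients so their combined L2 norm is at most max_grad_norm;
    # the returned value is the norm before clipping, useful for monitoring
    grad_norm = torch.nn.utils.clip_grad_norm_(model.parameters(), max_grad_norm)
    optimizer.step()
    return float(grad_norm)
```

Logging the pre-clip norm returned by `train_step` is a cheap way to spot the onset of exploding gradients before the loss itself turns into NaN.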

My final architecture attempts to reduce overfitting with a more lightweight footprint: fewer attention heads, a small number of layers (2), a 32-dimensional embedding for the actor, a single pass over each batch instead of multiple passes, a larger batch size (128) for gradient accumulation, and dropout (p = 0.3).

Final Model Architecture:

(Note that only the actor is used for inference.)

TransformerPPOActor(
   (embedding): Sequential(
     (0): Linear(in_features=17, out_features=32, bias=True)
     (1): LayerNorm((32,), eps=1e-05, elementwise_affine=True)
   )
   (transformer_encoder): TransformerEncoder(
     (layers): ModuleList(
       (0-1): 2 x TransformerEncoderLayer(
         (self_attn): MultiheadAttention(
           (out_proj): NonDynamicallyQuantizableLinear(in_features=32, out_features=32, bias=True)
         )
         (linear1): Linear(in_features=32, out_features=2048, bias=True)
         (dropout): Dropout(p=0.3, inplace=False)
         (linear2): Linear(in_features=2048, out_features=32, bias=True)
         (norm1): LayerNorm((32,), eps=1e-05, elementwise_affine=True)
         (norm2): LayerNorm((32,), eps=1e-05, elementwise_affine=True)
         (dropout1): Dropout(p=0.3, inplace=False)
         (dropout2): Dropout(p=0.3, inplace=False)
       )
     )
   )
   (fc): Linear(in_features=32, out_features=3, bias=True)
 ),
 TransformerPPOCritic(
   (embedding): Sequential(
     (0): Linear(in_features=17, out_features=64, bias=True)
     (1): LayerNorm((64,), eps=1e-05, elementwise_affine=True)
   )
   (transformer_encoder): TransformerEncoder(
     (layers): ModuleList(
       (0-1): 2 x TransformerEncoderLayer(
         (self_attn): MultiheadAttention(
           (out_proj): NonDynamicallyQuantizableLinear(in_features=64, out_features=64, bias=True)
         )
         (linear1): Linear(in_features=64, out_features=2048, bias=True)
         (dropout): Dropout(p=0.3, inplace=False)
         (linear2): Linear(in_features=2048, out_features=64, bias=True)
         (norm1): LayerNorm((64,), eps=1e-05, elementwise_affine=True)
         (norm2): LayerNorm((64,), eps=1e-05, elementwise_affine=True)
         (dropout1): Dropout(p=0.3, inplace=False)
         (dropout2): Dropout(p=0.3, inplace=False)
       )
     )
   )
   (fc): Linear(in_features=64, out_features=1, bias=True)
 )
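The printout maps onto standard `torch.nn` modules. A minimal reconstruction: the module names, widths (17 input features projected to 32 for the actor and 64 for the critic), 2 encoder layers, dropout 0.3 and feed-forward width 2048 come from the printout above; the number of heads (4), `batch_first=True`, the Hold/Buy/Sell output order and reading the output from the last time step are assumptions, since the printout does not show them:

```python
import torch
import torch.nn as nn


class _TransformerPPOBase(nn.Module):
    """Shared body: Linear+LayerNorm embedding -> 2-layer Transformer encoder -> linear head."""

    def __init__(self, d_model: int, out_dim: int, n_features: int = 17,
                 nhead: int = 4, num_layers: int = 2, dropout: float = 0.3):
        super().__init__()
        self.embedding = nn.Sequential(nn.Linear(n_features, d_model), nn.LayerNorm(d_model))
        layer = nn.TransformerEncoderLayer(
            d_model=d_model,
            nhead=nhead,            # assumption: head count is not visible in the printout
            dim_feedforward=2048,   # matches linear1/linear2 in the printout
            dropout=dropout,
            batch_first=True,       # assumption: inputs are (batch, seq_len, features)
        )
        self.transformer_encoder = nn.TransformerEncoder(layer, num_layers=num_layers)
        self.fc = nn.Linear(d_model, out_dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = self.transformer_encoder(self.embedding(x))
        return self.fc(h[:, -1])    # assumption: summarize the sequence by its last step


class TransformerPPOActor(_TransformerPPOBase):
    """Outputs 3 action logits (assumed order: Hold, Buy, Sell)."""

    def __init__(self):
        super().__init__(d_model=32, out_dim=3)


class TransformerPPOCritic(_TransformerPPOBase):
    """Outputs a single state-value estimate."""

    def __init__(self):
        super().__init__(d_model=64, out_dim=1)


actor, critic = TransformerPPOActor(), TransformerPPOCritic()
x = torch.randn(2, 10, 17)  # 2 sequences of 10 time steps, 17 indicator columns each
print(actor(x).shape, critic(x).shape)
```

The 17 inputs line up with the 17 indicator columns shown in the example recommendations below (Close through CCI). Note that 2048 is PyTorch's default `dim_feedforward`, so most of each layer's parameters sit in `linear1`/`linear2` rather than in attention; lowering it is another easy way to shrink the model.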

Example recommendations from the Transformer + PPO model (indices refer to rows of the test set, i.e. the last 30% of the trade events):

=== Hold Recommendations ===
Trade Recommendation for AAPL at index 2362:
--------------------------------------------------
Input Data:
       Close Volume  RSI      MACD MACD_signal MACD_hist Stoch_k Stoch_d  \
2362  192.28    100  0.0  0.003045    0.005822 -0.002777     0.0     0.0   

           OBV    Upper_BB  Middle_BB    Lower_BB ATR_1       ADX       +DI  \
2362  912052.0  192.298419  192.28725  192.276081  0.01  50.40793  6.955222   

           -DI  CCI  
2362  4.606997  0.0  

Recommendation Probabilities:
Hold: 36.16%, Buy: 28.72%, Sell: 35.12%

Recommended Action: Hold
==================================================
Trade Recommendation for AAPL at index 5195:
--------------------------------------------------
Input Data:
       Close Volume    RSI      MACD MACD_signal MACD_hist    Stoch_k  \
5195  192.23    100  100.0  0.004312     0.00405  0.000262  88.888889   

        Stoch_d       OBV    Upper_BB Middle_BB    Lower_BB ATR_1        ADX  \
5195  96.296296  915926.0  192.236577  192.2235  192.210423  0.02  30.850894   

            +DI       -DI         CCI  
5195  12.699157  2.605216  166.666667  

Recommendation Probabilities:
Hold: 38.43%, Buy: 32.87%, Sell: 28.70%

Recommended Action: Hold
==================================================
Trade Recommendation for AAPL at index 12386:
--------------------------------------------------
Input Data:
        Close Volume  RSI      MACD MACD_signal MACD_hist    Stoch_k  \
12386  192.34    101  0.0 -0.006959   -0.006609 -0.000349  22.222222   

         Stoch_d       OBV    Upper_BB  Middle_BB    Lower_BB ATR_1  \
12386  11.111111  915976.0  192.363342  192.34975  192.336158  0.01   

             ADX       +DI        -DI        CCI  
12386  82.745639  0.151837  14.850535  41.666667  

Recommendation Probabilities:
Hold: 36.85%, Buy: 32.53%, Sell: 30.63%

Recommended Action: Hold
==================================================

=== Buy Recommendations ===
Trade Recommendation for AAPL at index 15453:
--------------------------------------------------
Input Data:
        Close Volume    RSI      MACD MACD_signal MACD_hist Stoch_k  \
15453  192.53     56  100.0  0.003865    0.003049  0.000816   100.0   

         Stoch_d       OBV   Upper_BB Middle_BB   Lower_BB ATR_1        ADX  \
15453  97.222222  911903.0  192.53104  192.5215  192.51196  0.01  99.212838   

            +DI       -DI        CCI  
15453  6.194779  0.000077  41.666667  

Recommendation Probabilities:
Hold: 25.97%, Buy: 38.34%, Sell: 35.69%

Recommended Action: Buy
==================================================
Trade Recommendation for AAPL at index 305:
--------------------------------------------------
Input Data:
     Close Volume   RSI      MACD MACD_signal MACD_hist Stoch_k Stoch_d  \
305  192.3     18  50.0  0.005176    0.006642 -0.001466    50.0    50.0   

          OBV    Upper_BB  Middle_BB    Lower_BB ATR_1        ADX        +DI  \
305  912570.0  192.301931  192.29975  192.297569  0.01  57.146815  17.782686   

          -DI        CCI  
305  2.246297 -55.555556  

Recommendation Probabilities:
Hold: 25.19%, Buy: 38.78%, Sell: 36.04%

Recommended Action: Buy
==================================================
Trade Recommendation for AAPL at index 5074:
--------------------------------------------------
Input Data:
       Close Volume  RSI      MACD MACD_signal MACD_hist Stoch_k Stoch_d  \
5074  192.21    300  0.0  0.000136    0.001949 -0.001813     0.0     0.0   

           OBV    Upper_BB  Middle_BB    Lower_BB ATR_1        ADX       +DI  \
5074  915831.0  192.225896  192.21725  192.208604  0.01  22.732038  5.823186   

           -DI  CCI  
5074  5.146688  0.0  

Recommendation Probabilities:
Hold: 24.18%, Buy: 41.15%, Sell: 34.67%

Recommended Action: Buy
==================================================

=== Sell Recommendations ===
Trade Recommendation for AAPL at index 16060:
--------------------------------------------------
Input Data:
       Close Volume  RSI      MACD MACD_signal MACD_hist Stoch_k Stoch_d  \
16060  192.5    100  NaN -0.002673   -0.003109  0.000435     0.0     0.0   

            OBV    Upper_BB Middle_BB    Lower_BB ATR_1        ADX       +DI  \
16060  911779.0  192.508642  192.5015  192.494358  0.01  59.855898  0.430029   

            -DI  CCI  
16060  1.719474  0.0  

Recommendation Probabilities:
Hold: 28.08%, Buy: 35.56%, Sell: 36.35%

Recommended Action: Sell
==================================================
Trade Recommendation for AAPL at index 2191:
--------------------------------------------------
Input Data:
       Close Volume  RSI     MACD MACD_signal MACD_hist Stoch_k Stoch_d  \
2191  192.25     94  NaN  0.00946    0.011751 -0.002291    50.0    50.0   

           OBV    Upper_BB Middle_BB    Lower_BB ATR_1        ADX       +DI  \
2191  911508.0  192.253859  192.2495  192.245141  0.01  50.652451  11.37401   

          -DI        CCI  
2191  7.00763  55.555556  

Recommendation Probabilities:
Hold: 25.38%, Buy: 37.18%, Sell: 37.44%

Recommended Action: Sell
==================================================
Trade Recommendation for AAPL at index 6564:
--------------------------------------------------
Input Data:
      Close Volume  RSI      MACD MACD_signal MACD_hist Stoch_k Stoch_d  \
6564  192.1     50  NaN -0.003319   -0.004136  0.000816     0.0     0.0   

           OBV    Upper_BB Middle_BB    Lower_BB ATR_1        ADX       +DI  \
6564  917744.0  192.103501  192.1005  192.097499  0.01  73.259896  0.338017   

          -DI  CCI  
6564  3.03933  0.0  

Recommendation Probabilities:
Hold: 30.06%, Buy: 33.83%, Sell: 36.11%

Recommended Action: Sell
==================================================
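In every example above, the recommended action is simply the highest-probability class. A minimal sketch of that mapping, assuming the actor's three logits are ordered Hold, Buy, Sell and turned into probabilities with a softmax; `recommend` also reports the margin over the runner-up, which is small in these examples (1.04 percentage points for the Hold at index 2362, 0.26 for the Sell at index 2191). The helper names are hypothetical:

```python
import math
from typing import Dict, List, Sequence

ACTIONS = ("Hold", "Buy", "Sell")  # assumed index order, matching the printed probabilities


def softmax(logits: Sequence[float]) -> List[float]:
    m = max(logits)  # subtract the max before exponentiating, for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def recommend(probs: Sequence[float]) -> Dict[str, object]:
    """Greedy (argmax) action plus its probability margin over the runner-up."""
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    best, second = ranked[0], ranked[1]
    return {"action": ACTIONS[best], "margin": probs[best] - probs[second]}


# Probabilities copied from the examples at test-set indices 2362 and 2191
print(recommend([0.3616, 0.2872, 0.3512]))
print(recommend([0.2538, 0.3718, 0.3744]))
```

With all three probabilities close to one third, a confidence threshold on the margin (holding unless the margin is large enough) is one way to avoid acting on near-coin-flip outputs.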

Happy to discuss the technical details of my approach further, just let me know.

filip-michalsky/transformer-trading
