Please look at `alt_model.ipynb` for the final model implementation; `baseline-RL-PPO.ipynb` contains a running notebook with the originally provided model for the task.
This repository contains the implementation and fine-tuning of a transformer-based model combined with a Proximal Policy Optimization (PPO) model for generating trade recommendations. The project aims to predict future stock prices and make strategic buy, sell, or hold decisions based on these predictions.
## Table of Contents

- Introduction
- Setup and Environment
- Model Implementation
- Fine-Tuning
- Evaluation
- Examples of Recommendations
- Notes/Future Work
- Conclusion
## Introduction

This project outlines the development and fine-tuning of a transformer-based model designed to predict future stock prices based on previous trades. It also integrates a PPO model to make buy, sell, or hold decisions based on these predictions. This two-step approach leverages the predictive power of transformers and the decision-making capabilities of reinforcement learning.
## Setup and Environment

To set up the project locally, follow these steps:
1. Clone the repository:

   ```bash
   git clone https://github.com/VaradhKaushik/trade-recommendations.git
   cd trade-recommendations
   ```

2. Create and activate a virtual environment:

   ```bash
   python -m venv venv
   source venv/bin/activate  # On Windows, use `venv\Scripts\activate`
   ```

3. Install the required dependencies:

   ```bash
   pip install -r requirements.txt
   ```
## Model Implementation

The implemented model and the baseline PPO implementation are provided as Jupyter notebooks.
- Implemented Model: The transformer model with PPO is implemented in `alt_model.ipynb`.
- Baseline PPO Implementation: The baseline PPO implementation is in `baseline-RL-PPO.ipynb`.
To run the notebooks:
1. Start Jupyter Notebook:

   ```bash
   jupyter notebook
   ```

2. Open and run the `alt_model.ipynb` notebook to execute the transformer model with PPO.
3. Open and run the `baseline-RL-PPO.ipynb` notebook to execute the baseline PPO implementation.
### Transformer Model

The transformer model processes trade and market data to predict stock prices.
Key Components:
- Input Embeddings: Convert numerical trade features such as price, volume, and time into dense vectors.
- Positional Encodings: Add positional information to retain the order of the data for time-series predictions.
- Encoder Layers: Capture complex patterns and dependencies using self-attention and feed-forward networks.
- Output Layer: Generate the next predicted price based on the processed input sequence.
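As an illustration of the positional-encoding component, the standard sinusoidal scheme can be sketched in NumPy. This is a minimal sketch, not the notebook's code: the function name, the 5-trade window, and the feature size of 256 below are illustrative assumptions.

```python
import numpy as np

def sinusoidal_positional_encoding(seq_len, d_model):
    """Standard sinusoidal positional encoding; the notebook's actual
    encoding may differ (illustrative sketch only)."""
    positions = np.arange(seq_len)[:, None]   # (seq_len, 1)
    dims = np.arange(d_model)[None, :]        # (1, d_model)
    # Each pair of dimensions uses a different wavelength.
    angle_rates = 1.0 / np.power(10000.0, (2 * (dims // 2)) / d_model)
    angles = positions * angle_rates
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles[:, 0::2])     # even dims: sine
    pe[:, 1::2] = np.cos(angles[:, 1::2])     # odd dims: cosine
    return pe

# A window of 5 embedded trades (hypothetical dense trade embeddings)
# receives position information by simple addition:
window = np.random.randn(5, 256)
encoded = window + sinusoidal_positional_encoding(5, 256)
```

Because the encoding is added rather than concatenated, the encoder layers can attend to both content and order of the trade window.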
### PPO Model

The PPO model makes strategic buy, sell, or hold decisions based on the predicted prices.
Key Components:
- Environment: A custom trading environment simulates stock trading based on the predicted prices.
- PPO Agent: Trained to maximize trading performance by making informed decisions based on the predicted prices.
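A minimal version of such a trading environment can be sketched as a plain Python class. This is a hypothetical simplification: the environment in the notebook may follow the Gym API, model transaction costs, and use a richer observation.

```python
class TradingEnv:
    """Toy trading environment driven by predicted prices (illustrative
    sketch; class and method names are assumptions, not the notebook's)."""

    ACTIONS = {0: "hold", 1: "buy", 2: "sell"}

    def __init__(self, predicted_prices, initial_balance=10_000.0):
        self.prices = predicted_prices
        self.initial_balance = initial_balance
        self.reset()

    def reset(self):
        self.step_idx = 0
        self.balance = self.initial_balance
        self.shares = 0.0
        return self._observation()

    def _observation(self):
        return (self.prices[self.step_idx], self.balance, self.shares)

    def portfolio_value(self):
        return self.balance + self.shares * self.prices[self.step_idx]

    def step(self, action, fraction=0.1):
        price = self.prices[self.step_idx]
        prev_value = self.portfolio_value()
        if action == 1:    # buy: spend a fraction of cash
            spend = self.balance * fraction
            self.balance -= spend
            self.shares += spend / price
        elif action == 2:  # sell: liquidate a fraction of holdings
            sold = self.shares * fraction
            self.shares -= sold
            self.balance += sold * price
        self.step_idx += 1
        done = self.step_idx >= len(self.prices) - 1
        # Reward the agent for growth in total portfolio value.
        reward = self.portfolio_value() - prev_value
        return self._observation(), reward, done
```

Rewarding the change in portfolio value at each step gives the PPO agent a dense signal that directly reflects trading performance.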
## Fine-Tuning

The transformer model was fine-tuned on the provided dataset to predict stock prices from the previous 5 trades. Hyperparameters were optimized with a grid search, which produced the following best configuration:

- Batch size: 128
- Learning rate: 0.0001
- Number of attention heads: 5
- Number of encoder layers: 3
- Feed-forward dimension: 256
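The grid search itself amounts to an exhaustive loop over the hyperparameter grid. In this sketch, `evaluate` is a stand-in for "train the model and return a validation loss", and the grid values other than the reported best configuration are assumptions:

```python
from itertools import product

# Hypothetical search space; only the best values match the report above.
param_grid = {
    "batch_size": [64, 128],
    "learning_rate": [1e-3, 1e-4],
    "num_heads": [4, 5],
    "num_layers": [2, 3],
    "ff_dim": [128, 256],
}

def evaluate(params):
    """Stand-in for training with `params` and returning a validation
    loss. This dummy scores configurations by distance from a fixed
    reference purely so the example runs end to end."""
    reference = {"batch_size": 128, "learning_rate": 1e-4,
                 "num_heads": 5, "num_layers": 3, "ff_dim": 256}
    return sum(params[k] != reference[k] for k in params)

def grid_search(param_grid, score_fn):
    keys = list(param_grid)
    best_params, best_score = None, float("inf")
    # Try every combination and keep the one with the lowest loss.
    for values in product(*param_grid.values()):
        candidate = dict(zip(keys, values))
        score = score_fn(candidate)
        if score < best_score:
            best_params, best_score = candidate, score
    return best_params

best = grid_search(param_grid, evaluate)
```

Note that the grid above already spans 2^5 = 32 training runs, which is why grid search is usually kept to a handful of values per hyperparameter.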
## Evaluation

The model's performance was compared against other strategies, including a baseline PPO agent and a DQN model. The Transformer + PPO model demonstrated effective trade execution with competitive portfolio values.
## Examples of Recommendations

The following are example buy and sell actions generated by the Transformer + PPO model:
- Sell Actions:
  - Step 59116: Sold 0.28 shares at $190.20.
  - Step 59117: Sold 0.26 shares at $197.81.
  - Step 59178: Sold 0.18 shares at $201.39.
- Buy Actions:
  - Step 59126: Bought 0.26 shares at $185.59.
  - Step 59129: Bought 0.18 shares at $192.22.
  - Step 59155: Bought 0.10 shares at $195.63.
## Notes/Future Work

- The current implementation is an example and should ideally be trained on a much larger dataset to improve the model's reliability.
- Incorporating domain knowledge for more relevant features and outputs can enhance the model's efficacy.
- Exploring the use of an encoder transformer for improved feature representation could be beneficial.
- Implementing robust fail-safes and guardrails to mitigate financial risks is crucial for practical applications.
## Conclusion

This project successfully implemented and fine-tuned a transformer-based model combined with a PPO agent for generating trade recommendations. The results demonstrate promising trade execution and decision-making capabilities, with potential for further optimization.