Please look at `alt_model.ipynb` for the final model implementation; `baseline-RL-PPO.ipynb` contains a running notebook with the originally provided model for the task.
This repository contains the implementation and fine-tuning of a transformer-based model combined with a Proximal Policy Optimization (PPO) model for generating trade recommendations. The project aims to predict future stock prices and make strategic buy, sell, or hold decisions based on these predictions.
## Table of Contents

- Introduction
- Setup and Environment
- Model Implementation
- Fine-Tuning
- Evaluation
- Examples of Recommendations
- Notes/Future Work
- Conclusion
## Introduction

This project outlines the development and fine-tuning of a transformer-based model designed to predict future stock prices based on previous trades. It also integrates a PPO model to make buy, sell, or hold decisions based on these predictions. This two-step approach leverages the predictive power of transformers and the decision-making capabilities of reinforcement learning.
## Setup and Environment

To set up the project locally, follow these steps:
1. Clone the repository:

   ```bash
   git clone https://github.com/VaradhKaushik/trade-recommendations.git
   cd trade-recommendations
   ```

2. Create and activate a virtual environment:

   ```bash
   python -m venv venv
   source venv/bin/activate  # On Windows, use `venv\Scripts\activate`
   ```

3. Install the required dependencies:

   ```bash
   pip install -r requirements.txt
   ```
## Model Implementation

The implemented model and the baseline PPO implementation are provided as Jupyter notebooks.
- Implemented Model: The transformer model with PPO is implemented in `alt_model.ipynb`.
- Baseline PPO Implementation: The baseline PPO implementation is in `baseline-RL-PPO.ipynb`.
To run the notebooks:
1. Start Jupyter Notebook:

   ```bash
   jupyter notebook
   ```

2. Open and run the `alt_model.ipynb` notebook to execute the transformer model with PPO.
3. Open and run the `baseline-RL-PPO.ipynb` notebook to execute the baseline PPO implementation.
### Transformer Model

The transformer model processes trade and market data to predict stock prices.
Key Components:
- Input Embeddings: Convert numerical trade features such as price, volume, and time into dense vectors.
- Positional Encodings: Add positional information to retain the order of the data for time-series predictions.
- Encoder Layers: Capture complex patterns and dependencies using self-attention and feed-forward networks.
- Output Layer: Generate the next predicted price based on the processed input sequence.
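As an illustration of the positional-encoding component, the standard sinusoidal scheme can be sketched in NumPy. This is a minimal sketch, not the notebook's code: the function name, the 5-trade window, and the feature size of 256 below are illustrative assumptions.

```python
import numpy as np

def sinusoidal_positional_encoding(seq_len, d_model):
    """Standard sinusoidal positional encoding; the notebook's actual
    encoding may differ (illustrative sketch only)."""
    positions = np.arange(seq_len)[:, None]   # (seq_len, 1)
    dims = np.arange(d_model)[None, :]        # (1, d_model)
    # Each pair of dimensions uses a different wavelength.
    angle_rates = 1.0 / np.power(10000.0, (2 * (dims // 2)) / d_model)
    angles = positions * angle_rates
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles[:, 0::2])     # even dims: sine
    pe[:, 1::2] = np.cos(angles[:, 1::2])     # odd dims: cosine
    return pe

# A window of 5 embedded trades (hypothetical dense trade embeddings)
# receives position information by simple addition:
window = np.random.randn(5, 256)
encoded = window + sinusoidal_positional_encoding(5, 256)
```

Because the encoding is added rather than concatenated, the encoder layers can attend to both content and order of the trade window.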
### PPO Model

The PPO model makes strategic buy, sell, or hold decisions based on the predicted prices.
Key Components:
- Environment: A custom trading environment simulates stock trading based on the predicted prices.
- PPO Agent: Trained to maximize trading performance by making informed decisions based on the predicted prices.
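A minimal version of such a trading environment can be sketched as a plain Python class. This is a hypothetical simplification: the environment in the notebook may follow the Gym API, model transaction costs, and use a richer observation.

```python
class TradingEnv:
    """Toy trading environment driven by predicted prices (illustrative
    sketch; class and method names are assumptions, not the notebook's)."""

    ACTIONS = {0: "hold", 1: "buy", 2: "sell"}

    def __init__(self, predicted_prices, initial_balance=10_000.0):
        self.prices = predicted_prices
        self.initial_balance = initial_balance
        self.reset()

    def reset(self):
        self.step_idx = 0
        self.balance = self.initial_balance
        self.shares = 0.0
        return self._observation()

    def _observation(self):
        return (self.prices[self.step_idx], self.balance, self.shares)

    def portfolio_value(self):
        return self.balance + self.shares * self.prices[self.step_idx]

    def step(self, action, fraction=0.1):
        price = self.prices[self.step_idx]
        prev_value = self.portfolio_value()
        if action == 1:    # buy: spend a fraction of cash
            spend = self.balance * fraction
            self.balance -= spend
            self.shares += spend / price
        elif action == 2:  # sell: liquidate a fraction of holdings
            sold = self.shares * fraction
            self.shares -= sold
            self.balance += sold * price
        self.step_idx += 1
        done = self.step_idx >= len(self.prices) - 1
        # Reward the agent for growth in total portfolio value.
        reward = self.portfolio_value() - prev_value
        return self._observation(), reward, done
```

Rewarding the change in portfolio value at each step gives the PPO agent a dense signal that directly reflects trading performance.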
## Fine-Tuning

The transformer model was fine-tuned on the provided dataset to predict stock prices from the previous 5 trades. Hyperparameters were optimized with a grid search, which produced the following best configuration:

- Batch size: 128
- Learning rate: 0.0001
- Number of attention heads: 5
- Number of encoder layers: 3
- Feed-forward dimension: 256
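The grid search itself amounts to an exhaustive loop over the hyperparameter grid. In this sketch, `evaluate` is a stand-in for "train the model and return a validation loss", and the grid values other than the reported best configuration are assumptions:

```python
from itertools import product

# Hypothetical search space; only the best values match the report above.
param_grid = {
    "batch_size": [64, 128],
    "learning_rate": [1e-3, 1e-4],
    "num_heads": [4, 5],
    "num_layers": [2, 3],
    "ff_dim": [128, 256],
}

def evaluate(params):
    """Stand-in for training with `params` and returning a validation
    loss. This dummy scores configurations by distance from a fixed
    reference purely so the example runs end to end."""
    reference = {"batch_size": 128, "learning_rate": 1e-4,
                 "num_heads": 5, "num_layers": 3, "ff_dim": 256}
    return sum(params[k] != reference[k] for k in params)

def grid_search(param_grid, score_fn):
    keys = list(param_grid)
    best_params, best_score = None, float("inf")
    # Try every combination and keep the one with the lowest loss.
    for values in product(*param_grid.values()):
        candidate = dict(zip(keys, values))
        score = score_fn(candidate)
        if score < best_score:
            best_params, best_score = candidate, score
    return best_params

best = grid_search(param_grid, evaluate)
```

Note that the grid above already spans 2^5 = 32 training runs, which is why grid search is usually kept to a handful of values per hyperparameter.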
## Evaluation

The model's performance was compared against other strategies, including a baseline PPO agent and a DQN model. The Transformer + PPO model demonstrated effective trade execution with competitive portfolio values.
## Examples of Recommendations

The following are example buy and sell actions generated by the Transformer + PPO model:
- Sell Actions:
  - Step 59116: Sold 0.28 shares at $190.20.
  - Step 59117: Sold 0.26 shares at $197.81.
  - Step 59178: Sold 0.18 shares at $201.39.
- Buy Actions:
  - Step 59126: Bought 0.26 shares at $185.59.
  - Step 59129: Bought 0.18 shares at $192.22.
  - Step 59155: Bought 0.10 shares at $195.63.
## Notes/Future Work

- The current implementation is an example and should ideally be trained on a much larger dataset to improve the model's reliability.
- Incorporating domain knowledge for more relevant features and outputs can enhance the model's efficacy.
- Exploring the use of an encoder transformer for improved feature representation could be beneficial.
- Implementing robust fail-safes and guardrails to mitigate financial risks is crucial for practical applications.
## Conclusion

This project successfully implemented and fine-tuned a transformer-based model combined with a PPO agent for generating trade recommendations. The results demonstrate promising trade execution and decision-making capabilities, with potential for further optimization.