An OpenAI gym environment for futures trading
The futures market is different from a typical stock trading environment in that contracts move in fixed increments, and each increment (tick) is worth a variable amount depending on the contract traded.
This environment is designed to trade a single contract of a single security type.
Accompanying this environment is the concept of a TimeSeriesState, a variable (n x m) window of all the features representing the state, where n represents the number of steps in the window and m represents the number of features. This allows you to define the state that your agent uses to determine its action. Some possibilities are:
- An embedding
- A window of the last 20 observed values
- A scalogram from a wavelet transform
- A stack of scalograms
- Etc.
See envs.utils.TimeSeriesState for more details.
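For example, a 20-step window over two features is just a (20 x 2) matrix. A minimal sketch with plain pandas/numpy (the data and column names here are illustrative, not part of the environment):

```python
import numpy as np
import pandas as pd

# Illustration only: an (n x m) window over two hypothetical features.
prices = pd.DataFrame({
    "time": pd.date_range("2019-11-13 09:30:00", periods=100, freq="s"),
    "price": 3090 + np.random.randn(100).cumsum() * 0.25,
    "volume": np.random.randint(1, 50, size=100),
})

n = 20                                              # steps in the window
window = prices.iloc[-n:]                           # last n observations
features = window[["price", "volume"]].to_numpy()   # shape (n, m) with m = 2
```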
This environment accepts three actions at any time step:
- 0: a buy -> (long, 1)
- 1: a hold (no action) -> (idle, 0)
- 2: a sell -> (short, -1)
The environment does not allow for more than one contract to be traded at a time. If a buy action is submitted with an existing long trade, the action defaults to no action.
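As a rough illustration of these semantics (this is not the environment's internal code, and the symmetric sell-while-short rule is assumed):

```python
# Positions: 1 = long, 0 = idle, -1 = short; actions as listed above.
BUY, HOLD, SELL = 0, 1, 2

def effective_action(action: int, position: int) -> int:
    """Illustrative sketch: the action the environment would actually act on."""
    if action == BUY and position == 1:      # buy with an existing long trade
        return HOLD                          # defaults to no action
    if action == SELL and position == -1:    # assumed symmetric rule for shorts
        return HOLD
    return action
```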
This environment can also simulate the probabilistic dynamics of actual market trading through the fill_probability and long/short probability vectors. Occasionally, an order will not fill, or will fill at a price that differs from the intended price. See generate_random_fill_differential for usage details. If deterministic behavior is desired, do not supply these arguments.
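As a rough sketch of the kind of fill simulation this enables (the actual generate_random_fill_differential signature and distributions are defined by the environment; the names and probabilities below are illustrative):

```python
import numpy as np

rng = np.random.default_rng()

def simulate_fill(intended_price, tick_size, fill_probability=0.95,
                  offset_ticks=(-1, 0, 1), offset_probs=(0.1, 0.8, 0.1)):
    """Illustration only: return a filled price, or None if the order never fills."""
    if rng.random() > fill_probability:
        return None                                   # no fill this step
    slippage = rng.choice(offset_ticks, p=offset_probs) * tick_size
    return intended_price + slippage
```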
The standard reward function is the net profit per completed trade, where net profit is equal to:

((entry_price - exit_price) / tick_size) * value_per_tick - execution_cost_per_order * 2
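For example, with ES parameters (tick_size 0.25, value_per_tick 12.50) and a $2.37 execution cost per order, a short trade entered at 3090.00 and exited at 3088.50 (hypothetical prices) nets (1.50 / 0.25) * 12.50 - 2.37 * 2 = 70.26. A small helper mirroring the formula:

```python
def trade_net_profit(entry_price, exit_price, tick_size,
                     value_per_tick, execution_cost_per_order):
    """Net profit of one completed trade, per the formula above."""
    ticks = (entry_price - exit_price) / tick_size
    return ticks * value_per_tick - execution_cost_per_order * 2

profit = trade_net_profit(3090.00, 3088.50, 0.25, 12.5, 2.37)  # 70.26
```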
At the end of each episode, the environment reports metrics as well as distributions of profit/loss and trade duration.
A list of metrics:
- Total Net Profit
- Intraday Low
- Profit Factor
- Win Percentage
- Loss Average
- Payout Ratio
- Maximum Win
- Average Win
- Maximum Loss
- Expectancy
- Average Loss
- Total Net Profit (shorts)
- Profit Factor (shorts)
- Total Number of Shorts
- Win Percentage (shorts)
- Maximum Win (shorts)
- Average Win (shorts)
- Maximum Loss (shorts)
- Average Loss (shorts)
- Total Net Profit (longs)
- Profit Factor (longs)
- Total Number of Longs
- Win Percentage (longs)
- Maximum Win (longs)
- Average Win (longs)
- Maximum Loss (longs)
- Average Loss (longs)
- Profit Loss Monotonicity
If necessary, you can extract the raw trade data by calling get_trade_data() at the end of the episode. This allows you to calculate any additional metrics, visualizations, etc. The trade data is cleared at the end of each episode.
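For instance, assuming the trade records expose a per-trade profit figure (the "net_profit" field name below is hypothetical, and env refers to the environment constructed in the example further down), a custom metric might look like:

```python
import pandas as pd

trades = pd.DataFrame(env.get_trade_data())   # record structure defined by the environment

median_profit = trades["net_profit"].median()   # hypothetical column name
win_rate = (trades["net_profit"] > 0).mean()
print(f"median profit per trade: {median_profit:.2f}, win rate: {win_rate:.2%}")
```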
To use this environment, you will need to do the following:
- Create a custom TimeSeriesState by subclassing it and overriding the get_feature_vector() method
- Convert your data into a set of your custom TimeSeriesState objects
- Run your episodes
Example:
import pandas as pd
import numpy as np
import torch
from gym_futures import FuturesEnv
from gym_futures.envs.utils import TimeSeriesState  # import path assumed from the envs.utils reference above
data = pd.read_csv("data_es_11_13.csv")
data = data.sort_values(by="time", ascending=True)
states = []
window_size = 5
# create custom TimeSeriesState
class WindowedTimeSeries(TimeSeriesState):
def __init__(self, data, window_size, timestamp_format):
super().__init__(data,timestamp_format=timestamp_format)
self.window_size = window_size
def to_feature_vector(self, *args, **kwargs):
"""A simple tensor"""
data = np.array(self.data.iloc[:, 1], dtype=np.float64)
return torch.tensor(data)
# convert data into states
for i in range(0,len(data)-window_size):
window = data.iloc[i:i+window_size]
state = WindowedTimeSeries(window,
window_size=window_size,
timestamp_format="%Y-%m-%d %H:%M:%S.%f")
states.append(state)
# run your environment
from tqdm import tqdm_notebook as tqdm
import warnings
warnings.filterwarnings("ignore")
episodes = 100
ES = {
"value_per_tick": 12.5,
"minimum_price_movement": 0.25
}
env = FuturesEnv(states=states, value_per_tick=ES["value_per_tick"],
tick_size=ES["minimum_price_movement"], execution_cost_per_order=2.37,
add_current_position_to_state=True)
for e in range(0, episodes):
s = env.reset(e)
while not env.done:
for i in tqdm(range(env.limit)):
# replace np.random.choice with a 'smarter' agent
action = np.random.choice(range(3), size=1)[0]
state, reward, done, info = env.step(action)
reward, metrics = env.render()
print(f"episode:{e}, reward: {reward}")