A complete solution for researching and building trading strategies with machine learning
trade-learn: Building Trading Strategies with Machine Learning in Closed-Loop
trade-learn is a machine learning strategy development toolkit based on alphalens, backtrader, pyfolio, and quantstats. It provides a complete strategy development process.
Its functions include factor collection, factor processing, factor evaluation, causal analysis, model definition, and strategy backtesting. Visualization results can be saved as HTML files for sharing.
[Figure: summary of visualizations]
Key Features
Provides stock market data from "Tongdaxin Trading Software" along with 30 proven technical indicators (tdx30) that can be used directly with the Tongdaxin platform.
Offers stock market data from "TradingView," leveraging its advanced data visualization to quickly generate and validate trading insights.
Includes stock market data from "Yahoo Finance" and factor calculation formulas, such as the alpha101 and alpha191 factor sets from WorldQuant LLC.
Provides tools for "Exploratory Analysis" and "Optimal Model Selection" to rapidly identify patterns in the dataset and assess the performance of various models.
Features algorithms for "Causal Graph Construction" and "Causal Feature Selection," extending the gplearn library to support "Feature Derivation" for time series data.
Integrates open-source strategy development components from the Quantopian platform, including tools like empyrical, alphalens, and pyfolio.
Enhances the backtesting.py framework to support portfolio strategy development in addition to single-asset strategies.
Ensures a closed-loop process for machine learning strategy development by eliminating the need for additional third-party packages beyond the user-customized model.
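To give a sense of what a formulaic factor from the alpha101 set looks like, the sketch below computes Alpha#101, published as (close - open) / ((high - low) + 0.001), with plain pandas. It does not use the trade-learn API; the function name `alpha_101` and the sample bars are hypothetical and exist only for illustration.

```python
import pandas as pd


def alpha_101(ohlc: pd.DataFrame) -> pd.Series:
    """Alpha#101: (close - open) / ((high - low) + 0.001)."""
    body = ohlc["close"] - ohlc["open"]
    bar_range = ohlc["high"] - ohlc["low"]
    # The small constant keeps the ratio finite when high == low
    return body / (bar_range + 0.001)


# Hypothetical sample bars, for illustration only
bars = pd.DataFrame(
    {
        "open": [10.0, 10.5, 10.2],
        "high": [10.8, 10.9, 10.6],
        "low": [9.9, 10.1, 9.8],
        "close": [10.5, 10.2, 10.4],
    },
    index=pd.to_datetime(["2024-01-02", "2024-01-03", "2024-01-04"]),
)

factor = alpha_101(bars)
print(factor.round(4))
```

Because the candle body can never exceed the high-low range, the factor stays strictly between -1 and 1, which makes it comparable across assets without further scaling.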
from tradelearn.query import Query
from tradelearn.strategy.backtest import Backtest, Strategy
from tradelearn.strategy.evaluate import Evaluate

if __name__ == '__main__':
    # Obtain asset market data from TradingView
    GOOG = Query.history_ohlc(engine='tv', symbol='GOOG', exchange='NASDAQ')

    def crossover(series1, series2):
        return series1[-2] < series2[-2] and series1[-1] > series2[-1]

    # Define the strategy class
    class SmaCross(Strategy):
        fast = 10
        slow = 20

        # Compute the indicator data needed for the strategy
        def init(self):
            def SMA(arr, n):
                return arr.rolling(n).mean()
            price = self.data.close.df
            self.ma1 = self.I(SMA, price, self.fast, overlay=True)
            self.ma2 = self.I(SMA, price, self.slow, overlay=True)

        # Generate trading signals based on the indicators and execute trades
        def next(self):
            if crossover(self.ma1, self.ma2):
                self.position().close()
                self.buy()
            elif crossover(self.ma2, self.ma1):
                self.position().close()
                self.sell()

    # Run the backtest and plot the results
    bt = Backtest(GOOG, SmaCross, cash=1000000, commission=.002, trade_on_close=False)
    stats = bt.run()
    bt.plot(plot_volume=True, superimpose=True)

    # Analyze the backtest results
    Evaluate.analysis_report(stats, GOOG, engine='quantstats')
Start                     2014-03-27 00:00:00
End                       2024-08-16 00:00:00
Duration                  3795 days 00:00:00
Exposure Time [%]         98.509174
Equity Final [$]          233497.861225
Equity Peak [$]           1043778.801501
Return [%]                -76.650214
Buy & Hold Return [%]     529.083876
Return (Ann.) [%]         -13.163701
Volatility (Ann.) [%]     24.393102
Sharpe Ratio              -0.539648
Sortino Ratio             -0.680248
Calmar Ratio              -0.154556
Max. Drawdown [%]         -85.1713
Avg. Drawdown [%]         -85.1713
Max. Drawdown Duration    3734 days 00:00:00
Avg. Drawdown Duration    3734 days 00:00:00
# Trades                  146
Win Rate [%]              33.561644
Best Trade [%]            20.325583
Worst Trade [%]           -15.835971
Avg. Trade [%]            -0.991343
Max. Trade Duration       116 days 00:00:00
Avg. Trade Duration       26 days 00:00:00
Profit Factor             0.702201
Expectancy [%]            -0.808854
SQN                       -2.538763
Kelly Criterion           -0.272909
_strategy                 SmaCross
_equity_curve             ...
_trades                   EntryBar E...
_orders                   Ticke...
_positions                {'Asset': -1154,...
_trade_start_bar          19
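Several headline numbers in the report are linked by simple identities, which gives a quick sanity check when reading it. The sketch below recomputes three of them from the values printed above, assuming a zero risk-free rate and the starting cash of 1,000,000 passed to `Backtest`. The variable names are illustrative, and the formulas are inferred from the figures themselves rather than taken from the trade-learn source.

```python
# Values copied from the report above
cash = 1_000_000
equity_final = 233497.861225
return_ann_pct = -13.163701
volatility_ann_pct = 24.393102
max_drawdown_pct = -85.1713

# Total return: final equity relative to the starting cash
total_return_pct = (equity_final / cash - 1) * 100

# Sharpe ratio, assuming a zero risk-free rate
sharpe = return_ann_pct / volatility_ann_pct

# Calmar ratio: annualized return over the magnitude of the maximum drawdown
calmar = return_ann_pct / abs(max_drawdown_pct)

print(f"Return [%]    {total_return_pct:.4f}")
print(f"Sharpe Ratio  {sharpe:.4f}")
print(f"Calmar Ratio  {calmar:.4f}")
```

The recomputed values agree with the `Return [%]`, `Sharpe Ratio`, and `Calmar Ratio` rows of the report to within rounding of the printed inputs.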
Further Example
Using machine learning models to build a portfolio:
from tradelearn.query import Query
from tradelearn.strategy.backtest import Backtest, Strategy
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

if __name__ == '__main__':
    # Define a RandomForest indicator class, using predictions to generate trading signals and conduct portfolio backtesting
    class RandomForest(Strategy):
        def init(self):
            # Obtain the raw data and feature set
            data = self.data.df.swaplevel(0, 1, axis=1).stack().reset_index(level=1)
            fea_list = data.columns.drop(['label', 'code']).tolist()

            # Split the training set and train the model
            train_data = data.query(f"date >= '{tn_begin_date}' and date < '{bt_begin_date}'")
            bt_x_train, bt_y_train = train_data[fea_list], train_data['label']
            model = RandomForestClassifier(random_state=42, n_jobs=-1)
            model.fit(bt_x_train, bt_y_train)

            # Predict the probability of price increases for each asset in the portfolio during the backtesting period
            test_data = data.query(f"date >= '{bt_begin_date}' and date < '{bt_end_date}'")
            ind_df = pd.DataFrame({'date': data.index.unique()}).set_index('date')
            for symbol in test_data['code'].unique():
                bt_x_test = test_data.query(f"code == '{symbol}'")[fea_list]
                pre_proba = model.predict_proba(bt_x_test)[:, 1]
                ind_df = pd.merge(pd.DataFrame(pre_proba, index=bt_x_test.index, columns=[symbol]),
                                  ind_df, on=['date'], how='right')

            # Package the probability predictions as indicators for use in the next method
            self.proba = self.I(ind_df, overlay=False)

        def next(self):
            # Reset the portfolio's position weights
            self.alloc.assume_zero()
            # Get the predicted probabilities for each asset on the current day
            proba = self.proba.df.iloc[-1]
            # Select a subset of assets based on the probability indicator and set position weights
            bucket = self.alloc.bucket['equity']
            bucket.append(proba.sort_values(ascending=False))
            bucket.trim(limit=3)
            bucket.weight_explicitly(weight=1/3)
            bucket.apply(method='update')
            # Update the portfolio's position weights
            self.rebalance(cash_reserve=0.1)

    # Define the start and end dates for the data
    tn_begin_date = '2017-01-01'
    tn_end_date = '2022-06-22'

    # Loop through multiple stocks to query historical data and process it
    rawdata = None
    for i in range(7):
        temp = Query.history_ohlc(symbol='60052' + str(i), start=tn_begin_date, end=tn_end_date, adjust='hfq', engine='tdx')
        if temp is None:
            continue
        # Label the data with price change tags
        temp['label'] = temp['close'].pct_change(periods=1).shift(-1).map(lambda x: 1 if x > 0 else -1)
        rawdata = pd.concat([rawdata, temp], axis=0)

    # Convert the dataset format and handle missing values
    btdata = rawdata.pivot_table(index='date', columns='code').swaplevel(0, 1, axis=1)
    btdata = btdata.sort_values(by='code', axis=1).ffill()

    # Define the start and end dates for the backtest
    bt_begin_date = '2020-01-01'
    bt_end_date = '2022-06-22'

    # Run the backtest and plot the results, with the default benchmark being an equal-weighted portfolio
    bt = Backtest(btdata, RandomForest, cash=1000000, commission=.002, trade_on_close=False)
    bt.run()
    bt.plot(plot_volume=True, superimpose=False, plot_allocation=True)
Start                     2017-01-03 00:00:00
End                       2022-06-21 00:00:00
Duration                  1995 days 00:00:00
Exposure Time [%]         44.83798
Equity Final [$]          515002.86814
Equity Peak [$]           1014662.65544
Return [%]                -48.499713
Buy & Hold Return [%]     44.762561
Return (Ann.) [%]         -24.465092
Volatility (Ann.) [%]     23.349782
Sharpe Ratio              -1.047765
Sortino Ratio             -1.083421
Calmar Ratio              -0.397371
Max. Drawdown [%]         -61.567329
Avg. Drawdown [%]         -15.734656
Max. Drawdown Duration    890 days 00:00:00
Avg. Drawdown Duration    225 days 00:00:00
# Trades                  1490
Win Rate [%]              47.919463
Best Trade [%]            63.422669
Worst Trade [%]           -34.094076
Avg. Trade [%]            -0.150202
Max. Trade Duration       98 days 00:00:00
Avg. Trade Duration       8 days 00:00:00
Profit Factor             1.040877
Expectancy [%]            0.082296
SQN                       -1.906885
Kelly Criterion           -0.116659
_strategy                 RandomForest
_equity_curve             ...
_trades                   EntryBar ...
_orders                   Tick...
_positions                {'600520': 0, '6...
_trade_start_bar          731
dtype: object
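Two steps in the example above are easy to get wrong: building the next-day label, and turning predicted probabilities into equal weights for the top three assets. The sketch below reproduces both with plain pandas on made-up data. It does not call the trade-learn allocation API; `make_label` and `top_k_equal_weights` are hypothetical helper names, and the probabilities are invented for illustration.

```python
import pandas as pd


def make_label(close: pd.Series) -> pd.Series:
    """Label each bar +1 if the next bar closes higher, otherwise -1."""
    next_return = close.pct_change(periods=1).shift(-1)
    # The last bar has no next-day return (NaN); NaN > 0 is False,
    # so it is labelled -1, exactly as in the example above.
    return next_return.map(lambda x: 1 if x > 0 else -1)


def top_k_equal_weights(proba: pd.Series, k: int = 3) -> pd.Series:
    """Assign weight 1/k to the k highest-probability assets and 0 to the rest."""
    weights = pd.Series(0.0, index=proba.index)
    weights.loc[proba.nlargest(k).index] = 1.0 / k
    return weights


# Made-up closing prices for one asset
close = pd.Series([10.0, 10.5, 10.2, 10.4])
labels = make_label(close)
print(labels.tolist())

# Made-up predicted probabilities of a price increase, one per asset
proba = pd.Series(
    {"600520": 0.61, "600521": 0.48, "600522": 0.55, "600523": 0.70, "600525": 0.52}
)
weights = top_k_equal_weights(proba, k=3)
print(weights[weights > 0].index.tolist())
```

Note that the final bar of every asset receives a -1 label even though its next-day move is unknown, so it may be worth dropping that row before training.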