/gpquant

Mining volume and price factors by genetic algorithm

Primary LanguagePython

gpquant说明文档

介绍

gpquant是对Python的遗传算法包gplearn的一个改造,用于进行因子挖掘

模块

Function

计算因子的函数,用仿函数类Function实现了23个基本函数和37个时间序列函数。所有的函数本质上都是标量函数,但因为采用了向量化计算,所以输入和输出都是向量形式

Fitness

适应度评价指标,用仿函数类Fitness实现了几个适应度函数,主要是应用其中的夏普比率sharpe_ratio

Backtester

向量化的因子回测框架,逻辑是先根据定义的策略函数把拿到的因子factor变成信号signal,再通过信号处理函数把信号signal变成资产asset实现回测,这两步统一在仿函数Backtester类里实现

SyntaxTree

公式树,把因子的计算公式写成前缀表达式,然后用公式树SyntaxTree表示。每一个公式树代表一个因子,由节点Node构成;每个Node存放了自身数据、父节点和子节点。节点的自身数据可以是Function、变量、常量,或者时间序列常数

公式树可以交叉crossover、子树突变subtree_mutate、提升突变hoist_mutate、点突变point_mutate或者繁殖reproduce(逻辑可参照gplearn)

SymbolicRegressor

符号回归类,gpquant因子挖掘本质上是用遗传算法解决符号回归问题,其中定义了遗传过程中的一些参数,如种群数量population_size、遗传代数generations等

使用

导入

下载gpquant包(pip install gpquant),导入SymbolicRegressor类

from gpquant.SymbolicRegressor import SymbolicRegressor

测试

跟gplearn一样的例子,把$y=X_0^2 - X_1^2 + X_1 - 1$对$X_0$和$X_1$进行符号回归,大约在第9代能找到正确答案

import numpy as np
import pandas as pd
from matplotlib import pyplot as plt
from sklearn.utils import *
from gpquant.SymbolicRegressor import SymbolicRegressor


# Step 1
x0 = np.arange(-1, 1, 1/10.)
x1 = np.arange(-1, 1, 1/10.)
x0, x1 = np.meshgrid(x0, x1)
y_truth = x0**2 - x1**2 + x1 - 1

ax = plt.figure().gca(projection='3d')
ax.set_xlim(-1, 1)
ax.set_ylim(-1, 1)
surf = ax.plot_surface(x0, x1, y_truth, rstride=1, cstride=1,
                       color='green', alpha=0.5)
plt.show()

# Step 2
rng = check_random_state(0)

# training samples
X_train = rng.uniform(-1, 1, 100).reshape(50, 2)
y_train = X_train[:, 0]**2 - X_train[:, 1]**2 + X_train[:, 1] - 1
X_train = pd.DataFrame(X_train, columns=['X0', 'X1'])
y_train = pd.Series(y_train)

# testing samples
X_test = rng.uniform(-1, 1, 100).reshape(50, 2)
y_test = X_test[:, 0]**2 - X_test[:, 1]**2 + X_test[:, 1] - 1

# Step 3
sr = SymbolicRegressor(population_size = 2000,
                       tournament_size = 20,
                       generations = 20,
                       stopping_criteria = 0.01,
                       p_crossover = 0.7,
                       p_subtree_mutate = 0.1,
                       p_hoist_mutate = 0.1,
                       p_point_mutate = 0.05,
                       init_depth = (6, 8),
                       init_method = 'half and half',
                       function_set = ['add', 'sub', 'mul', 'div', 'square'],
                       variable_set = ['X0', 'X1'],
                       const_range = (0, 1),
                       ts_const_range = (0, 1),
                       build_preference = [0.75, 0.75],
                       metric = 'mean absolute error',
                       parsimony_coefficient = 0.01)

sr.fit(X_train, y_train)

# Step 4
print(sr.best_estimator)