Issue with data prep
AIAdventures opened this issue · 3 comments
Hi Jim!
Great project!
I am just having trouble with the prep_data module.
I'm running it on Linux Mint.
andrewcz@andrewcz-PORTEGE-Z30t-B ~/Desktop/Numerai/numerai dataset/numerai_datasets (13)/numerai $ python prep_data.py
/home/andrewcz/miniconda3/lib/python3.5/site-packages/sklearn/cross_validation.py:44: DeprecationWarning: This module was deprecated in version 0.18 in favor of the model_selection module into which all the refactored classes and functions are moved. Also note that the interface of the new CV iterators are different from that of this module. This module will be removed in 0.20.
  "This module will be removed in 0.20.", DeprecationWarning)
Fold #1
Traceback (most recent call last):
  File "prep_data.py", line 85, in <module>
    main()
  File "prep_data.py", line 50, in main
    rf.fit(X_split_train, y_split_train)
  File "/home/andrewcz/miniconda3/lib/python3.5/site-packages/sklearn/ensemble/forest.py", line 247, in fit
    X = check_array(X, accept_sparse="csc", dtype=DTYPE)
  File "/home/andrewcz/miniconda3/lib/python3.5/site-packages/sklearn/utils/validation.py", line 382, in check_array
    array = np.array(array, dtype=dtype, order=order, copy=copy)
ValueError: could not convert string to float: 'test'
Many thanks for your help,
Andrew
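The DeprecationWarning at the top of the log is separate from the crash: it only says that sklearn.cross_validation is being replaced by sklearn.model_selection, whose CV iterators are configured first and then produce index pairs via .split(X). A minimal sketch of that newer KFold interface, assuming prep_data.py loops over folds roughly like this; the random data, fold count, and forest settings are hypothetical and not taken from the script:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import KFold

# Hypothetical stand-in data: 100 rows, 21 numeric features, binary target.
rng = np.random.RandomState(0)
X = rng.rand(100, 21)
y = rng.randint(0, 2, size=100)

# New-style iterator: configure it first, then call .split(X) to get
# (train_indices, test_indices) pairs, one per fold.
kf = KFold(n_splits=5, shuffle=True, random_state=0)
scores = []
for fold, (train_idx, test_idx) in enumerate(kf.split(X), start=1):
    print('Fold #{}'.format(fold))
    rf = RandomForestClassifier(n_estimators=10, random_state=0)
    rf.fit(X[train_idx], y[train_idx])
    # Accuracy on the held-out fold.
    scores.append(rf.score(X[test_idx], y[test_idx]))
```

Migrating the import alone will not fix the ValueError; that comes from a string column reaching rf.fit.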
The data format has changed since last year. Some columns need to be dropped before fitting, such as the non-numeric one holding the 'test' value that shows up in the traceback.
In tournament 72 I used this to keep only the feature columns: feature_cols = ['feature'+str(i) for i in range(1, 22)]
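A minimal sketch of how that column selection avoids the error, using a small hypothetical DataFrame rather than the real Numerai file. The 'id', 'data_type' and 'target' column names are assumptions for illustration; only the feature1 to feature21 naming comes from the snippet above:

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Hypothetical toy frame: a string id, a string 'data_type' column,
# 21 numeric features and a target. Column names are assumptions.
n_rows = 6
data = {
    'id': ['r{}'.format(r) for r in range(n_rows)],
    'data_type': ['train'] * 3 + ['test'] * 3,
}
for i in range(1, 22):
    data['feature' + str(i)] = [0.1 * i + 0.01 * r for r in range(n_rows)]
data['target'] = [0, 1, 0, 1, 0, 1]
df = pd.DataFrame(data)
y = df['target'].values

# Fitting on every non-target column fails: the string columns
# cannot be converted to float, as in the traceback.
failed = False
try:
    RandomForestClassifier(n_estimators=10).fit(df.drop(columns=['target']), y)
except ValueError:
    failed = True

# Keeping only the numeric feature columns avoids the error.
feature_cols = ['feature' + str(i) for i in range(1, 22)]
X = df[feature_cols].values
rf = RandomForestClassifier(n_estimators=10, random_state=0)
rf.fit(X, y)
print(failed, X.shape)
```

Selecting columns by an explicit list, rather than dropping known bad ones, keeps working if new non-numeric columns are added to the files later.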
Yes, this code is pretty out of date now. I may update it in the future as time allows.
Hey @jimfleming, I adapted parts of your code to work with the current format. I'll try to send a PR in the near future!