GridSearchCV with multiple inputs issue
DMTSource opened this issue · 3 comments
I am attempting to switch a working model based on readme_long_example to use GridSearchCV, but when I call fit, gscv does not appear to accept my multiple inputs and raises a new error (before the switch I was able to fit and predict with my model):
For example:
gscv.fit([X1_train, X2_train, X3_train], Y1_train, **fit_params)
# Console output showing the shapes of the X inputs and Y, and the error that now appears when using GridSearchCV.
Preparing baikal model...
X1 Shape: (14206, 478)
X2 Shape: (14206, 14)
X3 Shape: (14206, 508)
Y1 Shape: (14206, 2)
Training baikal model...
Traceback (most recent call last):
  File "train.py", line 130, in <module>
    main()
  File "train.py", line 112, in main
    mode='classification')
  File "<edited>dir/model.py", line 246, in train
    gscv.fit([X1_train, X2_train, X3_train], Y1_train, **fit_params)
  File "/home/anaconda3/envs/baikal/lib/python3.7/site-packages/sklearn/utils/validation.py", line 63, in inner_f
    return f(*args, **kwargs)
  File "/home/anaconda3/envs/baikal/lib/python3.7/site-packages/sklearn/model_selection/_search.py", line 759, in fit
    X, y, groups = indexable(X, y, groups)
  File "/home/anaconda3/envs/baikal/lib/python3.7/site-packages/sklearn/utils/validation.py", line 299, in indexable
    check_consistent_length(*result)
  File "/home/anaconda3/envs/baikal/lib/python3.7/site-packages/sklearn/utils/validation.py", line 263, in check_consistent_length
    " samples: %r" % [int(l) for l in lengths])
ValueError: Found input variables with inconsistent numbers of samples: [3, 14206]
How to reproduce it?
I have modified the readme_long_example to show that this behavior occurs there as well. It raises the same kind of error, which suggests the problem is related to the input shape:
ValueError: Found input variables with inconsistent numbers of samples: [2, 426]
The runnable, modified example can be found here:
https://gist.github.com/DMTSource/2b38b473270a50e71025dd6cb1c03521
What versions are you using?
baikal==0.4.2
scikit-learn==0.24.1
Python 3.7.6 (anaconda env)
Hi Derek,
Thank you for providing the reproducing examples. I think the issue here is that GridSearchCV, as implemented by sklearn, is meant for a single input: it checks that X and y have the same number of samples, and a plain list of input arrays is counted by its length, which is where the 3 in [3, 14206] comes from. I realize that, other than the rather obscure comment in SKLearnWrapper quoted below, it is not obvious that you cannot pass multiple inputs/outputs when using SKLearnWrapper + GridSearchCV. I'll improve the docs to make this more obvious.
class SKLearnWrapper:
    """Wrapper utility class that allows models to used in scikit-learn's
    ``GridSearchCV`` API. It follows the style of Keras' own wrapper.

    A future release of **baikal** plans to remove this class and instead
    include a custom ``GridSearchCV`` API, based on the original scikit-learn
    implementation, that can handle baikal models natively.
    """
In the meantime, I think you can work around it by merging the multiple inputs (and the multiple outputs, if any) into a single array before feeding them to the model, then splitting them back apart inside the model with Split and Stack-ing the outputs.
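A minimal sketch of the merge/split bookkeeping for this workaround, in plain NumPy and assuming every input is a 2-D array with one row per sample. merge_inputs and split_inputs are hypothetical helpers, not baikal API. Inside the baikal model, the Split step would play the role of split_inputs; this sketch assumes Split accepts the same indices_or_sections and axis arguments as np.split, so please check the baikal docs for its exact signature.

```python
import numpy as np


def merge_inputs(arrays):
    """Column-concatenate 2-D arrays that share the same number of rows.

    Hypothetical helper. Returns the merged array plus the column offsets
    at which to split it back apart with np.split(..., axis=1).
    """
    n_rows = {a.shape[0] for a in arrays}
    if len(n_rows) != 1:
        raise ValueError("all inputs must have the same number of rows")
    widths = [a.shape[1] for a in arrays]
    # Cumulative widths minus the last one give the split offsets.
    split_points = np.cumsum(widths)[:-1].tolist()
    return np.hstack(arrays), split_points


def split_inputs(X_merged, split_points):
    """Hypothetical inverse of merge_inputs: recover the column blocks."""
    return np.split(X_merged, split_points, axis=1)


# Column counts from the issue (478, 14, 508); row count shrunk for the demo.
rng = np.random.default_rng(0)
X1 = rng.normal(size=(5, 478))
X2 = rng.normal(size=(5, 14))
X3 = rng.normal(size=(5, 508))

X_merged, split_points = merge_inputs([X1, X2, X3])
print(X_merged.shape, split_points)  # (5, 1000) [478, 492]

X1_back, X2_back, X3_back = split_inputs(X_merged, split_points)
```

With the inputs merged this way, the failing call would take a single array, e.g. gscv.fit(X_train_merged, Y1_train, **fit_params), where X_train_merged is a hypothetical name for the merged training inputs.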
Thank you for the quick response! I will give the workaround a try, as that sounds like a simple and great solution!
Closing due to inactivity. Feel free to reopen if you need further help.