Carlssonlab/conformalpredictor

Issue with the shape of the y column in train mode

Hi,

The (incorrect) shape of the "y" column (the one with the labels, obtained after applying amcp_preparation) seems to crash training. For 1M compounds the shape is (100000, 10), whereas it should be (1000000, 1). The error message is:

Traceback (most recent call last):
  File "/home/anaconda3/envs/amcp/bin/amcp", line 33, in <module>
    sys.exit(load_entry_point('amcp', 'console_scripts', 'amcp')())
  File "/home/Programs/conformalpredictor/amcp/amcp.py", line 150, in main
    modes.train(args)
  File "/home/Programs/conformalpredictor/amcp/modes.py", line 84, in train
    X_train, X_calibration_data, y_train, y_calibration_data = train_test_split(X, y, test_size=args.ratioTestSet, shuffle=True, stratify=y)
  File "/home/anaconda3/envs/amcp/lib/python3.8/site-packages/sklearn/model_selection/_split.py", line 2559, in train_test_split
    arrays = indexable(*arrays)
  File "/home/anaconda3/envs/amcp/lib/python3.8/site-packages/sklearn/utils/validation.py", line 443, in indexable
    check_consistent_length(*result)
  File "/home/anaconda3/envs/amcp/lib/python3.8/site-packages/sklearn/utils/validation.py", line 397, in check_consistent_length
    raise ValueError(
ValueError: Found input variables with inconsistent numbers of samples: [1000000, 10]

A workaround is to replace "return X, y" with "return X, y.reshape(-1, 1)" in the parseAMCPDataTraining function (parsers.py module).
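
For illustration, here is a minimal NumPy sketch of why the reshape helps. It is scaled down to 1000 samples and uses made-up arrays, not the real parser output. The layout of the mis-shaped y (first dimension of 10) is an assumption taken from the [1000000, 10] in the error message, and same_number_of_samples is a hypothetical helper that only mimics the first-dimension comparison the failing check performs.

```python
import numpy as np

n_samples = 1000  # scaled-down stand-in for the 1M compounds in the report

# Hypothetical feature matrix: one row per compound.
X = np.zeros((n_samples, 4))

# Hypothetical mis-shaped labels: n_samples values spread over 10 rows
# (assumed layout, inferred from the error message).
y_bad = (np.arange(n_samples) % 2).reshape(10, -1)


def same_number_of_samples(a, b):
    """Compare first dimensions, as the sample-count consistency check does."""
    return a.shape[0] == b.shape[0]


# The workaround: collapse y into a single column with one row per sample.
y_fixed = y_bad.reshape(-1, 1)

print(y_bad.shape, same_number_of_samples(X, y_bad))
print(y_fixed.shape, same_number_of_samples(X, y_fixed))
```

Note that reshape(-1, 1) flattens in row-major order, so the labels stay aligned with the rows of X only if the parser filled the 2D block in that same order. I have not verified this beyond the training run no longer crashing.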

Cheers,
Alex