MStarmans91/WORC

fastr - found HDF5, which is not found in the typelist

Closed this issue · 8 comments

When I use fastr for the trainclassifier function, I get the following error:

File "", line 1, in
trainclassifier.trainclassifier(test, patientinfo, config, output_hdf, output_json)

File "C:\Users\Gebruiker\Anaconda3\envs\py27\lib\site-packages\PREDICT\trainclassifier.py", line 226, in trainclassifier
tempsave=config['General']['tempsave'])

File "C:\Users\Gebruiker\Anaconda3\envs\py27\lib\site-packages\PREDICT\classification\crossval.py", line 295, in crossval
**config['HyperOptimization'])

File "C:\Users\Gebruiker\Anaconda3\envs\py27\lib\site-packages\PREDICT\classification\parameter_optimization.py", line 82, in random_search_parameters
random_search.fit(features, labels)

File "C:\Users\Gebruiker\Anaconda3\envs\py27\lib\site-packages\PREDICT\processing\SearchCV.py", line 1628, in fit
return self._fit(X, y, groups, sampled_params)

File "C:\Users\Gebruiker\Anaconda3\envs\py27\lib\site-packages\PREDICT\processing\SearchCV.py", line 1319, in fit
estimator_data = network.create_source('HDF5', id_='estimator_source')

File "C:\Users\Gebruiker\Anaconda3\envs\py27\lib\site-packages\fastr\core\network.py", line 543, in create_source
source_node = SourceNode(datatype=datatype, id_=id_, parent=self, nodegroup=nodegroup)

File "C:\Users\Gebruiker\Anaconda3\envs\py27\lib\site-packages\fastr\core\node.py", line 1243, in __init__
raise exceptions.FastrValueError(message)

FastrValueError: [fastr:///networks/PREDICT_GridSearch_E7F5941TN9/0.0/nodelist/estimator_source] Unknown DataType for SourceNode fastr:///networks/PREDICT_GridSearch_E7F5941TN9/0.0/nodelist/estimator_source (found HDF5, which is not found in the typelist)!

There was no config file in my $HOME/.fastr/ folder, but also after putting it there ($HOME/.fastr/config.d/PREDICT_config.py), I got the same error.

When using fastr, all data has to have a specific type, so that fastr can check whether the sinks, sources and nodes that are connected match. These types have to be defined in fastr, see also https://fastr.readthedocs.io/en/stable/static/user_manual.html#datatypes .

In WORC, we have added several datatypes. Although the error you get is raised in PREDICT, it needs a datatype we defined in WORC (yes, that's a bit inconvenient). Hence, you need to tell fastr where these datatype files are located. They are in a specific folder in WORC: https://github.com/MStarmans91/WORC/tree/master/WORC/resources/fastr_types . You can also see the HDF5.py there, which fastr apparently cannot find. To tell fastr to look there, WORC should install a configuration file into the fastr config folder. I guess that one is missing.

Your fastr config folder is in your home folder: $HOME/.fastr, which is a hidden folder. It may contain a general config.py. It should also contain a config.d folder, which is where the WORC and PREDICT configs should be located.

Can you try putting the WORC config (https://github.com/MStarmans91/WORC/tree/master/WORC/fastrconfig) also in the config.d folder? You can check whether fastr picks this up by importing fastr and printing fastr.config.types_path. The installation folders of both WORC and PREDICT should be in there.

Let me know when you put the config there what the output of fastr.config.types_path is, so we can check if at least the config is parsed correctly.
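The check described above could look roughly like the sketch below. It assumes fastr is importable in the active environment. The helper name `missing_dirs` and the expected folder suffix are illustrative only and are not part of fastr or WORC.

```python
import os


def missing_dirs(search_paths, expected_suffixes):
    """Return the expected suffixes that no entry in search_paths ends with."""
    normalized = [os.path.normcase(os.path.normpath(p)) for p in search_paths]
    missing = []
    for suffix in expected_suffixes:
        wanted = os.path.normcase(os.path.normpath(suffix))
        if not any(p.endswith(wanted) for p in normalized):
            missing.append(suffix)
    return missing


if __name__ == "__main__":
    try:
        import fastr
    except ImportError:
        print("fastr is not installed in this environment")
    else:
        # types_path is the list of folders fastr scans for datatypes.
        print(fastr.config.types_path)
        expected = [os.path.join("WORC", "resources", "fastr_types")]
        print(missing_dirs(fastr.config.types_path, expected))
```

If the second print shows a non-empty list, the WORC datatype folder is not on the search path, which matches the "not found in the typelist" error.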

Thanks! I put both PREDICT and WORC config in the config.d folder and this is the output for fastr.config.types_path:

['C:\Users\Gebruiker\Anaconda3\envs\py27\lib\site-packages\fastr\resources\datatypes']

Apparently, these files in the config.d folder are not parsed by fastr. Otherwise, 'C:\Users\Gebruiker\Anaconda3\envs\py27\lib\site-packages\WORC\resources\fastr_types' would also have been in that list.

Just a quick check: do you have fastr>=2.0.0? The config.d feature was introduced in 2.0.0. You should have, as WORC requires this, but just to be sure.

This is more a fastr issue than a WORC issue at this point. I guess fastr is not parsing this folder at all. You could try to pragmatically solve this by putting a config.py in the .fastr folder and putting the content from both the PREDICT_config.py and the WORC_config.py files in there. But maybe we can check where fastr is looking for your config, as it can be in multiple locations, see https://fastr.readthedocs.io/en/stable/static/file_description.html#config-file.

Can you check whether you have a different fastr home folder? Just open a command window (press Windows+R, type "cmd" and press Enter) and run "echo $FASTRHOME". If that is defined, then you have to create a config.d folder in there and put the WORC and PREDICT configs in it.

Hope that one of those options solves the issue.

Yes, I have fastr version 2.1.1.

I tried putting the content of both into one config file, but that didn't work.

I'm not sure what you mean with the second option; when I type echo $FASTRHOME, it just returns '$FASTRHOME'?

Sorry, on Windows I guess you have to do echo %FASTRHOME% .

If that returns %FASTRHOME% as well, then the variable does not exist. In that case, fastr should use the ~/.fastr folder to find the config, and we have to look for a different cause.
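Since the shell syntax differs between platforms, the same check can also be done from Python. This is only a sketch of the lookup as described in this thread (FASTRHOME if set, otherwise ~/.fastr). fastr itself may consider more locations, see the config-file documentation linked above. The function name `fastr_config_dir` is made up for this example.

```python
import os


def fastr_config_dir(environ=None):
    """Guess the fastr user config folder: FASTRHOME if set, else ~/.fastr."""
    environ = os.environ if environ is None else environ
    home = environ.get("FASTRHOME")
    if home:
        return home
    return os.path.join(os.path.expanduser("~"), ".fastr")


config_dir = fastr_config_dir()
config_d = os.path.join(config_dir, "config.d")
print(config_dir)
# Show which config files are present, if the config.d folder exists at all.
if os.path.isdir(config_d):
    print(sorted(os.listdir(config_d)))
else:
    print("no config.d folder found in " + config_dir)
```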

Ok, unfortunately this also returns %FASTRHOME%.

But I did just notice that there are additional lines to the error, since I put the config files there:

[WARNING] basemanager:0287 >> Cannot scan C:\Users\Gebruiker\Anaconda3\envs\py27\WORC\resources\fastr_tools with ToolManager, path does not exist!
[WARNING] basemanager:0287 >> Cannot scan C:\Users\Gebruiker\Anaconda3\envs\py27\PREDICT\fastr_tools with ToolManager, path does not exist!
[WARNING] basemanager:0287 >> Cannot scan C:\Users\Gebruiker\Anaconda3\envs\py27\PREDICT\fastr_tools with ToolManager, path does not exist!
[WARNING] basemanager:0287 >> Cannot scan C:\Users\Gebruiker\Anaconda3\envs\py27\WORC\resources\fastr_types with DataTypeManager, path does not exist!
[CRITICAL] node:1242 >> Unknown DataType for SourceNode fastr:///networks/PREDICT_GridSearch_OY24WX42B8/0.0/nodelist/estimator_source (found HDF5, which is not found in the typelist)!

I think it should be looking in C:\Users\Gebruiker\Anaconda3\envs\py27\Lib\site-packages? How can I change this?

Ok, that sheds a different light on the issue. Fastr does parse your configs in the config.d, but the paths on which it should look are indeed incorrect. If you look into the WORC_config.py and PREDICT_config.py, you see that we automatically try to locate the packagedir, which indeed in your case will probably be C:\Users\Gebruiker\Anaconda3\envs\py27\lib\site-packages as PREDICT is installed in there.

Probably this is because you are on Windows. I will try to find a method that works on Windows: for now, in those configs, just define the packagedir manually before tools_path and types_path are altered, e.g. for the PREDICT_config.py:

packagedir = r'C:\Users\Gebruiker\Anaconda3\envs\py27\lib\site-packages'  # raw string keeps the backslashes literal
tools_path = [os.path.join(packagedir, 'PREDICT', 'fastr_tools')] + tools_path
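As one possible way to avoid hard-coding the path, the package folder could be derived from where the package is actually installed. This is a sketch only, not the method the WORC or PREDICT configs use. `package_parent_dir` is a hypothetical helper. Importing PREDICT from inside a fastr config file may also have side effects (for example if PREDICT imports fastr itself), so treat it as an assumption to verify.

```python
import importlib
import os


def package_parent_dir(package_name):
    """Return the directory that contains the given importable package."""
    module = importlib.import_module(package_name)
    package_dir = os.path.dirname(os.path.abspath(module.__file__))
    return os.path.dirname(package_dir)


# In PREDICT_config.py this would be package_parent_dir('PREDICT'); the stdlib
# package 'json' is used here only so the sketch runs without PREDICT installed.
packagedir = package_parent_dir('json')
print(os.path.join(packagedir, 'json'))
```

Because the result comes from the imported module's own location, it should point at site-packages on both Windows and Linux, regardless of where the environment lives.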

Yes it works now! Thanks a lot!!