chembl/FPSim2

Problem while creating db file for MACCSKeys

BenjaminNeckam opened this issue · 3 comments

When I try to create a db file with "create_db_file('chembl_26.sdf', 'chembl_26_maccs.h5', 'MACCSKeys', mol_id_prop='chembl_id')", I get the following error:

Traceback (most recent call last):
  File "/home/benjamin/anaconda3/envs/pure-rdkit-env/lib/python3.6/site-packages/numpy/core/records.py", line 690, in fromrecords
    retval = sb.array(recList, dtype=descr)
ValueError: could not assign tuple of length 5 to structure with 4 fields.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/benjamin/anaconda3/envs/pure-rdkit-env/lib/python3.6/site-packages/tables/table.py", line 2221, in append
    wbufRA = numpy.rec.array(rows, dtype=self._v_dtype)
  File "/home/benjamin/anaconda3/envs/pure-rdkit-env/lib/python3.6/site-packages/numpy/core/records.py", line 854, in array
    return fromrecords(obj, dtype=dtype, shape=shape, **kwds)
  File "/home/benjamin/anaconda3/envs/pure-rdkit-env/lib/python3.6/site-packages/numpy/core/records.py", line 700, in fromrecords
    _array[k] = tuple(recList[k])
ValueError: could not assign tuple of length 5 to structure with 4 fields.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "", line 1, in
  File "/home/benjamin/anaconda3/envs/pure-rdkit-env/lib/python3.6/site-packages/FPSim2/io/FPSim2_io.py", line 429, in create_db_file
    fps_table.append(fps)
  File "/home/benjamin/anaconda3/envs/pure-rdkit-env/lib/python3.6/site-packages/tables/table.py", line 2225, in append
    "The error was: <%s>" % (str(self), exc))
ValueError: rows parameter cannot be converted into a recarray object compliant with table '/fps (Table(0,), shuffle, blosc(5)) 'Table storing fps''. The error was: <could not assign tuple of length 5 to structure with 4 fields.>

Do I have to add additional parameters in the create_db_file() command or am I missing something else?

I can confirm there is an issue with it and that it will be fixed soon.

Bear in mind that the molecule ids need to be integers and that it won't work with strings.
This is a limitation that we'd like to address in the future, but it's not urgent for us since our backends also keep integer ids (molregno) for all the structures.
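
For readers hitting the same integer-only restriction, one possible workaround is to convert string identifiers to integers before building the file and map them back afterwards. The sketch below is only an illustration, not part of FPSim2: it assumes the ids have the form "CHEMBL" followed by digits, and the helper names chembl_id_to_int and int_to_chembl_id are hypothetical.

```python
import re

# Assumption: ids look like "CHEMBL25" (prefix followed only by digits).
_CHEMBL_ID = re.compile(r"CHEMBL(\d+)")


def chembl_id_to_int(chembl_id: str) -> int:
    """Hypothetical helper: strip the 'CHEMBL' prefix and return the numeric part."""
    match = _CHEMBL_ID.fullmatch(chembl_id.strip())
    if match is None:
        raise ValueError(f"not a CHEMBL-style id: {chembl_id!r}")
    return int(match.group(1))


def int_to_chembl_id(mol_id: int) -> str:
    """Hypothetical helper: rebuild the string id from the integer."""
    return f"CHEMBL{mol_id}"


if __name__ == "__main__":
    ids = ["CHEMBL25", "CHEMBL1201585"]
    as_ints = [chembl_id_to_int(i) for i in ids]
    print(as_ints)
    print([int_to_chembl_id(i) for i in as_ints])
```

The integer values could then be stored in an SDF property (or any other input) that is passed as mol_id_prop, and search results translated back with the inverse helper. Whether this fits depends on the ids being uniquely recoverable from their numeric part.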

This is good news, thanks.

Thank you for reminding me. I have already found an easy way to bypass this limitation.

There are new conda builds for macOS and Linux that include this fix (version 0.1.7).