featurizer RACS() returns arrays containing NaNs
mehradans92 opened this issue · 2 comments
Hi @kjappelbaum. Just noticed that the featurizer RACS()
returns some NaNs when I try to featurize some structures. The CIF files are coming from the QMOF dataset. Not sure if that a potential bug, or where this is coming from. Do you have any suggestions for troubleshooting?
Hi @mehradans92 thanks for the question.
This is a design choice, that I should have perhaps propagated into the public API
It depends on whether you limit your neighbor search or if you let it expand the search radius. If I say I only look for neighbors on the metal node or linker, then there might not be a neighbor n bonds away. You can either set this to zero or to nan
and currently it deals to nan
. If you set the logging level accordingly (i.e. to debug
, you should also see logging messages appear when it runs into those cases.
Note that some parts of this implementation are therefore different from the original one.
This behavior should have been documented a bit better, do you have other ideas on how we could improve it?
Thanks for the explanation @kjappelbaum. I guess if you add an option as kwargs boolean for this it would be helpful. Or maybe you can just add what you described to the documentation. I am trying to understand how RACS() work based on your paper. For some reason, there are many non-zero features that are just constant values across a lot of different chemical structures (CIFs). Btw, the latex equations seem to not render correctly in the documentation. :)