rdkit/rdkit-js

Properties are not extracted from SDF files

Opened this issue · 3 comments

Describe the bug
I am trying to ingest an SDF file with RDKit JS, but the properties tagged in the SDF file are missing from the parsed molecule.

To Reproduce

Consider the following code:

const molString = ` 
ww   csweb09162414372D 0   0.00000     0.00000
 
  9  9  0  0  0  0  0  0  0  0999 V2000
  137.0000  319.8763    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
  108.1690  305.9919    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
  101.0483  274.7943    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
  121.0000  249.7757    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
  153.0000  249.7757    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
  172.9517  274.7943    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
  165.8310  305.9919    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
  242.0000  338.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
  280.0000  298.0000    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
  1  2  1  0  0  0  0
  2  3  1  0  0  0  0
  3  4  1  0  0  0  0
  4  5  1  0  0  0  0
  5  6  1  0  0  0  0
  1  7  1  0  0  0  0
  6  7  1  0  0  0  0
  7  8  1  0  0  0  0
  8  9  1  0  0  0  0
M  END

> <some_property>
0

> <PUBCHEM_COMPONENT_COUNT>
1

$$$$
`
const mol = window.RDKit.get_mol("\n" + molString + "$$$$\n");
const keys: string[] = mol.get_prop_list();
const metadata = {};
for (var key in keys) {
  console.log("mol." + key + " = " + mol.get_prop(key));
  metadata[key] = mol.get_prop(key);
}
console.log(metadata);

Expected behavior

I would expect to see the properties "some_property" and "PUBCHEM_COMPONENT_COUNT", but instead these are missing.

Screenshots

Output of metadata in the example:
[screenshot of the logged metadata object]

Version

This issue was noticed in RDKit version: 2024.03.5

Additional context
This is perhaps medium priority for me; my current workaround is to extract the properties manually, outside of RDKit. It would be nice to update get_mol, but it would be even more useful to expose the functionality of Chem.SDMolSupplier, as described here: https://www.rdkit.org/docs/GettingStartedInPython.html#reading-sets-of-molecules
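
For anyone needing the same workaround, extracting the properties manually could look roughly like the sketch below. It is plain JavaScript and does not call RDKit JS at all. `parseSdfProperties` is a hypothetical helper name, and the sketch assumes a single SDF record in which each data item has a `> <name>` header line followed by its value lines and a terminating blank line.

```javascript
// Hypothetical helper (not an RDKit JS API): collect the data items that
// follow the "M  END" line of a single SDF record.
function parseSdfProperties(record) {
  const lines = record.split(/\r?\n/);
  const props = {};
  const end = lines.findIndex((line) => line.startsWith("M  END"));
  if (end === -1) {
    return props; // no molblock terminator, so nothing to extract
  }
  let i = end + 1;
  while (i < lines.length && !lines[i].startsWith("$$$$")) {
    // A data header looks like "> <name>"; the name sits between < and >.
    const header = lines[i].match(/^>.*<([^>]+)>/);
    i += 1;
    if (!header) continue;
    const value = [];
    // Value lines run until the next blank line or the record delimiter.
    while (
      i < lines.length &&
      lines[i].trim() !== "" &&
      !lines[i].startsWith("$$$$")
    ) {
      value.push(lines[i]);
      i += 1;
    }
    props[header[1]] = value.join("\n");
  }
  return props;
}

// Usage with the tail of the record from this issue (atom and bond lines omitted).
const exampleRecord = [
  "M  END",
  "",
  "> <some_property>",
  "0",
  "",
  "> <PUBCHEM_COMPONENT_COUNT>",
  "1",
  "",
  "$$$$",
].join("\n");
console.log(parseSdfProperties(exampleRecord));
```

All values come back as strings, so numeric properties such as PUBCHEM_COMPONENT_COUNT would still need converting by the caller.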

This is not a bug. get_mol only parses the molblock up to the M  END line, as it mimics the Python function Chem.MolFromMolBlock rather than Chem.SDMolSupplier.
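
To make the distinction concrete: an SDF file is a sequence of records separated by `$$$$` lines, and each record is a molblock ending at `M  END` followed by optional data items. A rough sketch of separating those parts in plain JavaScript is given below. `splitSdfFile` and `splitSdfRecord` are hypothetical names for illustration, not RDKit JS functions. Only the `molblock` part corresponds to what get_mol reads.

```javascript
// Hypothetical helper: split a multi-record SDF string on its "$$$$" delimiter lines.
function splitSdfFile(sdf) {
  return sdf
    .split(/^\$\$\$\$.*\r?\n?/m)
    .filter((record) => record.trim() !== "");
}

// Hypothetical helper: separate one record into the molblock (up to and
// including "M  END") and the data section that follows it.
function splitSdfRecord(record) {
  const lines = record.split(/\r?\n/);
  const end = lines.findIndex((line) => line.startsWith("M  END"));
  if (end === -1) {
    // No terminator found: treat the whole record as the molblock.
    return { molblock: record, data: "" };
  }
  return {
    molblock: lines.slice(0, end + 1).join("\n") + "\n",
    data: lines
      .slice(end + 1)
      .filter((line) => !line.startsWith("$$$$"))
      .join("\n")
      .trim(),
  };
}

// Usage: two tiny placeholder records; the atom and bond lines are omitted,
// so these molblocks are not chemically valid.
const sdf = [
  "first", "", "", "M  END", "> <some_property>", "0", "", "$$$$",
  "second", "", "", "M  END", "$$$$",
].join("\n") + "\n";

for (const record of splitSdfFile(sdf)) {
  const { molblock, data } = splitSdfRecord(record);
  // `molblock` is the part that would be passed to get_mol;
  // `data` holds the property lines that get_mol does not read.
  console.log(JSON.stringify(molblock), JSON.stringify(data));
}
```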

@ptosco, is there an alternative in RDKit JS that would allow me to get the properties?

No, at the moment there isn’t one.