Getting canonical SMILES directly
Closed this issue · 1 comments
ConstantinWaquet commented
Is there a way to get the canonical SMILES directly from the web service, rather than downloading all of the molecule structure data and then extracting the SMILES? I waste a lot of time downloading redundant information when I only want a small portion of it.
This is what I'm currently doing:
from chembl_webresource_client.new_client import new_client

# Get the data (at the moment, all of the 'molecule_structures' data).
mols = new_client.molecule.filter(
    max_phase=4,
    first_approval__gte=2000,
    molecule_properties__mw_freebase__lte=500,
).only('molecule_structures')

# Then I have to use a list comprehension to extract the 'canonical_smiles'.
# For the amount of data in this example, it takes about 1 min 30 s to extract the 607 SMILES.
# I'm trying to get a lot more than 607 SMILES, and that ends up taking a really long time.
mol_smiles = [
    mol['molecule_structures']['canonical_smiles']
    for mol in mols
    if mol['molecule_structures']
]
eloyfelix commented
Hi Constantin, sorry for the delayed reply. Unfortunately, retrieving only the SMILES is not possible, and it would be difficult to change in the library because of how it was designed.
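Since the extraction has to stay on the client side, here is a minimal sketch of how it could be wrapped in a small generator. It does not reduce the amount of data downloaded; it only avoids errors on records without structures and lets results be consumed as they arrive. The helper names `iter_canonical_smiles` and `fetch_smiles` are hypothetical and not part of the library, and the query is the one from the question above. Unlike the original list comprehension, this sketch also skips records whose `canonical_smiles` is empty.

```python
from typing import Any, Iterable, Iterator, List, Mapping, Optional


def iter_canonical_smiles(
    records: Iterable[Optional[Mapping[str, Any]]],
) -> Iterator[str]:
    """Yield canonical SMILES lazily, skipping records that have none.

    Hypothetical helper (not part of chembl_webresource_client).
    """
    for record in records:
        # 'molecule_structures' can be missing or None for some molecules.
        structures = (record or {}).get("molecule_structures") or {}
        smiles = structures.get("canonical_smiles")
        if smiles:
            yield smiles


def fetch_smiles() -> List[str]:
    """Run the query from the question and collect the SMILES.

    Hypothetical wrapper; the client is imported here so that the helper
    above can be used and tested without the client installed.
    """
    from chembl_webresource_client.new_client import new_client

    mols = new_client.molecule.filter(
        max_phase=4,
        first_approval__gte=2000,
        molecule_properties__mw_freebase__lte=500,
    ).only("molecule_structures")
    return list(iter_canonical_smiles(mols))


if __name__ == "__main__":
    # Requires network access to the ChEMBL web service.
    print(len(fetch_smiles()))
```

Because `iter_canonical_smiles` accepts any iterable of dict-like records, it can be checked offline with hand-written records before running the real query.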