chembl/chembl_webresource_client

Getting canonical SMILES directly

Closed this issue · 1 comments

Is there a way to get the canonical SMILES directly from the web service, rather than downloading all of the molecule structures data and then extracting the SMILES? I waste a lot of time downloading redundant information when I only want a small portion of it.

This is how I'm currently doing it:

from chembl_webresource_client.new_client import new_client

# Get the data (at the moment, all of the 'molecule_structures' data)

mols = new_client.molecule.filter(
    max_phase=4,
    first_approval__gte=2000,
    molecule_properties__mw_freebase__lte=500,
).only('molecule_structures')

# Then I have to use a list comprehension to extract the 'canonical_smiles'.
# For the amount of data in this example, it takes about 1 min 30 s to extract the 607 SMILES.
# I'm trying to get a lot more than 607 SMILES, but that ends up taking a really long time.

mol_smiles = [
    mol['molecule_structures']['canonical_smiles']
    for mol in mols
    if mol['molecule_structures']
]

Hi Constantin, sorry for the delayed reply. Retrieving only the SMILES is unfortunately not possible, and it would be difficult to change in the library because of how it was designed.
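
Given that limitation, the extraction has to stay on the client side. The following is a minimal sketch of that step, assuming each record is a dict shaped like the results in the question (a `molecule_structures` entry that is either empty or contains `canonical_smiles`). The name `iter_smiles` is hypothetical and not part of chembl_webresource_client, the sample records are made up, and the helper does not reduce the amount of data the web service sends.

```python
from typing import Any, Iterable, Iterator, Mapping


def iter_smiles(records: Iterable[Mapping[str, Any]]) -> Iterator[str]:
    """Yield canonical SMILES lazily, skipping records without a structure.

    Hypothetical helper for illustration; not part of chembl_webresource_client.
    """
    for record in records:
        # 'molecule_structures' may be empty, which is why the list
        # comprehension in the question filters on it.
        structures = record.get("molecule_structures")
        if not structures:
            continue
        smiles = structures.get("canonical_smiles")
        if smiles:
            yield smiles


# Stand-in records with made-up values (not real ChEMBL output),
# shaped like the results described in the question.
sample_records = [
    {"molecule_structures": {"canonical_smiles": "CCO"}},
    {"molecule_structures": None},
    {"molecule_structures": {"canonical_smiles": None}},
    {"molecule_structures": {"canonical_smiles": "c1ccccc1"}},
]

print(list(iter_smiles(sample_records)))
```

With the query from the question, this would presumably be used as `mol_smiles = list(iter_smiles(mols))`. Because it is a generator, SMILES can be processed as records arrive rather than after the whole result set has been collected, though the total download time stays the same.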