ncats/ncats-adme

Cannot parse SMILES strings containing positive charge

Closed this issue · 2 comments

Example: Niclosamide
Input SMILES: OC1=C(C=C(Cl)C=C1)C(=O)NC2=CC=C(C=C2Cl)[N+]([O-])=O
Error: SMILES Parse Error: syntax error while parsing: C1=CC(O)=C(C(=O)NC2=C(Cl)C=C([N

Cause: In Angular 5.2.7+, a "+" in a query string is replaced with a space " " (ref). A SMILES string containing a + character is therefore decoded with a space in its place (%20 in the HTTP request URL), and RDKit fails to parse the decoded SMILES. For example, the current behavior is:

Query SMILES: OC1C=CC(Cl)=CC=1C(NC1C(Cl)=CC([N+]([O-])=O)=CC=1)=O
Decoded SMILES: OC1C=CC(Cl)=CC=1C(NC1C(Cl)=CC([N ]([O-])=O)=CC=1)=O
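
The substitution can be reproduced outside the application. The following is a minimal sketch using Python's standard urllib.parse, assuming the receiving side applies ordinary form-style query decoding, in which a literal + stands for a space. The parameter name `smiles` is hypothetical and is not taken from the project's API.

```python
from urllib.parse import parse_qs

smiles = "OC1C=CC(Cl)=CC=1C(NC1C(Cl)=CC([N+]([O-])=O)=CC=1)=O"

# Hypothetical query string in which the "+" was sent without percent-encoding.
raw_query = "smiles=" + smiles

# Form-style decoding turns every literal "+" into a space,
# so the nitro group "[N+]" arrives as "[N ]".
decoded = parse_qs(raw_query)["smiles"][0]
print(decoded)
```

The printed string has `[N ]` where the query had `[N+]`, which is the form RDKit rejects.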

Solution: encode each + character in the SMILES string as %2B before it is placed in the query string, so that it is decoded back to a + character.
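
The round trip can be sketched as follows. This is an illustration in Python under stated assumptions, not the change made in the project: `encode_smiles` and the `smiles` parameter name are hypothetical, and the receiving side is assumed to use ordinary form-style query decoding.

```python
from urllib.parse import parse_qs, quote


def encode_smiles(smiles: str) -> str:
    """Percent-encode a SMILES string for use as a query value.

    Hypothetical helper: with safe="" every reserved character is
    encoded, so "+" becomes "%2B" and "=" becomes "%3D".
    """
    return quote(smiles, safe="")


smiles = "OC1=C(C=C(Cl)C=C1)C(=O)NC2=CC=C(C=C2Cl)[N+]([O-])=O"

# Build the query with the encoded value; no literal "+" remains in it.
query = "smiles=" + encode_smiles(smiles)

# Form-style decoding now restores the original string, "+" included.
round_tripped = parse_qs(query)["smiles"][0]
print(round_tripped == smiles)
```

On the client side, JavaScript's built-in encodeURIComponent performs the same + to %2B encoding; whether the project uses it is not stated here.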

Fixed in #63