molview/molview

2D structures get messed up by the clean function unless they have defined stereocenters

TowelBin opened this issue · 7 comments

example:
https://molview.org/?smiles=IC1=CC2S(=O)(=O)N(C3=C(C(N(CCCOC)[H])C=2C=C1)SC=C3)C
https://molview.org/?smiles=IC1C=CC2=C(S(N(C)C3C=CSC=3[C@@]2N([H])CCCOC)(=O)=O)C=1
It's nice that there's an easy workaround for this (defining the stereocenters), but it took me a long time of using MolView to find it. Chemistry beginners also might not be able to easily go through and mark stereocenters.

P.S. Is the MolView rewrite still being developed?

Along the same lines, I'd love to know how the 2D to 3D conversion is done under the hood. I guess in the end some more fundamental library is called, but I can't seem to find it. RDKit?

@TowelBin @rlaplaza Last summer I spent quite some time on the rewrite, but I haven't released it anywhere yet and sadly I am now too busy to work on it. I have encountered various 3D resolving issues over the years that I plan to fix in this rewrite. The 3D coordinates are not computed on the client side; instead, the structure is looked up in an external database using an identifier (currently SMILES). Currently PubChem is used as the primary source, and the Chemical Identifier Resolver as the fallback. I picked PubChem as the primary source because its uptime is reliable and its API is robust, so PubChem answers almost all queries.
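
For readers who want to see the shape of that lookup, here is a minimal sketch of a primary/fallback resolver. It is not MolView's actual code: the function names (`candidate_urls`, `resolve_3d`) and the injected `fetch` callable are hypothetical, and the two URL patterns are assumptions based on the public PubChem PUG REST and Chemical Identifier Resolver interfaces, so check them against the current service documentation before relying on them.

```python
from typing import Callable, List, Optional
from urllib.parse import quote

# Assumed URL patterns for the two public services (not taken from MolView).
PUBCHEM_3D = "https://pubchem.ncbi.nlm.nih.gov/rest/pug/compound/smiles/{}/SDF?record_type=3d"
CIR_3D = "https://cactus.nci.nih.gov/chemical/structure/{}/file?format=sdf&get3d=true"


def candidate_urls(smiles: str) -> List[str]:
    """Return lookup URLs in priority order: PubChem first, then the resolver."""
    # SMILES contains characters such as '=', '#', '@' and '/' that must be
    # percent-encoded before they can be placed in a URL path segment.
    encoded = quote(smiles, safe="")
    return [PUBCHEM_3D.format(encoded), CIR_3D.format(encoded)]


def resolve_3d(smiles: str, fetch: Callable[[str], str]) -> Optional[str]:
    """Try each source in turn and return the first non-empty response body.

    `fetch` is a hypothetical callable that downloads a URL and returns its
    text; it is expected to raise OSError (urllib.error.URLError is a
    subclass) when the request fails.
    """
    for url in candidate_urls(smiles):
        try:
            body = fetch(url)
        except OSError:
            continue  # this source failed, fall through to the next one
        if body:
            return body
    return None
```

With the standard library, `fetch` could be something like `lambda url: urllib.request.urlopen(url).read().decode()`. Swapping the order of the two entries in `candidate_urls` is all it would take to make the resolver the primary source.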

I have since learned that PubChem uses force-field-based algorithms to compute 3D coordinates for almost all its entries (I have non-organic examples where this produces very wrong results), while the Chemical Identifier Resolver relies on a proprietary program called CORINA to put together known fragments (for educational purposes this is often more accurate, in particular for non-organic structures). So if the rewrite ever sees the light of day, it will rely on the Chemical Identifier Resolver as its primary source. By then it may be possible to do live force-field computations in the browser, but this is not a priority (and a database will still be needed for non-organic structures).

For 2D cleaning I implemented https://github.com/actelion/openchemlib in the rewrite, which hopefully handles special cases more gracefully. The current 2D cleaning function just retrieves a 2D coordinate file from PubChem by computing the SMILES string for the sketch.
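
To make the current 2D cleaning step concrete, the request it boils down to can be sketched as below. The exact endpoint MolView calls is not stated in this thread, so the URL pattern is an assumption based on PubChem's public PUG REST interface, and `pubchem_2d_url` is a hypothetical helper name.

```python
from urllib.parse import quote


def pubchem_2d_url(smiles: str) -> str:
    """Build a request URL for a PubChem 2D coordinate file (SDF) from SMILES.

    Assumption: the PUG REST path .../compound/smiles/<SMILES>/SDF with
    record_type=2d. Stereo markers such as '@@' are kept, but percent-encoded.
    """
    return (
        "https://pubchem.ncbi.nlm.nih.gov/rest/pug/compound/smiles/"
        + quote(smiles, safe="")
        + "/SDF?record_type=2d"
    )
```

Because the sketch is reduced to a SMILES string before the request, the returned layout depends entirely on what that string encodes, including whether stereocenters are specified.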

By the way, as long as there is some support through Patreon I will continue to develop the rewrite, albeit very slowly. So I plan to again find time and motivation to work on it.

@bergwerf Why don't you make a repository for the rewrite? I'm considering supporting this project on Patreon, but I'd like to have some idea of progress...

@TowelBin I haven't made progress since this summer because I am completely occupied by my master's. I do intend to continue this summer, perhaps by slimming down some of the objectives. I do not want to create a repository yet because I can see someone forking it to finish the work without following the design philosophy that I have in mind. I am quite particular about software design, so I want to finish this myself and then make it open source so others can see how it works. However, in the new version the sketcher will be closed source, since I have been developing it in cooperation with a company. It will support Lewis dot structures though!

It's super important for the sketcher to be open source! I don't think there are any modern open source sketchers. Even the ones that are proprietary are outdated. At the very least, could there be an option to use the old open source sketcher? I'm not very excited about Lewis dot structures :\

I am closing this issue because I am not planning to resolve it.