molecularsets/moses

No stereoisomers included in the dataset dataset_v1.csv

Closed this issue · 2 comments

The dataset dataset_v1.csv does not contain any of the characters "/", "\" or "@" (the SMILES markers for stereoisomers).
Why are stereoisomers not included in the dataset?
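
For anyone who wants to reproduce this check, here is a minimal sketch using only the Python standard library. It relies on the fact that in SMILES "/" and "\" mark double-bond geometry and "@" marks tetrahedral chirality. The helper names (`has_stereo`, `count_stereo`) are hypothetical, and the `SMILES` column name and the sample rows are assumptions for illustration, not taken from the actual file.

```python
import csv
import io

# Characters that carry stereo information in a SMILES string:
# "/" and "\" for double-bond geometry, "@" for tetrahedral chirality.
STEREO_CHARS = frozenset("/\\@")


def has_stereo(smiles: str) -> bool:
    """Return True if the SMILES string contains any stereo marker."""
    return any(ch in STEREO_CHARS for ch in smiles)


def count_stereo(rows, column="SMILES"):
    """Count all rows and the rows whose SMILES carry stereo markers.

    `rows` is any iterable of dicts, e.g. a csv.DictReader.
    `column` is an assumed column name; adjust it to match the file.
    """
    total = 0
    with_stereo = 0
    for row in rows:
        total += 1
        if has_stereo(row[column]):
            with_stereo += 1
    return total, with_stereo


# Small in-memory sample standing in for the real CSV (made-up rows).
sample = (
    "SMILES,SPLIT\n"
    "CC(N)C(=O)O,train\n"
    "C[C@H](N)C(=O)O,train\n"
    "F/C=C/F,test\n"
)

total, with_stereo = count_stereo(csv.DictReader(io.StringIO(sample)))
print(f"{with_stereo} of {total} SMILES contain stereo markers")
```

To run it on a real file, replace the `io.StringIO(sample)` argument with an opened file handle, for example `open("dataset_v1.csv", newline="")`.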

The current version of the dataset does indeed store non-isomeric SMILES. It seems like a good idea to construct a dataset_v1_isomeric.csv for additional experiments with isomeric SMILES. We'll add it soon, but for now you can change this line: https://github.com/molecularsets/moses/blob/master/scripts/prepare_dataset.py#L42 to isomericSmiles=True and then run the prepare_dataset.py script.

Added in #33. We will use this new dataset for benchmarking in the future.