This project uses a convolutional neural network (CNN) designed for image recognition to classify drugs from their SDF (structure-data file) records, which specify the 3D position of every atom in a molecule.
The two classes to be distinguished are stimulants and sedatives.
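
To make the input format concrete, here is a minimal sketch, not the authors' code, that reads element symbols and 3D coordinates from an SDF file. It assumes the common V2000 fixed-width layout (V3000 records are not handled) and skips the bond block. The names `parse_sdf` and `_parse_record` are hypothetical, and the sketch stops at the coordinates: it does not show how they are encoded as CNN input.

```python
from typing import List, Tuple

# (element symbol, x, y, z); coordinates are in angstroms by convention
Atom = Tuple[str, float, float, float]


def _parse_record(lines: List[str]) -> List[Atom]:
    """Parse the atom block of one V2000 molfile record."""
    # Lines 1-3 are the header; line 4 is the counts line, atom count in columns 1-3.
    n_atoms = int(lines[3][0:3])
    atoms = []
    for line in lines[4:4 + n_atoms]:
        # Fixed-width fields: x, y and z take 10 columns each, followed by
        # one space and a 3-column element symbol.
        x, y, z = float(line[0:10]), float(line[10:20]), float(line[20:30])
        atoms.append((line[31:34].strip(), x, y, z))
    return atoms


def parse_sdf(text: str) -> List[List[Atom]]:
    """Return the atoms of every molecule in an SDF string."""
    molecules, record = [], []
    for line in text.splitlines():
        if line.strip() == "$$$$":  # end of one molecule record
            molecules.append(_parse_record(record))
            record = []
        else:
            record.append(line)
    if any(line.strip() for line in record):  # the last record may lack "$$$$"
        molecules.append(_parse_record(record))
    return molecules

# Hypothetical usage ("drugs.sdf" is a placeholder file name):
# with open("drugs.sdf") as f:
#     molecules = parse_sdf(f.read())
```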
Assembling a good data set is very difficult, so the model is fitted on only 123 molecules. Of these, 80% are used for training and the remaining 20% are held out to test the accuracy of the neural network.
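
A seeded 80/20 split can be sketched as follows. This is an illustration under assumptions, not the project's code: `split_dataset` is a hypothetical helper, and rounding the test size up is a choice (scikit-learn's `train_test_split` also rounds a fractional test size up). With 123 molecules it yields 98 training and 25 testing molecules, which matches the 25 molecules counted in the confusion matrix below.

```python
import math
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_dataset(items: Sequence[T], test_fraction: float = 0.2,
                  seed: int = 1) -> Tuple[List[T], List[T]]:
    """Shuffle a copy of `items` and split it into (train, test)."""
    shuffled = list(items)
    # A seeded generator makes the split reproducible: same seed, same split.
    random.Random(seed).shuffle(shuffled)
    # Round the test size up, so 20% of 123 molecules gives 25, not 24.
    n_test = math.ceil(test_fraction * len(shuffled))
    return shuffled[n_test:], shuffled[:n_test]


train, test = split_dataset(range(123))
print(len(train), len(test))  # 98 25
```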
Confusion matrix of predicted classes versus actual classes.

| | Sedative | Stimulant |
|---|---|---|
| Sedative | 8 | 2 |
| Stimulant | 1 | 14 |
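
The testing accuracy can be checked against the table: accuracy is the sum of the diagonal (correct predictions) divided by the total number of test molecules, (8 + 14) / 25 = 0.88. Because only the diagonal and the total are used, the result does not depend on which axis holds the predicted classes. A minimal sketch (the function name `accuracy` is illustrative):

```python
from typing import Sequence


def accuracy(confusion: Sequence[Sequence[int]]) -> float:
    """Fraction of correctly classified samples: the diagonal over the total."""
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    total = sum(sum(row) for row in confusion)
    return correct / total


# Counts from the table above, classes in the order Sedative, Stimulant.
matrix = [[8, 2],
          [1, 14]]
print(accuracy(matrix))  # 0.88
```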

- Training accuracy = 0.908
- Testing accuracy = 0.88 (this value depends on how the data is shuffled)
- Expected accuracy from guessing = 0.64 (above 0.5 because the data set is imbalanced)

The testing accuracy depends on how the molecules are divided between the training and testing sets. It can go as high as 96% when the random number generator seed is set to 1.
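
Two small checks help put these numbers in context; both are sketches under assumptions, with hypothetical function names. `majority_baseline` assumes "guessing" means always predicting the more frequent class, whose accuracy is that class's share of the samples (uniform random guessing would instead have an expected accuracy of 0.5 regardless of imbalance). `summarize_over_seeds` assumes a hypothetical `evaluate(seed)` callable that splits the molecules with the given seed, fits the CNN and returns the testing accuracy; it reports the spread over several seeds instead of a single seed.

```python
import statistics
from typing import Callable, Dict, Iterable


def majority_baseline(class_counts: Dict[str, int]) -> float:
    """Accuracy of always predicting the most frequent class."""
    return max(class_counts.values()) / sum(class_counts.values())


def summarize_over_seeds(evaluate: Callable[[int], float],
                         seeds: Iterable[int]) -> Dict[str, float]:
    """Call evaluate(seed) once per seed and summarize the returned accuracies."""
    scores = [evaluate(seed) for seed in seeds]
    return {
        "mean": statistics.mean(scores),
        # statistics.stdev needs at least two values
        "stdev": statistics.stdev(scores) if len(scores) > 1 else 0.0,
        "min": min(scores),
        "max": max(scores),
    }


# Toy counts, not the project's data: 3 of 4 samples are in one class.
print(majority_baseline({"stimulant": 3, "sedative": 1}))  # 0.75

# `train_and_test` is hypothetical: it would split with the given seed,
# fit the CNN and return the testing accuracy.
# print(summarize_over_seeds(train_and_test, seeds=range(10)))
```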
- Cristian Groza
- Alexei Nolin-Lapalme