/coulomb_matrix

A Python based code to construct a Sorted Coulomb matrix from Smile strings (CSV input) of molecules . An optional scikit-learn is invoked at the end of the script to classify molecules using SVM.

Primary LanguagePython

Coulomb matrix has been developed as a descriptor for molecules, inorder to learn and predict their properties using Machine Learning.

A Python script to construct a sorted coulomb matrix from SMILES string of molecules. The code internally utilizes openbabel to process the chemical data input in the form of SMILES. By default the Sorted Coulomb matrix is saved to a CSV output file containing LabeledPoint vectors optimized to be read by Apache Spark. Apache Spark is particularly optimal for handling big data and comes with built in powerful Machine learning library.

An optional scikit-learn is invoked at the end of the script to classify molecules using SVM.