Classification-of-interacting-patterns

Models for predicting interacting residues

This project classifies interacting and non-interacting patterns. Each pattern contains 17 amino acids (fixed-length pattern). Interacting patterns are labeled 1 and non-interacting patterns are labeled 0.

Usage

To install the required packages (imbalanced-learn, pandas, scipy, numpy, scikit-learn, sklearn):

pip3 install -r requirements.txt

To run the Python code:

python code.py

Data Preprocessing

  • Splitting the data into features and labels
  • Under-sampling the majority class to reduce class imbalance
  • Splitting the data into training and validation sets
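
The three preprocessing steps above can be sketched as follows. This is a minimal illustration using only the Python standard library, not the code from this repository: the function names `undersample` and `train_val_split`, the toy data, and the 25% validation fraction are all assumptions. In practice, `RandomUnderSampler` from imbalanced-learn and `train_test_split` from scikit-learn cover the same steps.

```python
import random
from collections import defaultdict


def undersample(patterns, labels, seed=0):
    """Randomly drop samples until every class is as small as the rarest one."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for pattern, label in zip(patterns, labels):
        by_label[label].append(pattern)
    n_min = min(len(items) for items in by_label.values())
    balanced = []
    for label, items in by_label.items():
        balanced.extend((p, label) for p in rng.sample(items, n_min))
    rng.shuffle(balanced)
    return [p for p, _ in balanced], [lab for _, lab in balanced]


def train_val_split(patterns, labels, val_fraction=0.25, seed=0):
    """Shuffle indices and hold out `val_fraction` of them for validation."""
    rng = random.Random(seed)
    indices = list(range(len(patterns)))
    rng.shuffle(indices)
    n_val = int(len(indices) * val_fraction)
    val_idx, train_idx = indices[:n_val], indices[n_val:]

    def pick(idx):
        return [patterns[i] for i in idx], [labels[i] for i in idx]

    return pick(train_idx), pick(val_idx)


# Toy data (hypothetical): 8 patterns of 17 residues, 2 interacting, 6 not.
ALPHABET = "ACDEFGHIKLMNPQRSTVWY"
rows = [((ALPHABET[i:] + ALPHABET[:i])[:17], 1 if i < 2 else 0) for i in range(8)]

# Step 1: split features (patterns) from labels.
patterns = [pattern for pattern, _ in rows]
labels = [label for _, label in rows]

# Step 2: under-sample so both classes have the same number of patterns.
bal_patterns, bal_labels = undersample(patterns, labels)

# Step 3: split the balanced data into training and validation sets.
(train_x, train_y), (val_x, val_y) = train_val_split(bal_patterns, bal_labels)
```

Under-sampling discards data from the majority class, so it trades some training signal for a balanced label distribution.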

Feature Extraction

  • Integer Encoding
  • Binary Profiling
  • Molecular weight
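
A rough sketch of what these three encodings could look like for a single 17-residue pattern is given below. It is an illustration under assumptions, not the repository's implementation: the residue ordering, the use of 0 for unknown characters, the helper names, and the choice of approximate free amino acid molecular weights (g/mol) are all hypothetical.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues (assumed order)
INDEX = {aa: i + 1 for i, aa in enumerate(AMINO_ACIDS)}  # 0 is kept for unknowns

# Approximate average molecular weights of the free amino acids, in g/mol.
WEIGHT = {
    "A": 89.09, "R": 174.20, "N": 132.12, "D": 133.10, "C": 121.16,
    "E": 147.13, "Q": 146.15, "G": 75.07, "H": 155.16, "I": 131.17,
    "L": 131.17, "K": 146.19, "M": 149.21, "F": 165.19, "P": 115.13,
    "S": 105.09, "T": 119.12, "W": 204.23, "Y": 181.19, "V": 117.15,
}


def integer_encode(pattern):
    """Map each residue to an integer in 1..20 (0 for anything unrecognised)."""
    return [INDEX.get(aa, 0) for aa in pattern]


def binary_profile(pattern):
    """Concatenate one 20-bit one-hot vector per residue (all zeros if unknown)."""
    vector = []
    for aa in pattern:
        one_hot = [0] * len(AMINO_ACIDS)
        if aa in INDEX:
            one_hot[INDEX[aa] - 1] = 1
        vector.extend(one_hot)
    return vector


def molecular_weights(pattern):
    """Replace each residue with its approximate weight (0.0 if unknown)."""
    return [WEIGHT.get(aa, 0.0) for aa in pattern]


pattern = "ACDEFGHIKLMNPQRST"  # hypothetical 17-residue pattern
int_features = integer_encode(pattern)      # 17 values
bin_features = binary_profile(pattern)      # 17 * 20 = 340 values
mw_features = molecular_weights(pattern)    # 17 values
```

Integer encoding is compact but imposes an arbitrary ordering on residues, whereas the binary profile avoids that at the cost of a 340-dimensional input.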

The following models were built

  • Random Forest Classifier
  • Random Forest Regressor
  • Support Vector Machine (SVM)
  • K Nearest Neighbors (KNN)
  • Logistic Regression
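
The comparison of these five models might look roughly like the sketch below, which uses scikit-learn on synthetic integer-encoded data. The data, the hyperparameters, and the 0.5 threshold used to turn the regressor's continuous output into a 0/1 label are assumptions made for illustration; this README does not state how the regressor's predictions were converted to class labels.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

# Synthetic stand-in for integer-encoded patterns: 200 rows of 17 residues.
rng = np.random.default_rng(0)
X = rng.integers(1, 21, size=(200, 17))
y = (X[:, 8] > 10).astype(int)  # toy label that depends on the central residue

X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

models = {
    "Random Forest Classifier": RandomForestClassifier(n_estimators=50, random_state=0),
    "Random Forest Regressor": RandomForestRegressor(n_estimators=50, random_state=0),
    "SVM": SVC(),
    "KNN": KNeighborsClassifier(n_neighbors=5),
    "Logistic Regression": LogisticRegression(max_iter=1000),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    raw = np.asarray(model.predict(X_val), dtype=float)
    # The regressor returns values in [0, 1]; thresholding at 0.5 (an assumption)
    # yields class labels. Classifier outputs are already 0/1 and are unchanged.
    predictions = (raw >= 0.5).astype(int)
    scores[name] = accuracy_score(y_val, predictions)

for name, score in scores.items():
    print(f"{name}: {score:.3f}")
```

Scoring every model through the same thresholding step keeps the accuracy figures comparable between the regressor and the classifiers.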

Result

The Random Forest Regressor performed best among the models listed above.

Accuracy score: 0.66312 (on the 70% of the data used for the public leaderboard), 0.64313 (on the 30% used for the private leaderboard)

(This was a Kaggle competition held as part of the course Machine Learning for Biomedical Applications at IIITD. 70% of the data is used for the public leaderboard and 30% for the private leaderboard.)

Team Members: Vidhi Sharma, Ajay Prakash