QSAR

Final project of EECS 6690 @ CU

Overview

Final project for EECS 6690 Statistical Learning @ Columbia University.

QSAR (Quantitative Structure-Activity Relationship) modeling is used here to predict the biodegradability of chemicals. The QSAR biodegradation data set was built to develop QSAR models that relate the chemical structure of molecules to their biodegradability.

File Structure

 📦QSAR
 ┣ 📂data
 ┃ ┗ 📜biodeg.csv
 ┣ 📂materials
 ┃ ┣ 📂pics                         # pics
 ┃ ┣ 📜presentation_v2.pptx         # slides
 ┃ ┣ 📜presentation_v2.pdf          # slides pdf
 ┃ ┗ 📜final_paper_v1.pdf           # Final Report
 ┣ 📂R code
 ┃ ┣ 📜tree.Rmd                     # Decision Tree Model using rpart
 ┃ ┣ 📜read_data.Rmd                # Our implementation
 ┃ ┣ 📜plot.Rmd                     # Result visualization
 ┃ ┣ 📜6690_proj_algorithm.R        # Reproduce paper method, Adaboost, NN and consensus model
 ┃ ┗ 📜6690_proj_algorithm.Rmd      # Rmd version of 6690_proj_algorithm.R
 ┣ 📜.gitignore
 ┣ 📜QSAR.Rproj
 ┗ 📜README.md

Presentation slides: pptx pdf

Final report: paper

Data Set: Data Set

Result pics: pictures

Data Set Description

  • Number of Instances: 1055

    • 356 molecules are readily biodegradable (RB) and 699 are not readily biodegradable (NRB)
  • Number of Attributes: 41

    • selected using many classification modeling methods combined with genetic algorithms
  • Correlations
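
As a quick check of the figures above, the data set can be loaded and summarized along these lines. This is a minimal sketch, not the code from `read_data.Rmd`: the helper name `read_biodeg` is hypothetical, and it assumes the file is semicolon-separated, has no header row, and stores the class label (`RB` / `NRB`) in the last column. Adjust `sep` and `header` if the local copy differs.

```r
# Sketch: load the data set and label the class column.
# Assumptions (not confirmed by this README): semicolon-separated values,
# no header row, class label (RB / NRB) in the last column.
read_biodeg <- function(path, sep = ";") {
  df <- read.csv(path, sep = sep, header = FALSE, stringsAsFactors = FALSE)
  n_col <- ncol(df)
  # Generic names V1..Vk for the descriptors, "class" for the label column.
  names(df) <- c(paste0("V", seq_len(n_col - 1)), "class")
  df$class <- factor(df$class, levels = c("NRB", "RB"))
  df
}

# Usage (path relative to the repository root):
# biodeg <- read_biodeg("data/biodeg.csv")
# dim(biodeg)          # compare with the instance and attribute counts above
# table(biodeg$class)  # compare with the RB / NRB counts above
# round(cor(biodeg[, names(biodeg) != "class"]), 2)  # descriptor correlations
```

The commented usage lines only show where the counts and the correlation matrix would come from; no output is listed because it depends on the local file.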

Reproduce

  • KNN
  • PLSDA
  • SVM
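
For orientation, a hold-out evaluation of one of these methods (KNN) might look like the sketch below. It uses `knn()` from the `class` package. The helper name `knn_holdout`, the value of `k`, the 70/30 split and the seed are illustrative assumptions, not the settings used in the project or in the reference paper, and two iris species stand in for the RB / NRB classes so the example runs without the data file.

```r
library(class)  # provides knn()

# Hold-out evaluation of k-nearest neighbours.
# x: numeric matrix or data frame of descriptors; y: factor of class labels.
# k, train_frac and seed are illustrative defaults, not project settings.
knn_holdout <- function(x, y, k = 5, train_frac = 0.7, seed = 1) {
  set.seed(seed)
  x <- as.matrix(x)
  train_idx <- sample(nrow(x), size = floor(train_frac * nrow(x)))
  # Scale with training-set statistics only, so the test set does not leak in.
  mu <- colMeans(x[train_idx, , drop = FALSE])
  sdev <- apply(x[train_idx, , drop = FALSE], 2, sd)
  sdev[sdev == 0] <- 1  # avoid dividing by zero for constant columns
  x_scaled <- scale(x, center = mu, scale = sdev)
  pred <- knn(train = x_scaled[train_idx, , drop = FALSE],
              test  = x_scaled[-train_idx, , drop = FALSE],
              cl    = y[train_idx], k = k)
  # Fraction of held-out samples classified correctly.
  mean(as.character(pred) == as.character(y[-train_idx]))
}

# Stand-in data: two iris species play the role of the two classes.
demo <- droplevels(iris[iris$Species != "setosa", ])
acc <- knn_holdout(demo[, 1:4], demo$Species, k = 5)
```

With the real data, `x` would be the descriptor columns and `y` the RB / NRB factor.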

Implementation

  • LDA
  • Naive Bayes
  • Decision Tree
  • Bagging
  • Random Forest
  • AdaBoost
  • Neural Network
  • Consensus Model
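
One common way to build a consensus model is a majority vote over the individual classifiers. The sketch below illustrates that idea in base R. It is an assumption that this matches the rule used in `6690_proj_algorithm.R`; the function name `consensus_vote`, the column names and the handling of ties (returned as `NA`) are hypothetical choices made for this example.

```r
# Majority-vote consensus over per-model class predictions.
# preds: data frame or matrix with one column per model and one row per
#        molecule, entries "RB" or "NRB".
# Ties are returned as NA (a choice made for this sketch).
consensus_vote <- function(preds) {
  preds <- as.matrix(preds)
  rb  <- rowSums(preds == "RB")   # votes for RB per molecule
  nrb <- rowSums(preds == "NRB")  # votes for NRB per molecule
  out <- ifelse(rb > nrb, "RB", ifelse(nrb > rb, "NRB", NA_character_))
  factor(out, levels = c("NRB", "RB"))
}

# Toy predictions from three hypothetical models for four molecules.
preds <- data.frame(
  knn = c("RB", "NRB", "RB",  "NRB"),
  svm = c("RB", "NRB", "NRB", "NRB"),
  lda = c("NRB", "NRB", "RB", "RB")
)
consensus_vote(preds)  # RB, NRB, RB, NRB
```

Using an odd number of models avoids ties. With an even number, the `NA` entries mark molecules the consensus leaves unclassified.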

Conclusion

Ⅰ Individual Model

Ⅱ Consensus Model

Citation

Mansouri, K., Ringsted, T., Ballabio, D., Todeschini, R., Consonni, V. (2013). Quantitative Structure-Activity Relationship models for ready biodegradability of chemicals. Journal of Chemical Information and Modeling, 53, 867-878.

Our Team