Final project for EECS 6690 Statistical Learning at Columbia University.
QSAR (Quantitative Structure-Activity Relationship) modeling is used here to predict the biodegradability of chemicals. The QSAR biodegradation data set was built to develop QSAR models that relate the chemical structure of molecules to their biodegradability.
📦QSAR
 ┣ 📂data
 ┃ ┗ 📜biodeg.csv
 ┣ 📂materials
 ┃ ┣ 📂pics                      # pics
 ┃ ┣ 📜presentation_v2.pptx      # slides
 ┃ ┣ 📜presentation_v2.pdf       # slides pdf
 ┃ ┗ 📜final_paper_v1.pdf        # Final Report
 ┣ 📂R code
 ┃ ┣ 📜tree.Rmd                  # Decision Tree Model using rpart
 ┃ ┣ 📜read_data.Rmd             # Our implementation
 ┃ ┣ 📜plot.Rmd                  # Result visualization
 ┃ ┣ 📜6690_proj_algorithm.R     # Reproduce paper method, Adaboost, NN and consensus model
 ┃ ┗ 📜6690_proj_algorithm.Rmd   # Rmd version of 6690_proj_algorithm.R
 ┣ 📜.gitignore
 ┣ 📜QSAR.Rproj
 ┗ 📜README.md
Final report: paper
Data Set: Data Set
Result pics: pictures
- Number of Instances: 1055
  - 356 molecules are ready biodegradable (RB) and 699 are not ready biodegradable (NRB)
- Number of Attributes: 41
  - selected using many classification modeling methods combined with genetic algorithms
- Correlations
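
Because the two classes are imbalanced (356 RB vs. 699 NRB), it can help to keep the class ratio the same in the training and test sets. The following is a minimal base-R sketch of a stratified split, not necessarily the split used in this project: the helper name `stratified_train_idx`, the 70% training fraction, and the seed are assumptions made for illustration, and the label vector is rebuilt from the counts above rather than read from `data/biodeg.csv`.

```r
# Class labels rebuilt from the counts stated above (356 RB, 699 NRB).
# With the real data, these would come from the class column of data/biodeg.csv.
labels <- factor(rep(c("RB", "NRB"), times = c(356, 699)))

# Hypothetical helper: row indices of a training set that preserves class shares.
# train_frac = 0.7 and seed = 1 are illustrative assumptions.
stratified_train_idx <- function(y, train_frac = 0.7, seed = 1) {
  set.seed(seed)
  idx_by_class <- split(seq_along(y), y)  # row indices grouped by class
  picked <- lapply(idx_by_class, function(idx) {
    idx[sample.int(length(idx), size = round(train_frac * length(idx)))]
  })
  sort(unlist(picked, use.names = FALSE))
}

train_idx <- stratified_train_idx(labels)

# Class shares in the full data and in the training subset
print(round(prop.table(table(labels)), 3))
print(round(prop.table(table(labels[train_idx])), 3))
```

Rows not in `train_idx` form the test set, for example `test_idx <- setdiff(seq_along(labels), train_idx)`.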
- KNN
- PLSDA
- SVM
- LDA
- Naive Bayes
- Decision Tree
- Bagging
- RandomForest
- Adaboost
- Neural Network
- Consensus Model
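
One common way to build a consensus model from several classifiers is a majority vote over their predicted classes. The sketch below illustrates that idea in base R. It is an assumption about how a consensus could be formed, not necessarily the rule implemented in `6690_proj_algorithm.R`; the helper name `consensus_vote` and the toy predictions are hypothetical and are not results from this project.

```r
# Hypothetical helper: combine class predictions from several models by majority vote.
# `preds` has one column per model and one row per molecule.
consensus_vote <- function(preds, classes = c("NRB", "RB")) {
  preds <- as.matrix(preds)
  # Count, for each molecule, how many models voted for each class
  votes <- vapply(classes, function(cl) rowSums(preds == cl),
                  numeric(nrow(preds)))
  votes <- matrix(votes, nrow = nrow(preds))  # keep matrix shape for one-row input
  # Ties (possible with an even number of models) go to the first class listed
  factor(classes[max.col(votes, ties.method = "first")], levels = classes)
}

# Toy predictions from three models for four molecules (illustrative only)
preds <- data.frame(
  knn  = c("RB",  "NRB", "RB",  "NRB"),
  svm  = c("RB",  "NRB", "NRB", "NRB"),
  tree = c("NRB", "NRB", "RB",  "RB"),
  stringsAsFactors = FALSE
)

consensus <- consensus_vote(preds)
print(consensus)
```

With an odd number of models and two classes there are no ties, so the tie-breaking rule only matters when an even number of models is combined.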
Mansouri, K., Ringsted, T., Ballabio, D., Todeschini, R., Consonni, V. (2013). Quantitative Structure-Activity Relationship models for ready biodegradability of chemicals. Journal of Chemical Information and Modeling, 53, 867-878.