This is a Python implementation of the Ampep project.
py cv.py [Feature] [Machine Learning Model] [Cross Validation Method] [Fold] [Trees] [Step]
py cv.py CTDD RandomForestClassifier ShuffleSplit 10 100 30
The feature method is based on iFeature. This example uses the 'CTDD' feature.
Six training models are supported; the default is RandomForestClassifier.
- RandomForestClassifier
- BaggingClassifier
- ExtraTreesClassifier
- RandomTreesEmbedding
- AdaBoostClassifier
- GradientBoostingClassifier
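
The model names above match classes in scikit-learn's `sklearn.ensemble` module, so a name given on the command line could be turned into an estimator roughly as follows. This is a minimal sketch, not the project's actual code: `build_model` and `SUPPORTED_MODELS` are hypothetical names, and passing the tree count as `n_estimators` is an assumption.

```python
from sklearn import ensemble

# The six model names listed above (all are classes in sklearn.ensemble).
SUPPORTED_MODELS = (
    "RandomForestClassifier",
    "BaggingClassifier",
    "ExtraTreesClassifier",
    "RandomTreesEmbedding",
    "AdaBoostClassifier",
    "GradientBoostingClassifier",
)


def build_model(name="RandomForestClassifier", n_estimators=100):
    """Hypothetical helper: create the estimator named on the command line."""
    if name not in SUPPORTED_MODELS:
        raise ValueError(f"Unsupported model: {name}")
    # Assumption: the [Trees] argument maps to scikit-learn's n_estimators.
    return getattr(ensemble, name)(n_estimators=n_estimators)


model = build_model("ExtraTreesClassifier", n_estimators=800)
```

Restricting the lookup to an explicit tuple keeps an arbitrary attribute of `sklearn.ensemble` from being instantiated by a mistyped argument.
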
Four cross-validation methods are supported; the default is ShuffleSplit.
- ShuffleSplit
- StratifiedKFold
- StratifiedShuffleSplit
- RepeatedStratifiedKFold
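
The four methods above are splitter classes in scikit-learn's `sklearn.model_selection` module. The sketch below shows one way a method name and a fold count could be combined into a splitter. `build_cv` and `SUPPORTED_CV_METHODS` are hypothetical names, and mapping the fold count to `n_splits` is an assumption about how the script uses it.

```python
from sklearn import model_selection

# The four cross-validation method names listed above.
SUPPORTED_CV_METHODS = (
    "ShuffleSplit",
    "StratifiedKFold",
    "StratifiedShuffleSplit",
    "RepeatedStratifiedKFold",
)


def build_cv(name="ShuffleSplit", n_splits=10):
    """Hypothetical helper: create the splitter named on the command line."""
    if name not in SUPPORTED_CV_METHODS:
        raise ValueError(f"Unsupported cross-validation method: {name}")
    # Assumption: the fold count maps to n_splits for every splitter.
    return getattr(model_selection, name)(n_splits=n_splits)


cv = build_cv("StratifiedKFold", n_splits=10)
```

Note that `RepeatedStratifiedKFold` repeats the whole k-fold procedure `n_repeats` times (10 by default in scikit-learn), so it produces more train/test splits than the fold count alone suggests.
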
The number of cross-validation folds. The default is 10.
The number of trees. The default is 100.
The number of training loops. Each loop increases the number of trees by 100. The default is 30.
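
To make the interaction between the tree count and the step count concrete, the sketch below lists the tree counts that would be evaluated. It assumes the first loop uses the initial tree count and each later loop adds 100; the script may count differently. `tree_schedule` is a hypothetical name.

```python
def tree_schedule(trees=100, step=30, increment=100):
    """Hypothetical helper: tree counts evaluated across the training loops.

    Assumption: loop 0 uses `trees`, and every later loop adds `increment`.
    """
    return [trees + i * increment for i in range(step)]


schedule = tree_schedule()
print(schedule[:3], "...", schedule[-1])
```

Under these assumptions, the defaults evaluate 30 models, from 100 trees up to 3000 trees.
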
py train.py [Feature] [Machine Learning Model] [Trees]
py train.py CTDD RandomForestClassifier 800
The feature method is based on iFeature. This example uses the 'CTDD' feature.
Six training models are supported; the default is RandomForestClassifier.
- RandomForestClassifier
- BaggingClassifier
- ExtraTreesClassifier
- RandomTreesEmbedding
- AdaBoostClassifier
- GradientBoostingClassifier
The number of trees. The default is 100.
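
The test.py example in this document loads a file named `RandomForestClassifier_800.pkl`, which suggests that a trained model is saved under its model name and tree count. The sketch below shows one way to do that with the standard `pickle` module. The naming pattern, the `./model` default directory, and the helper names `model_path` and `save_model` are assumptions inferred from that example, not confirmed behavior of train.py.

```python
import os
import pickle


def model_path(model_name, trees, out_dir="./model"):
    """Hypothetical helper: build a path such as ./model/<model>_<trees>.pkl."""
    return f"{out_dir}/{model_name}_{trees}.pkl"


def save_model(model, model_name, trees, out_dir="./model"):
    """Hypothetical helper: pickle a fitted model and return the file path."""
    os.makedirs(out_dir, exist_ok=True)
    path = model_path(model_name, trees, out_dir)
    with open(path, "wb") as handle:
        pickle.dump(model, handle)
    return path
```
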
py test.py [Feature] [Fasta Path] [Model Path]
py test.py CTDD './data/input.fasta' './model/RandomForestClassifier_800.pkl'
The feature method is based on iFeature. This example uses the 'CTDD' feature.
The path of the input .fasta file.
The path of the model file.
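
The two path arguments can be illustrated with a minimal sketch that reads sequences from a FASTA file and loads a saved model. It assumes the `.pkl` file was written with the standard `pickle` module (it may instead have been written with joblib), and it leaves out the iFeature encoding step. `read_fasta` and `load_model` are hypothetical names.

```python
import pickle


def read_fasta(path):
    """Hypothetical helper: return a list of (name, sequence) pairs."""
    records = []
    name, chunks = None, []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                # A new header closes the previous record, if any.
                if name is not None:
                    records.append((name, "".join(chunks)))
                name, chunks = line[1:], []
            elif name is not None:
                # Sequence lines may be wrapped, so collect and join them.
                chunks.append(line)
    if name is not None:
        records.append((name, "".join(chunks)))
    return records


def load_model(path):
    """Hypothetical helper: unpickle a model (only load files you trust)."""
    with open(path, "rb") as handle:
        return pickle.load(handle)
```

The sequences returned by `read_fasta` would still need to be encoded with the chosen feature method before being passed to the model's `predict` method.
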