A Novel Decomposing Model with Evolutionary Algorithms for Feature Selection in Long Non-Coding RNAs

Authors

Publication

If you use this code in a scientific publication, we would appreciate citations to the following paper:

R. P. Bonidia et al., "A Novel Decomposing Model With Evolutionary Algorithms for Feature Selection in Long Non-Coding RNAs," in IEEE Access, vol. 8, pp. 181683-181697, 2020, doi: 10.1109/ACCESS.2020.3028039.

@ARTICLE{9210051,
  author={R. P. {Bonidia} and J. S. {Machida} and T. C. {Negri} and W. A. L. {Alves} and A. Y. {Kashiwabara} and D. S. {Domingues} and A. {De Carvalho} and A. R. {Paschoal} and D. S. {Sanches}},
  journal={IEEE Access}, 
  title={A Novel Decomposing Model With Evolutionary Algorithms for Feature Selection in Long Non-Coding RNAs}, 
  year={2020},
  volume={8},
  pages={181683-181697},
  doi={10.1109/ACCESS.2020.3028039}}

List of files

  • Datasets: Datasets;

  • GA-CFS: Decomposing Model with Genetic Algorithm (Fitness = CFS (Filter Approach - Main)) - Python;

  • GA-CFS-ACC: Decomposing Model with Genetic Algorithm (Fitness = CFS and ACC - Hybrid) - Python;

  • GA-wrapper: Decomposing Model with Genetic Algorithm (Wrapper approach) - Python;

  • PSO-wrapper: Decomposing Model with Particle Swarm Optimization (Wrapper approach) - Python;

  • README: Documentation;

  • Requirements: List of packages to be installed using pip install;

  • split_train_test: Split dataset into training and testing - Python.

Dependencies

  • Python (>=3.7.4)
  • NumPy
  • Pandas
  • Scikit-learn
  • Skfeature-chappers

Installing our tool

$ git clone https://github.com/Bonidia/FeatureSelection-FSRV.git FeatureSelection-FSRV

$ cd FeatureSelection-FSRV

$ pip3 install -r requirements.txt

Usage and Examples

Split dataset into training and testing

First, split the dataset into a training set and a test set. Only the training set is used for feature selection; the test set is used to generate a final report on the performance of the best feature subset.

Access folder: $ cd FeatureSelection-FSRV
 
To run (Example): $ python3.7 split_train_test.py -i input -r test_rate

Where:

-i - input - csv format file, e.g., dataset.csv

-r - test_rate - fraction of the dataset held out for testing, e.g., 0.2, 0.3

This example will generate a training file and a test file.

Note: Input samples for feature selection must be in csv format.

Dataset: The csv file must have the following format: feat1, feat2, ..., featk, label

The label/class must be the last column.

Running

python3.7 split_train_test.py -i lncRNA.csv -r 0.2
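
The split step can be approximated with a few lines of pandas and scikit-learn. This is a minimal sketch, not the repository's script: `split_csv`, the toy data, the fixed seed, and the use of stratified sampling are assumptions made for illustration. It relies only on the documented layout, where the label is the last column.

```python
import pandas as pd
from sklearn.model_selection import train_test_split


def split_csv(df, test_rate, seed=0):
    """Split a DataFrame whose last column is the label into train/test parts."""
    labels = df.iloc[:, -1]
    # Assumption: stratify so both parts keep the class proportions.
    train_df, test_df = train_test_split(
        df, test_size=test_rate, stratify=labels, random_state=seed
    )
    return train_df, test_df


# Toy dataset in the expected layout: feat1, feat2, label.
data = pd.DataFrame({
    "feat1": [0.1, 0.4, 0.3, 0.9, 0.8, 0.7, 0.2, 0.6, 0.5, 1.0],
    "feat2": [1, 0, 1, 0, 1, 0, 1, 0, 1, 0],
    "label": [0, 0, 0, 0, 0, 1, 1, 1, 1, 1],
})
train_df, test_df = split_csv(data, test_rate=0.2)
print(len(train_df), len(test_df))
```

With a real file, `data` would come from `pd.read_csv("lncRNA.csv")` and each part could be written back with `DataFrame.to_csv(..., index=False)`.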

GA-CFS: Decomposing Model with Genetic Algorithm (Fitness = CFS (Filter Approach - Main))

Access folder: $ cd FeatureSelection-FSRV
 
To run (Example): $ python3.7 GA-CFS.py -train training.csv -test testing.csv -classifier classifier

Where:

-train - csv format file (training set), e.g., train.csv

-test - csv format file (testing set), e.g., test.csv

-classifier - e.g., 0 = RandomForestClassifier, 1 = DecisionTreeClassifier, 2 = SVM, 3 = KNN, 
                    4 = GaussianNB, 5 = GradientBoosting, 6 = Bagging, 7 = AdaBoost, 8 = MLP
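
One way to read the -classifier index is as a lookup into scikit-learn estimators. The sketch below is an assumption about how such a mapping could look, not the repository's code: `CLASSIFIERS` and `make_classifier` are hypothetical names, SVM is taken to mean `SVC`, KNN `KNeighborsClassifier`, MLP `MLPClassifier`, and every estimator uses its default hyperparameters.

```python
from sklearn.ensemble import (AdaBoostClassifier, BaggingClassifier,
                              GradientBoostingClassifier,
                              RandomForestClassifier)
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Index -> estimator class, following the numbering listed above.
CLASSIFIERS = {
    0: RandomForestClassifier,
    1: DecisionTreeClassifier,
    2: SVC,
    3: KNeighborsClassifier,
    4: GaussianNB,
    5: GradientBoostingClassifier,
    6: BaggingClassifier,
    7: AdaBoostClassifier,
    8: MLPClassifier,
}


def make_classifier(index):
    """Return a fresh estimator for a -classifier index (0 to 8)."""
    if index not in CLASSIFIERS:
        raise ValueError(f"classifier must be in 0..8, got {index}")
    return CLASSIFIERS[index]()


clf = make_classifier(3)
print(type(clf).__name__)
```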

This example will generate a csv file with the selected features.

Note 1: Input samples for feature selection must be in csv format.

Note 2: In this algorithm, the fitness function is CFS (filter approach); the classifier is used to generate the final report.

Note 3: We will only use the training set for feature selection.

Note 4: The test set will be used to generate a final report with the efficiency of the best feature subset.

Running

python3.7 GA-CFS.py -train training.csv -test testing.csv -classifier 0
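
CFS (correlation-based feature selection) scores a subset by rewarding correlation with the class and penalising correlation among the selected features. The sketch below implements the standard CFS merit, k * r_cf / sqrt(k + k * (k - 1) * r_ff), under the assumption that absolute Pearson correlation is the correlation measure; the repository's scripts may use a different measure, such as symmetrical uncertainty. `cfs_merit` and the toy data are illustrative names, not part of the repository.

```python
import numpy as np


def cfs_merit(X, y, mask):
    """CFS merit of the features flagged in a 0/1 mask (higher is better)."""
    idx = np.flatnonzero(mask)
    k = idx.size
    if k == 0:
        return 0.0
    sub = X[:, idx]
    # Mean absolute feature-class correlation (Pearson is an assumption).
    r_cf = np.mean([abs(np.corrcoef(sub[:, j], y)[0, 1]) for j in range(k)])
    # Mean absolute feature-feature correlation over distinct pairs.
    if k > 1:
        corr = np.abs(np.corrcoef(sub, rowvar=False))
        r_ff = corr[np.triu_indices(k, 1)].mean()
    else:
        r_ff = 0.0
    return k * r_cf / np.sqrt(k + k * (k - 1) * r_ff)


y = np.array([0, 0, 0, 0, 1, 1, 1, 1])
informative = np.array([0.1, 0.2, 0.0, 0.3, 0.9, 1.0, 0.8, 1.1])
uninformative = np.array([1.0, 0.0, 1.0, 0.0, 1.0, 0.0, 1.0, 0.0])
# Third column duplicates the first, so it is fully redundant.
X = np.column_stack([informative, uninformative, informative])

print(cfs_merit(X, y, [1, 0, 0]))  # informative feature alone
print(cfs_merit(X, y, [0, 1, 0]))  # uninformative feature alone
print(cfs_merit(X, y, [1, 0, 1]))  # informative plus its duplicate
```

Adding the duplicate column does not raise the merit, which is the behaviour that makes CFS usable as a filter fitness: redundancy is not rewarded.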

GA-CFS-ACC: Decomposing Model with Genetic Algorithm (Fitness = CFS and ACC - Hybrid)

Access folder: $ cd FeatureSelection-FSRV
 
To run (Example): $ python3.7 GA-CFS-ACC.py -train training.csv -test testing.csv -classifier classifier

Where:

-train - csv format file (training set), e.g., train.csv

-test - csv format file (testing set), e.g., test.csv

-classifier - e.g., 0 = RandomForestClassifier, 1 = DecisionTreeClassifier, 2 = SVM, 3 = KNN, 
                    4 = GaussianNB, 5 = GradientBoosting, 6 = Bagging, 7 = AdaBoost, 8 = MLP

This example will generate a csv file with the selected features.

Note 1: Input samples for feature selection must be in csv format.

Note 2: We will only use the training set for feature selection.

Note 3: The test set will be used to generate a final report with the efficiency of the best feature subset.

Running

python3.7 GA-CFS-ACC.py -train training.csv -test testing.csv -classifier 3

GA-wrapper: Decomposing Model with Genetic Algorithm (Wrapper approach)

Access folder: $ cd FeatureSelection-FSRV
 
To run (Example): $ python3.7 GA-wrapper.py -train training.csv -test testing.csv -classifier classifier

Where:

-train - csv format file (training set), e.g., train.csv

-test - csv format file (testing set), e.g., test.csv

-classifier - e.g., 0 = RandomForestClassifier, 1 = DecisionTreeClassifier, 2 = SVM, 3 = KNN, 
                    4 = GaussianNB, 5 = GradientBoosting, 6 = Bagging, 7 = AdaBoost, 8 = MLP

This example will generate a csv file with the selected features.

Note 1: Input samples for feature selection must be in csv format.

Note 2: We will only use the training set for feature selection.

Note 3: The test set will be used to generate a final report with the efficiency of the best feature subset.

Running

python3.7 GA-wrapper.py -train training.csv -test testing.csv -classifier 2
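
In a wrapper approach, the fitness of a candidate feature subset is the performance of a classifier trained on that subset. A minimal sketch, assuming cross-validated accuracy on the training data as the fitness; `wrapper_fitness`, the synthetic dataset, the 3-fold setting, and the random population are illustrative choices, not the repository's implementation.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier


def wrapper_fitness(mask, X, y, clf, cv=3):
    """Mean cross-validated accuracy using only the features flagged in mask."""
    mask = np.asarray(mask, dtype=bool)
    if not mask.any():
        return 0.0  # an empty subset cannot be evaluated
    scores = cross_val_score(clf, X[:, mask], y, cv=cv, scoring="accuracy")
    return float(scores.mean())


# Synthetic stand-in for a training set.
X, y = make_classification(n_samples=120, n_features=8, n_informative=3,
                           n_redundant=0, random_state=0)

# Each individual is a 0/1 chromosome with one gene per feature.
rng = np.random.default_rng(0)
population = rng.integers(0, 2, size=(6, X.shape[1]))
scores = [wrapper_fitness(ind, X, y, KNeighborsClassifier())
          for ind in population]
best = population[int(np.argmax(scores))]
print(best, max(scores))
```

A genetic algorithm would then apply selection, crossover, and mutation to `population` and re-evaluate; only the fitness evaluation is shown here.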

PSO-wrapper: Decomposing Model with Particle Swarm Optimization (Wrapper approach)

Access folder: $ cd FeatureSelection-FSRV
 
To run (Example): $ python3.7 PSO-wrapper.py -train training.csv -test testing.csv -classifier classifier

Where:

-train - csv format file (training set), e.g., train.csv

-test - csv format file (testing set), e.g., test.csv

-classifier - e.g., 0 = RandomForestClassifier, 1 = DecisionTreeClassifier, 2 = SVM, 3 = KNN, 
                    4 = GaussianNB, 5 = GradientBoosting, 6 = Bagging, 7 = AdaBoost, 8 = MLP

This example will generate a csv file with the selected features.

Note 1: Input samples for feature selection must be in csv format.

Note 2: We will only use the training set for feature selection.

Note 3: The test set will be used to generate a final report with the efficiency of the best feature subset.

Running

python3.7 PSO-wrapper.py -train training.csv -test testing.csv -classifier 2
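
For feature selection, particle swarm optimization is commonly used in its binary form, where each particle is a 0/1 mask over the features. The sketch below shows one update step of the classic binary PSO with a sigmoid transfer function; it is an assumption about the general technique, and the variant, the parameter values (`w`, `c1`, `c2`, `vmax`), and the name `bpso_step` are not taken from the repository.

```python
import numpy as np


def bpso_step(pos, vel, pbest, gbest, rng, w=0.7, c1=1.5, c2=1.5, vmax=4.0):
    """One binary PSO update; pos and pbest are 0/1 arrays, gbest is one mask."""
    r1 = rng.random(pos.shape)
    r2 = rng.random(pos.shape)
    # Pull each particle towards its own best and the swarm's best mask.
    vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
    vel = np.clip(vel, -vmax, vmax)
    # Sigmoid turns velocity into the probability that a bit is set to 1.
    prob = 1.0 / (1.0 + np.exp(-vel))
    pos = (rng.random(pos.shape) < prob).astype(int)
    return pos, vel


rng = np.random.default_rng(0)
pos = rng.integers(0, 2, size=(5, 6))   # 5 particles, 6 features
vel = np.zeros(pos.shape)
pbest = pos.copy()                      # personal bests start at current masks
gbest = pos[0].copy()                   # placeholder for the best mask so far
new_pos, new_vel = bpso_step(pos, vel, pbest, gbest, rng)
print(new_pos)
```

In a wrapper setting, each row of `new_pos` would be scored by training the chosen classifier on the selected features, and `pbest` and `gbest` would be updated from those scores.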
