- scikit-learn
- Pandas
- NumPy
- Matplotlib
- TensorFlow
- disgenet_2020.db: DisGeNET database file (download it from the DisGeNET website and place it in the root directory)
- gene_sequences_final.{csv, fasta}: Gene sequences used for training, downloaded from NCBI
- model.py: Contains the code for training the model using traditional ML algorithms
- model_neural.py: Contains the code for training the model using neural networks
- features/ : directory containing the feature vectors for the sequences in gene_sequences_final.csv
- model_pca_10/ : directory containing the trained models using PCA with 10 components
- model_pca_20/ : directory containing the trained models using PCA with 20 components
- model_neural/ : directory containing the trained neural network model
- results/ : directory containing the results of the models
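
As a rough illustration of what the model_pca_10/ and model_pca_20/ variants might correspond to, the sketch below fits a PCA-plus-classifier pipeline once with 10 components and once with 20. It is not the code in model.py: the synthetic data, the `build_pca_model` helper, and the choice of logistic regression as the classifier are all assumptions made for the example.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def build_pca_model(n_components):
    """Hypothetical helper: scale features, reduce with PCA, then classify."""
    return make_pipeline(
        StandardScaler(),
        PCA(n_components=n_components),
        LogisticRegression(max_iter=1000),  # assumed classifier, for illustration only
    )


# Synthetic stand-in for the feature vectors (assumption: 200 samples, 64 features).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 64))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

# Fit one model per PCA setting, mirroring the two model directories.
models = {}
for n in (10, 20):
    model = build_pca_model(n)
    model.fit(X, y)
    models[n] = model
    # model[:-1] is the pipeline without the final classifier step.
    reduced = model[:-1].transform(X)
    print(f"PCA components: {n}, reduced shape: {reduced.shape}")
```

Wrapping the scaler, PCA, and classifier in a single pipeline keeps the projection fitted on the same data as the classifier, so a saved model can be applied directly to raw feature vectors.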