This project explores the use of machine learning algorithms to predict diseases from symptoms.
The following algorithms have been explored in code:
- Naive Bayes
- Decision Tree
- Random Forest
- Gradient Boosting
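
As a rough illustration of how these four algorithms could be set up side by side, here is a minimal sketch using scikit-learn. The choice of scikit-learn, the `MultinomialNB` variant of Naive Bayes, the helper names `build_models` and `fit_all`, the hyperparameters, and the toy data are all assumptions made for this example and are not taken from main.py.

```python
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.naive_bayes import MultinomialNB
from sklearn.tree import DecisionTreeClassifier

# Hypothetical toy data: each row is a patient, each column a 0/1 symptom flag.
X_train = [
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 0, 1],
]
y_train = ["cold", "cold", "allergy", "allergy"]


def build_models(seed=0):
    """Return one untrained classifier per algorithm listed above."""
    return {
        "Naive Bayes": MultinomialNB(),
        "Decision Tree": DecisionTreeClassifier(random_state=seed),
        "Random Forest": RandomForestClassifier(n_estimators=10, random_state=seed),
        "Gradient Boosting": GradientBoostingClassifier(n_estimators=10, random_state=seed),
    }


def fit_all(X, y, seed=0):
    """Fit every classifier on the same feature rows and labels."""
    models = build_models(seed)
    for model in models.values():
        model.fit(X, y)
    return models


models = fit_all(X_train, y_train)
for name, model in models.items():
    # Predict the disease for a patient reporting the first two symptoms.
    print(name, model.predict([[1, 1, 0, 0]])[0])
```

Keeping the models in one dictionary makes it straightforward to train and compare them on the same data.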
The dataset used with the main.py script is downloaded from here:
https://www.kaggle.com/kaushil268/disease-prediction-using-machine-learning
This dataset has 133 columns in total: 132 of them are symptoms experienced by the patient, and the last column is the prognosis for that patient.
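
To make the column layout concrete, the following sketch splits such a table into symptom features and prognosis labels using only the Python standard library. It assumes the label column is named `prognosis` and that symptom cells hold 0/1 integers. The function name `load_symptom_table` and the three-symptom sample are hypothetical stand-ins for the real 132-symptom file.

```python
import csv
import io


def load_symptom_table(handle, label_column="prognosis"):
    """Split a symptom table into symptom names, feature rows, and labels."""
    reader = csv.DictReader(handle)
    # Every column except the label column is treated as a symptom.
    symptom_names = [name for name in reader.fieldnames if name != label_column]
    features, labels = [], []
    for row in reader:
        features.append([int(row[name]) for name in symptom_names])
        labels.append(row[label_column].strip())
    return symptom_names, features, labels


# Hypothetical sample with three symptom columns instead of 132.
sample = io.StringIO(
    "itching,skin_rash,cough,prognosis\n"
    "1,1,0,Fungal infection\n"
    "0,0,1,Common Cold\n"
)
symptoms, X, y = load_symptom_table(sample)
print(symptoms)
print(X)
print(y)
```

For the real data, the same function could be called on a file opened with `open("dataset/training_data.csv", newline="")`, assuming the file follows the layout described above.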
The dataset used with the Jupyter notebook is downloaded from here:
https://impact.dbmi.columbia.edu/~friedma/Projects/DiseaseSymptomKB/index.html
This dataset has 3 columns:
Disease | Count of Disease Occurrence | Symptom
You can either copy and paste the whole table from that page into an Excel sheet or scrape it using BeautifulSoup.
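
The scraping route might look roughly like the sketch below, which parses a three-column HTML table with BeautifulSoup. It makes several assumptions about the page markup: that data cells are `<td>` elements, that a header row can be recognised by its first cell reading "Disease", and that rows with a blank disease cell continue the previous disease. The function name `parse_disease_table` and the sample HTML are hypothetical. Fetching the page itself is left out.

```python
from bs4 import BeautifulSoup


def parse_disease_table(html):
    """Return (disease, count, symptom) tuples from a three-column HTML table."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    disease, count = "", ""
    for tr in soup.find_all("tr"):
        cells = [td.get_text(" ", strip=True) for td in tr.find_all("td")]
        # Skip rows that are not three data cells, and any header row (assumption).
        if len(cells) != 3 or cells[0] == "Disease":
            continue
        # Assumption: blank disease/count cells repeat the values of the row above.
        disease = cells[0] or disease
        count = cells[1] or count
        rows.append((disease, count, cells[2]))
    return rows


# Hypothetical markup with placeholder values, not real rows from the source page.
sample_html = """
<table>
  <tr><th>Disease</th><th>Count of Disease Occurrence</th><th>Symptom</th></tr>
  <tr><td>disease A</td><td>10</td><td>symptom 1</td></tr>
  <tr><td></td><td></td><td>symptom 2</td></tr>
  <tr><td>disease B</td><td>5</td><td>symptom 3</td></tr>
</table>
"""
table_rows = parse_disease_table(sample_html)
for row in table_rows:
    print(row)
```

The resulting tuples can then be written to a spreadsheet or CSV file in the same three-column layout.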
|_ dataset/
    |_ training_data.csv
    |_ test_data.csv
|_ saved_model/
    |_ [ pre-trained models ]
|_ main.py [ code for loading the Kaggle dataset, training and saving the model ]
|_ notebook/
    |_ dataset/
        |_ raw_data.xlsx [ Columbia dataset for the notebook ]
    |_ Disease-Prediction-from-Symptoms-checkpoint.ipynb [ IPython notebook for loading the Columbia dataset, training the model, and inference ]