Disease Prediction from Symptoms

This project explores the use of machine learning algorithms to predict diseases from symptoms.

Algorithms Explored

The following algorithms have been explored in code:

Naive Bayes
Decision Tree
Random Forest
Gradient Boosting
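
As a rough illustration of how the four algorithms listed above could be compared, here is a minimal sketch using scikit-learn. The library choice, the BernoulliNB variant of Naive Bayes, the helper names build_models and fit_and_score, the hyperparameters, and the toy data are all assumptions made for this example; they are not taken from the project's code.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.naive_bayes import BernoulliNB
from sklearn.tree import DecisionTreeClassifier


def build_models(seed=0):
    """Return one untrained classifier per algorithm (hypothetical helper)."""
    return {
        # BernoulliNB is an assumption: it suits 0/1 symptom indicators.
        "Naive Bayes": BernoulliNB(),
        "Decision Tree": DecisionTreeClassifier(random_state=seed),
        "Random Forest": RandomForestClassifier(n_estimators=50, random_state=seed),
        "Gradient Boosting": GradientBoostingClassifier(random_state=seed),
    }


def fit_and_score(X_train, y_train, X_test, y_test, seed=0):
    """Fit every model and return {name: accuracy on the given test data}."""
    scores = {}
    for name, model in build_models(seed).items():
        model.fit(X_train, y_train)
        scores[name] = model.score(X_test, y_test)
    return scores


# Toy data for illustration only: 3 made-up diseases, 6 binary symptom columns.
X_toy = np.array([[1, 1, 0, 0, 0, 0],
                  [0, 0, 1, 1, 0, 0],
                  [0, 0, 0, 0, 1, 1]] * 10)
y_toy = np.array(["disease_a", "disease_b", "disease_c"] * 10)

# Scoring on the training rows only demonstrates the API; it is not an evaluation.
scores = fit_and_score(X_toy, y_toy, X_toy, y_toy)
for name, accuracy in scores.items():
    print(f"{name}: {accuracy:.2f}")
```

For a real comparison, the models would be scored on a held-out split rather than on the rows they were trained on.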

Dataset

Source-1

The dataset used with the main.py script can be downloaded from:

https://www.kaggle.com/kaushil268/disease-prediction-using-machine-learning

This dataset has 133 columns in total: 132 are symptoms experienced by the patient, and the last column is the prognosis.
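
The column layout described above (symptom columns followed by a prognosis label) can be split into features and target roughly as follows. This is a sketch using pandas; the helper name split_features_and_label, the three symptom columns, and the two sample rows are made up for illustration, and the only thing taken from the description is the label column name prognosis.

```python
import io

import pandas as pd


def split_features_and_label(df, label_column="prognosis"):
    """Separate the symptom columns (features) from the label column."""
    X = df.drop(columns=[label_column])
    y = df[label_column]
    return X, y


# Tiny in-memory stand-in for the real CSV (illustrative values only).
sample_csv = io.StringIO(
    "itching,skin_rash,cough,prognosis\n"
    "1,1,0,Fungal infection\n"
    "0,0,1,Common Cold\n"
)
df = pd.read_csv(sample_csv)
X, y = split_features_and_label(df)

print(X.shape)    # rows x symptom columns
print(list(y))    # one prognosis per row
```

With the real file, the io.StringIO object would be replaced by the path to the downloaded CSV.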

Source-2

The dataset used with the Jupyter notebook can be downloaded from:

https://impact.dbmi.columbia.edu/~friedma/Projects/DiseaseSymptomKB/index.html

This dataset has 3 columns:

Disease  | Count of Disease Occurrence | Symptom

You can either copy and paste the whole table from that page into an Excel sheet or scrape it using BeautifulSoup.
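
A minimal sketch of the scraping route is shown below. It parses an inline HTML snippet rather than fetching the page, so the markup here is a simplified stand-in. The helper name parse_disease_table is hypothetical, and the rule that a blank disease cell continues the previous disease is an assumption about the table's layout that should be checked against the actual page.

```python
from bs4 import BeautifulSoup


def parse_disease_table(html):
    """Return (disease, count, symptom) tuples from a three-column HTML table."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    disease, count = None, None
    for tr in soup.find_all("tr"):
        cells = [td.get_text(" ", strip=True) for td in tr.find_all("td")]
        if len(cells) != 3:
            continue  # skips header rows (th cells) and malformed rows
        if cells[0]:
            # A non-empty first cell starts a new disease.
            disease, count = cells[0], cells[1]
        # Assumption: blank disease cells belong to the previous disease.
        rows.append((disease, count, cells[2]))
    return rows


# Simplified stand-in markup with placeholder values (not real data).
sample_html = """
<table>
  <tr><th>Disease</th><th>Count of Disease Occurrence</th><th>Symptom</th></tr>
  <tr><td>disease A</td><td>10</td><td>symptom 1</td></tr>
  <tr><td></td><td></td><td>symptom 2</td></tr>
  <tr><td>disease B</td><td>5</td><td>symptom 3</td></tr>
</table>
"""

rows = parse_disease_table(sample_html)
for row in rows:
    print(row)
```

To run this against the live page, the HTML would first need to be downloaded (for example with an HTTP client) and passed to the same function.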

Directory Structure