This project is a Disease Prediction System that uses machine learning models to predict diseases based on a set of symptoms. It includes data preprocessing, model training, and a symptom-based prediction function. The project uses three different machine learning models: Support Vector Classifier (SVC), Naive Bayes, and Random Forest, and combines their predictions to improve accuracy.
- Clone the repository to your local machine:
- Navigate to the project directory: cd disease-prediction
- Install the required dependencies
- Prepare your training dataset in CSV format and save it as Training.csv.
- Prepare your test dataset in CSV format and save it as Testing.csv.
- Run the main script to perform the following tasks:
- Read and preprocess the training dataset.
- Train machine learning models (SVC, Naive Bayes, Random Forest) and evaluate their performance.
- Combine model predictions to make predictions on the test dataset.
- Save trained models to pickle files for later use.
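The tasks above could be sketched roughly as follows. This is a minimal illustration, not the contents of main.py: it assumes scikit-learn and NumPy are installed, replaces Training.csv with a tiny synthetic symptom matrix, assumes Gaussian Naive Bayes as the Naive Bayes variant, and assumes a simple majority vote as the way predictions are combined. The names `prototypes`, `majority_vote`, and `svm_sketch.pkl` are hypothetical.

```python
import os
import pickle
import tempfile
from collections import Counter

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC

# Synthetic stand-in for Training.csv: each row is a binary symptom vector,
# each label is a made-up prognosis. Real data would be read from the CSV.
prototypes = {
    "DiseaseA": [1, 1, 0, 0, 0, 0],
    "DiseaseB": [0, 0, 1, 1, 0, 0],
    "DiseaseC": [0, 0, 0, 0, 1, 1],
}
X = np.array([vec for vec in prototypes.values() for _ in range(10)])
y = np.array([name for name in prototypes for _ in range(10)])

# Train the three model types named in this README.
models = {
    "svm": SVC(),
    "naive_bayes": GaussianNB(),  # assumed Naive Bayes variant
    "random_forest": RandomForestClassifier(n_estimators=25, random_state=0),
}
for model in models.values():
    model.fit(X, y)


def majority_vote(sample):
    """Return the label most models agree on (assumed combination rule)."""
    votes = [model.predict([sample])[0] for model in models.values()]
    return Counter(votes).most_common(1)[0][0]


# Combine the per-model predictions and measure accuracy on the toy data.
combined = [majority_vote(row) for row in X]
accuracy = float(np.mean(np.array(combined) == y))
print(f"combined accuracy on the toy data: {accuracy:.2f}")

# Save one trained model to a pickle file and load it back.
path = os.path.join(tempfile.mkdtemp(), "svm_sketch.pkl")
with open(path, "wb") as fh:
    pickle.dump(models["svm"], fh)
with open(path, "rb") as fh:
    restored = pickle.load(fh)
```

The restored object behaves like the original model, so `restored.predict(...)` can be called without retraining.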
- To make a disease prediction based on symptoms, call the predictDisease function in the script. Pass the symptoms as a single comma-separated string, as in the following example, and it returns the predictions.
- symptoms = "Symptom1,Symptom2,Symptom3"
  predictions = predictDisease(symptoms)
  print(predictions)
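To illustrate how a comma-separated symptom string might be turned into model input, here is a minimal sketch. The helper `encode_symptoms` and the `symptom_columns` list are hypothetical and only stand in for whatever predictDisease does internally; the real function may normalise symptom names differently.

```python
def encode_symptoms(symptoms, symptom_columns):
    """Turn "a,b,c" into a 0/1 vector aligned with symptom_columns.

    Returns the vector plus a list of names that matched no column.
    """
    index = {name: i for i, name in enumerate(symptom_columns)}
    vector = [0] * len(symptom_columns)
    unknown = []
    for raw in symptoms.split(","):
        name = raw.strip()
        if not name:
            continue  # ignore empty entries such as a trailing comma
        if name in index:
            vector[index[name]] = 1
        else:
            unknown.append(name)
    return vector, unknown


# Hypothetical column order, as it might appear in the training data.
symptom_columns = ["Symptom1", "Symptom2", "Symptom3", "Symptom4"]
vector, unknown = encode_symptoms("Symptom1, Symptom3,Cough", symptom_columns)
print(vector)   # [1, 0, 1, 0]
print(unknown)  # ['Cough']
```

Reporting unknown names separately, rather than silently dropping them, makes typos in the input easier to spot.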
- The training data is provided in the Training.csv file. It should contain columns for symptoms and the target variable (prognosis).
- The test data is provided in the Testing.csv file. It should have the same format as the training data.
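A quick way to confirm that the two files share the same format is to compare their header rows. The sketch below uses only the standard library and in-memory stand-ins for Training.csv and Testing.csv; the helper names `read_header` and `check_same_format` are hypothetical and not part of the project.

```python
import csv
import io


def read_header(csv_file):
    """Return the column names from the first row of a CSV file object."""
    return next(csv.reader(csv_file))


def check_same_format(train_file, test_file, target="prognosis"):
    """Return a list of problems; an empty list means the formats match."""
    train_cols = read_header(train_file)
    test_cols = read_header(test_file)
    problems = []
    if target not in train_cols:
        problems.append(f"training data has no '{target}' column")
    if train_cols != test_cols:
        problems.append("training and test columns differ")
    return problems


# In-memory stand-ins for Training.csv and Testing.csv.
train = io.StringIO("Symptom1,Symptom2,prognosis\n1,0,DiseaseA\n")
test = io.StringIO("Symptom1,Symptom2,prognosis\n0,1,DiseaseB\n")
print(check_same_format(train, test))  # []
```

With real files, the same function can be called on objects returned by `open("Training.csv", newline="")` and `open("Testing.csv", newline="")`.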
- The project uses three machine learning models: Support Vector Classifier (SVC), Naive Bayes, and Random Forest.
- Model performance is evaluated using cross-validation and accuracy metrics.
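As a rough sketch of what cross-validated accuracy looks like in practice, the example below scores the three model types with scikit-learn's `cross_val_score`. It assumes scikit-learn and NumPy are installed, uses a small synthetic dataset instead of Training.csv, and assumes 5 folds and Gaussian Naive Bayes; the project's actual settings may differ.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC

# Synthetic stand-in for the training data: two made-up diseases,
# each with a fixed binary symptom pattern, 20 samples per class.
prototypes = np.array([[1, 1, 0, 0], [0, 0, 1, 1]])
labels = np.array(["DiseaseA", "DiseaseB"])
idx = np.arange(40) % 2
X, y = prototypes[idx], labels[idx]

models = {
    "svm": SVC(),
    "naive_bayes": GaussianNB(),  # assumed Naive Bayes variant
    "random_forest": RandomForestClassifier(n_estimators=25, random_state=0),
}

# 5-fold cross-validation (assumed fold count), scored by accuracy.
cv_accuracy = {}
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=5, scoring="accuracy")
    cv_accuracy[name] = float(scores.mean())
    print(f"{name}: mean accuracy {cv_accuracy[name]:.2f}")
```

Averaging over folds gives a steadier estimate than a single train/test split, which matters when some diseases have few examples.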
- The predictDisease function takes the symptoms as a comma-separated string and predicts the disease using the trained models. It returns a dictionary with the prediction from each model and the final combined prediction.
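The shape of such a result can be illustrated with a short sketch. It assumes the final prediction is a majority vote over the per-model predictions; the helper `combine_predictions` and the dictionary keys are hypothetical and may not match the names used in main.py.

```python
from collections import Counter


def combine_predictions(per_model):
    """Add a majority-vote 'final_prediction' to per-model predictions."""
    counts = Counter(per_model.values())
    final, _ = counts.most_common(1)[0]
    result = dict(per_model)  # copy so the input is left unchanged
    result["final_prediction"] = final
    return result


result = combine_predictions({
    "svm_prediction": "DiseaseA",
    "naive_bayes_prediction": "DiseaseA",
    "rf_prediction": "DiseaseB",
})
print(result["final_prediction"])  # DiseaseA
```

If all three models disagree, `Counter.most_common` keeps insertion order among ties, so this sketch falls back to the first model listed.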
- main.py: The main script that performs data preprocessing, model training, and prediction.
- Training.csv: The training dataset.
- Testing.csv: The test dataset.
- svm.pkl: A serialized pickle file containing the trained SVM model.
- rf.pkl: A serialized pickle file containing the trained Random Forest model.
Contributions are welcome! If you'd like to contribute to this project, please follow these steps:
- Fork the repository.
- Create a new branch for your feature or bug fix.
- Make your changes and submit a pull request.