This project predicts the survival of breast cancer patients from clinical features. It uses a CatBoost model trained on a dataset of patient attributes such as age, gender, tumor stage, protein levels, and treatment history.
- model: Contains saved pickle files of the CatBoost model and the preprocessing pipeline.
- src: Includes scripts for different purposes:
  - hyperparameters.py: Defines hyperparameters used in the model.
  - logger.py: Logging functionalities for tracking the training process.
  - preprocessing.py: Preprocessing methods and functions.
  - train.py: Script for training the breast cancer survival prediction model.
  - ingest_data.py: Data ingestion and processing script.
- notebook: Experimental notebooks used for analysis and development.
- run_pipeline.py: Script to initiate the training process using the source code present in the src directory.
- streamlit_app.py: Streamlit application for demonstrating the functionality of the trained model.
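
Since the model directory holds the trained model and the preprocessing pipeline as pickle files, here is a minimal sketch of how two such artifacts could be written and read back. The file names `model.pkl` and `preprocessor.pkl` and the helper names `save_artifacts` and `load_artifacts` are assumptions made for illustration, not names taken from the repository.

```python
import pickle
from pathlib import Path

# Hypothetical file names; the actual names inside the model directory may differ.
MODEL_FILE = "model.pkl"
PREPROCESSOR_FILE = "preprocessor.pkl"


def save_artifacts(model, preprocessor, model_dir="model"):
    """Pickle the trained model and the fitted preprocessing pipeline."""
    out_dir = Path(model_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    for name, obj in ((MODEL_FILE, model), (PREPROCESSOR_FILE, preprocessor)):
        with open(out_dir / name, "wb") as fh:
            pickle.dump(obj, fh)


def load_artifacts(model_dir="model"):
    """Return (model, preprocessor) loaded from the pickle files."""
    in_dir = Path(model_dir)
    with open(in_dir / MODEL_FILE, "rb") as fh:
        model = pickle.load(fh)
    with open(in_dir / PREPROCESSOR_FILE, "rb") as fh:
        preprocessor = pickle.load(fh)
    return model, preprocessor
```

Saving both objects together keeps the preprocessing applied at prediction time identical to the one used during training. Only unpickle files from a source you trust, because loading a pickle can execute arbitrary code.
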
- Clone the repository:
git clone https://github.com/hardikjp7/Breast-Cancer-Survival-Prediction.git
- Navigate to the project directory:
cd Breast-Cancer-Survival-Prediction
- Install the required dependencies:
pip install -r requirements.txt
To train the model, run the run_pipeline.py script:
python run_pipeline.py
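
As a rough illustration of what a pipeline entry point like this might do, the sketch below chains ingestion, preprocessing, and training while logging progress. The function `run_pipeline` and its three stage arguments are hypothetical placeholders, not the actual interfaces of ingest_data.py, preprocessing.py, or train.py, and the toy stages stand in for the real dataset and CatBoost model.

```python
import logging

logging.basicConfig(level=logging.INFO, format="%(asctime)s %(levelname)s %(message)s")
logger = logging.getLogger("pipeline")


def run_pipeline(ingest, preprocess, train):
    """Run the three stages in order, logging progress after each one.

    The stage callables are placeholders for whatever ingest_data.py,
    preprocessing.py, and train.py actually expose.
    """
    raw = ingest()
    logger.info("Ingested %d records", len(raw))
    features = preprocess(raw)
    logger.info("Preprocessing finished")
    model = train(features)
    logger.info("Training finished")
    return model


if __name__ == "__main__":
    # Toy stand-ins so the sketch runs without the real dataset or CatBoost.
    model = run_pipeline(
        ingest=lambda: [{"age": 52}, {"age": 61}],
        preprocess=lambda rows: [[row["age"]] for row in rows],
        train=lambda X: {"n_samples": len(X)},
    )
    print(model)
```

Passing the stages in as callables keeps the orchestration separate from the individual steps, so each one can be swapped or tested on its own.
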
To run the Streamlit app for demo purposes, use the following command:
streamlit run streamlit_app.py
This will launch a local server where you can interact with the trained model through a user-friendly interface.
This project is licensed under the MIT License - see the LICENSE file for details.