Cancer_non-Cancer_classification_Naive-Bayes

This project contains text documents that experts have labeled as cancer or non-cancer. A Naive Bayes classifier is used to classify the texts in the test set.

Primary language: Jupyter Notebook

Cancer-vs-non-Cancer-text-classification

This notebook implements the Naive Bayes algorithm from scratch. Download and unzip the provided folder. Keep all files inside that folder; do not move any of them out.
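As a rough illustration of what a from-scratch Naive Bayes text classifier can look like, the sketch below assumes a multinomial model with add-one (Laplace) smoothing over tokenized documents. This README does not state which variant the notebook uses, and the function names `train_naive_bayes` and `predict`, as well as the toy data, are hypothetical.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(documents, labels):
    """Estimate log-priors and add-one smoothed log-likelihoods.

    documents: list of token lists; labels: list of class names.
    Assumption: multinomial model with Laplace smoothing.
    """
    vocabulary = {token for doc in documents for token in doc}
    class_doc_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    for doc, label in zip(documents, labels):
        word_counts[label].update(doc)

    log_prior = {}
    log_likelihood = {}
    for label, n_docs in class_doc_counts.items():
        log_prior[label] = math.log(n_docs / len(documents))
        # Denominator: all tokens seen in this class plus one per vocabulary word.
        total = sum(word_counts[label].values()) + len(vocabulary)
        log_likelihood[label] = {
            token: math.log((word_counts[label][token] + 1) / total)
            for token in vocabulary
        }
    return log_prior, log_likelihood, vocabulary


def predict(doc, log_prior, log_likelihood, vocabulary):
    """Return the class with the highest posterior log-score."""
    scores = {}
    for label in log_prior:
        score = log_prior[label]
        for token in doc:
            if token in vocabulary:  # words never seen in training are skipped
                score += log_likelihood[label][token]
        scores[label] = score
    return max(scores, key=scores.get)


# Toy data invented for illustration; not taken from the project's datasets.
train_docs = [
    ["tumor", "biopsy", "malignant"],
    ["chemotherapy", "tumor", "treatment"],
    ["football", "match", "score"],
    ["weather", "rain", "forecast"],
]
train_labels = ["cancer", "cancer", "non-cancer", "non-cancer"]

model = train_naive_bayes(train_docs, train_labels)
prediction = predict(["tumor", "treatment", "rain"], *model)
```

Working in log space avoids numerical underflow when many small probabilities are multiplied for long documents.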

The notebook will prompt you for three inputs.

At the first prompt, enter the dataset you want to train on. It accepts “train 1” or “train 2”: train 1 is the small training dataset and train 2 is the large one.

At the second prompt, choose whether to binarize the documents. It accepts “yes” or “no”.

At the third prompt, choose whether to exclude stop words from the documents. It accepts “yes” or “no”.
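The two yes/no options can be pictured with the minimal sketch below. It assumes that binarizing means counting each word at most once per document and that stop words are filtered against a fixed list. The helper names `parse_yes_no` and `preprocess` and the short `STOP_WORDS` set are hypothetical; the notebook's actual stop-word list and logic are not shown in this README.

```python
# Small illustrative stop-word list (assumption; the project's own list may differ).
STOP_WORDS = {"the", "a", "an", "of", "and", "is", "in", "to", "was"}


def parse_yes_no(answer):
    """Map a "yes"/"no" prompt answer to a boolean, rejecting anything else."""
    normalized = answer.strip().lower()
    if normalized not in {"yes", "no"}:
        raise ValueError(f"expected 'yes' or 'no', got {answer!r}")
    return normalized == "yes"


def preprocess(tokens, binarize, remove_stop_words):
    """Apply the two optional preprocessing steps to one tokenized document."""
    if remove_stop_words:
        tokens = [t for t in tokens if t not in STOP_WORDS]
    if binarize:
        # Keep only the first occurrence of each word, preserving order.
        tokens = list(dict.fromkeys(tokens))
    return tokens


doc = "the tumor was malignant and the tumor was large".split()
cleaned = preprocess(
    doc,
    binarize=parse_yes_no("yes"),
    remove_stop_words=parse_yes_no("yes"),
)
```

With both options enabled, repeated words and common function words no longer contribute to the per-class word counts.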

After you provide the inputs, the results are computed and shown at the end of the notebook.
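This README does not say which metrics the notebook reports. As a sketch only, assuming accuracy, precision, and recall with "cancer" as the positive class, results on a test set could be computed as follows. The function name `evaluate` and the example labels are hypothetical.

```python
def evaluate(true_labels, predicted_labels, positive="cancer"):
    """Compute accuracy, precision, and recall for a binary labeling."""
    tp = fp = fn = tn = 0
    for truth, guess in zip(true_labels, predicted_labels):
        if guess == positive and truth == positive:
            tp += 1
        elif guess == positive:
            fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


# Labels invented for illustration; not actual results from the project.
truth = ["cancer", "cancer", "non-cancer", "non-cancer", "cancer"]
guesses = ["cancer", "non-cancer", "non-cancer", "cancer", "cancer"]
metrics = evaluate(truth, guesses)
```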

To train the model again with different inputs, click “Run all cells” again and provide the new inputs.