oncogene-analysis

COMS 4761 final project, spring 2023.

The data is archived in the data/archives folder. The files are zipped because they are very large, and the actual transcriptome data is stored with git-lfs.
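Unpacking the zipped archives can be sketched as follows. This is a minimal sketch, not part of the project: it assumes the git-lfs objects have already been fetched (for example with git lfs pull) and that the archives are ordinary .zip files. The function name extract_archives and the target folder data/extracted are hypothetical.

```python
import zipfile
from pathlib import Path


def extract_archives(archive_dir, output_dir):
    """Extract every .zip file found in archive_dir into output_dir.

    Returns the list of member names that were extracted.
    """
    archive_dir = Path(archive_dir)
    output_dir = Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)

    extracted = []
    for archive in sorted(archive_dir.glob("*.zip")):
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(output_dir)
            extracted.extend(zf.namelist())
    return extracted


# Example call ("data/archives" is from this README, "data/extracted" is assumed):
# extract_archives("data/archives", "data/extracted")
```

The function returns the extracted member names so the caller can check that the expected csv files are present before running a model.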

Models Testing
The models are organized into three folders: 214, 3k and 6k genes. Each name refers to the number of genes used to build the models (originally we had 17k genes).

Each subfolder contains the csv file and three model files, one each for Logistic Regression, Decision Tree and Random Forest. For the 3k genes models, for example, the Python files are breast_cancer_3k_genes_log_reg.py, breast_cancer_3k_genes_dec_tree.py and breast_cancer_3k_genes_random_forest.py, respectively.
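To illustrate the three model types named above, here is a minimal sketch that fits each one with scikit-learn. It is not the project's code: the data is a synthetic stand-in generated with make_classification instead of the real gene expression csv, the hyperparameters are arbitrary assumptions, and any resampling with imblearn is left out.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for an expression matrix: rows are samples, columns are genes.
X, y = make_classification(
    n_samples=200, n_features=50, n_informative=10, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# The three model types used in this project (settings here are assumptions).
models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

# Fit each model on the training split and score it on the held-out split.
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = accuracy_score(y_test, model.predict(X_test))
    print(f"{name}: accuracy = {scores[name]:.3f}")
```

Using the same train/test split for all three models keeps the accuracy figures comparable with each other.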

To run a model, run its Python file from the command line, for example:
python breast_cancer_3k_genes_log_reg.py
If you get an error saying that a particular module could not be found, install the required package using:
pip install <package_name>
E.g. pip install imblearn

The project uses scikit-learn, imblearn and other packages, which may need to be installed first.
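A quick way to see which packages still need installing before running a model file is sketched below. The helper name find_missing_modules and the REQUIRED list are hypothetical. The list covers only the two packages named in this README, so extend it with whatever else the scripts import.

```python
import importlib.util


def find_missing_modules(module_names):
    """Return the top-level module names that cannot be found for import."""
    return [
        name for name in module_names
        if importlib.util.find_spec(name) is None
    ]


# Import names, which can differ from the pip package names:
# sklearn is installed as scikit-learn, imblearn is published as imbalanced-learn.
REQUIRED = ["sklearn", "imblearn"]

missing = find_missing_modules(REQUIRED)
if missing:
    print("Missing modules:", ", ".join(missing))
else:
    print("All listed modules are available.")
```

The check only looks the modules up and does not import them, so it runs quickly even when the packages are large.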