LearningCurves: A Jupyter Notebook repository from adpartin

Learning curves.

This repo contains code to generate learning curves. The primary application is the supervised problem of predicting response of cancer cell lines to anti-cancer drugs.
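For readers new to the term, a learning curve plots a model's held-out error against the size of the training set. The sketch below is only an illustration on synthetic data and is not the code in this repo: the data generator, the one-feature least-squares model, and the names `make_data`, `fit_line`, and `mean_abs_error` are all assumptions made for the example.

```python
import random

random.seed(0)  # fixed seed so repeated runs give the same curve


def make_data(n):
    """Synthetic regression data (assumed for illustration): y = 2x + 1 plus noise."""
    xs = [random.uniform(0, 10) for _ in range(n)]
    ys = [2 * x + 1 + random.gauss(0, 2) for x in xs]
    return xs, ys


def fit_line(xs, ys):
    """Ordinary least squares for a single feature; returns (slope, intercept)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def mean_abs_error(model, xs, ys):
    """Mean absolute error of the fitted line on the given samples."""
    slope, intercept = model
    return sum(abs(slope * x + intercept - y) for x, y in zip(xs, ys)) / len(xs)


x_train, y_train = make_data(512)
x_val, y_val = make_data(200)  # fixed validation set shared by all training sizes

train_sizes = [4, 16, 64, 256, 512]
curve = []
for size in train_sizes:
    # Train on the first `size` samples only, then score on the validation set.
    model = fit_line(x_train[:size], y_train[:size])
    curve.append((size, mean_abs_error(model, x_val, y_val)))

for size, err in curve:
    print(f"train size {size:4d}  validation MAE {err:.3f}")
```

Plotting the `(size, error)` pairs in `curve` gives the learning curve for this toy problem.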

There are three main scripts you need to run:

1. Generate the topN dataset (build_topN.py)
2. Generate the data splits (gen_data_splits.py)
3. Use the data splits from (2) to generate learning curves (main_lrn_crv.py)

Script (1) requires a folder called "data" that contains a set of required files. You can copy the folder /vol/ml/apartin/projects/candle/data to your parent dir.

Step-by-step execution

First, run the 1st script as follows:

python build_topN.py --top_n 6 --format parquet --labels

This will create a dir called top6_data that contains a single parquet file. In addition, some plots are generated.

Then, run the 2nd script to generate the data splits:

python gen_data_splits.py --dirpath top6_data

This will create a dir called top6_data_splits that contains splits for various k-folds. In addition, the data files xdata.parquet and meta.parquet are generated.
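As a rough picture of what a k-fold split is, the sketch below partitions sample indices into disjoint validation folds. It is a generic illustration, not the logic of gen_data_splits.py; the function name `kfold_indices` and its parameters are hypothetical.

```python
import random


def kfold_indices(n_samples, n_folds, seed=0):
    """Shuffle sample indices and deal them into n_folds disjoint validation folds.

    Returns a list of (train_indices, val_indices) pairs, one per fold.
    """
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    # Fold i takes every n_folds-th shuffled index, starting at position i.
    folds = [idx[i::n_folds] for i in range(n_folds)]
    splits = []
    for k, val in enumerate(folds):
        # Training indices are everything outside the current validation fold.
        train = [i for j, fold in enumerate(folds) if j != k for i in fold]
        splits.append((sorted(train), sorted(val)))
    return splits


splits = kfold_indices(n_samples=10, n_folds=5)
for k, (train, val) in enumerate(splits):
    print(f"fold {k}: {len(train)} train, {len(val)} validation")
```

Each sample lands in exactly one validation fold, so every sample is used for validation once across the k splits.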

Finally, run the 3rd script to generate the learning curves:

python main_lrn_crv.py --dirpath top6_data --clr_mode trng1
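The three steps can also be chained from a small driver so that a failure in one step stops the rest. This is a minimal sketch, assuming the scripts sit in the current directory and accept exactly the arguments shown above; `STEPS` and `run_pipeline` are hypothetical names, not part of the repo.

```python
import shlex
import subprocess
import sys

# The three commands from the steps above, with the README's example arguments.
STEPS = [
    ["build_topN.py", "--top_n", "6", "--format", "parquet", "--labels"],
    ["gen_data_splits.py", "--dirpath", "top6_data"],
    ["main_lrn_crv.py", "--dirpath", "top6_data", "--clr_mode", "trng1"],
]


def run_pipeline(dry_run=True):
    """Print each command in order and, unless dry_run is set, execute it.

    Returns the list of command strings that were printed.
    """
    issued = []
    for args in STEPS:
        issued.append(shlex.join(["python"] + args))
        print(issued[-1])
        if not dry_run:
            # check=True raises CalledProcessError on a non-zero exit code,
            # which stops the remaining steps from running.
            subprocess.run([sys.executable] + args, check=True)
    return issued


# Dry run: only prints the commands. Call run_pipeline(dry_run=False) to execute them.
run_pipeline(dry_run=True)
```

Using `sys.executable` runs the scripts with the same interpreter as the driver, which avoids picking up a different Python environment by accident.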
