🚧 work in progress 🚧
Develop a knowledge-based approach using MSK-IMPACT data to build an automatic variant classifier.
- `analysis/`: folder to design and run analyses; contains several sub-folders:
  - `description/`: written in R, contains the description/annotation/formatting process of the dataset
  - `prediction/`: written in Python, contains the classifier building process
- `data/`: raw data and main processed data; processed data should be reproducible from raw data. ⚠️ This folder should not be versioned.
- `doc/`: useful documentation, bibliography, slides for talks
- `temp/`: drafts, temporary files and old scripts
- `utils/`: main scripts used across the analysis (predictors, cross-validation scripts, evaluation scripts, tools...)
You can clone this repository using:

```shell
$ git clone https://github.com/ElsaB/impact-annotator.git
```
The first part of the repository was written and tested under R 3.5.1 and R 3.2.3, working with JupyterLab.
To work with this repository, please make sure you have the following R packages installed: `tidyverse`, `gridExtra`, `utf8`, `readxl`, `hexbin`.
```r
# run in an R console
install.packages('tidyverse', repos = 'http://cran.us.r-project.org')
install.packages('gridExtra', repos = 'http://cran.us.r-project.org')
install.packages('utf8',      repos = 'http://cran.us.r-project.org')
install.packages('readxl',    repos = 'http://cran.us.r-project.org')
install.packages('hexbin',    repos = 'http://cran.us.r-project.org')
```
The second part of the repository was written and tested under Python 3.6, working with JupyterLab. You can see the requirements in `conda-env_requirements.yml`. The main Python packages used are: `numpy`, `matplotlib`, `seaborn`, `pandas`, `scikit-learn`, `imbalanced-learn`, `ipython`, `nb_conda_kernels`.
To work with this repository, please set up both a virtualenv on the cluster and a conda-env on your local machine, as described below.

To create the virtualenv used by the jobs, please run the following commands in your selene cluster session:
```shell
# create the virtualenv
$ mkvirtualenv --python=python3.6 impact-annotator_env

# install useful libraries
$ pip install numpy matplotlib seaborn pandas scikit-learn imbalanced-learn ipython nb_conda_kernels
```
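After installing, you can verify that the main packages import cleanly from within the virtualenv. The helper below is an illustrative sketch (it is not part of the repository); note that the import names for scikit-learn and imbalanced-learn differ from their pip package names:

```python
import importlib

def check_imports(packages):
    """Return a dict mapping each package name to True if it imports cleanly."""
    status = {}
    for pkg in packages:
        try:
            importlib.import_module(pkg)
            status[pkg] = True
        except ImportError:
            status[pkg] = False
    return status

# scikit-learn imports as "sklearn", imbalanced-learn as "imblearn"
print(check_imports(["numpy", "matplotlib", "seaborn", "pandas",
                     "sklearn", "imblearn"]))
```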
Some useful command lines:

```shell
# activate the virtualenv
$ workon impact-annotator_env

# deactivate the virtualenv
$ deactivate

# remove the virtualenv
$ rmvirtualenv impact-annotator_env
```
Please add the following line to your `.bashrc` or `.bash_profile` so you can use virtualenv functions directly from the notebook later:

```shell
# add in your .bashrc or .bash_profile
source `which virtualenvwrapper.sh`
```
To create the conda-env, please run the following command:

```shell
# create the conda-env and load the appropriate libraries
$ conda env create --name impact-annotator_env --file conda-env_requirements.yml
```
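For orientation, a conda-env requirements file of this kind typically looks like the sketch below. This is a hypothetical excerpt, not the contents of the actual `conda-env_requirements.yml` shipped with the repository; the channels and version pins here are assumptions:

```yaml
# hypothetical sketch of a conda-env requirements file;
# see conda-env_requirements.yml in the repository for the real one
name: impact-annotator_env
channels:
  - defaults
  - conda-forge
dependencies:
  - python=3.6
  - numpy
  - matplotlib
  - seaborn
  - pandas
  - scikit-learn
  - imbalanced-learn
  - ipython
  - nb_conda_kernels
```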
Some useful command lines:

```shell
# activate the conda-env
$ source activate impact-annotator_env

# deactivate the conda-env
$ source deactivate

# remove the conda-env
$ conda env remove --name impact-annotator_env
```
⚠️ Please always activate the `impact-annotator_env` conda-env before running any Python notebook, to make sure you have all the necessary dependencies and the correct library versions:

```shell
# if you use jupyter notebook
$ source activate impact-annotator_env; jupyter notebook

# if you use jupyter lab
$ source activate impact-annotator_env; jupyter lab
```
In any Python Jupyter notebook, importing the file `utils/python/setup_environment.ipy` automatically checks that you're running the notebook under the `impact-annotator_env` conda-env. You can also check it yourself by running in the notebook:

```shell
# print the current conda-env used
!echo $CONDA_DEFAULT_ENV

# list all the conda-envs on your computer; the one you're working on is marked with a star
!conda env list
```
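Such a check boils down to comparing the `CONDA_DEFAULT_ENV` environment variable (which conda exports for the active environment) against the expected name. A minimal sketch of what this might look like; the function name is hypothetical and this is not the actual code of `setup_environment.ipy`:

```python
import os

def running_in_conda_env(expected_env):
    """True if the active conda-env (read from CONDA_DEFAULT_ENV) matches expected_env."""
    return os.environ.get("CONDA_DEFAULT_ENV") == expected_env

# warn rather than crash, so the notebook stays usable either way
if not running_in_conda_env("impact-annotator_env"):
    print("⚠️ Please activate the impact-annotator_env conda-env first.")
```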
Go to the `data/` folder and follow its `README.md` to download all the necessary data.
- Download the R packages `tidyverse`, `gridExtra`, `utf8`, `readxl`, `hexbin`
- Create the cluster virtualenv `impact-annotator_env`
- Add `` source `which virtualenvwrapper.sh` `` to your `.bashrc` or `.bash_profile`
- Create the local conda-env `impact-annotator_env`
- Remember to always activate the conda-env `impact-annotator_env` before running a Jupyter Notebook/JupyterLab instance (`$ source activate impact-annotator_env`)
All R notebooks begin with the following lines, which load a set of custom functions designed by us and set up the R environment by loading the appropriate libraries:

```r
source("../../../utils/R/custom_tools.R")
setup_environment("../../../utils/R")
```
All Python notebooks begin with the following line, which loads a set of custom functions designed by us, loads the appropriate libraries, and makes sure that you're working in the `impact-annotator_env` conda-env that you should have created earlier:

```python
%run ../../../utils/Python/setup_environment.ipy
```

If you want to send jobs to the cluster from a notebook on your local computer, please also run:

```python
%run ../../../utils/Python/Selene_Job.ipy
```