This is an example project repository that illustrates what a reproducible analysis might look like, as discussed in more detail in the Reproducibility in Cancer Informatics course.
It can be used as a template or otherwise borrowed from.
This example analysis:
- Downloads data from refine.bio using the refine.bio Python API client.
- Identifies the genes in the top 90th percentile of variance (the most variable genes) in the set.
- Creates and saves a heatmap from those genes.
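To make the variance-filtering step concrete, here is a minimal Python sketch that uses only the standard library. It assumes a hypothetical in-memory layout (a dict mapping each gene ID to its expression values across samples) and computes the 90th-percentile cutoff by linear interpolation via statistics.quantiles(..., method="inclusive"). The function name top_variance_genes is hypothetical, and the repository's own scripts may lay out the data and compute the cutoff differently.

```python
import statistics


def top_variance_genes(expression, percentile=90):
    """Return the IDs of genes whose variance is at or above the given percentile.

    `expression` maps a gene ID to its expression values across samples
    (a hypothetical in-memory layout; the real data arrive as a TSV file).
    """
    # Sample variance of each gene across all samples.
    variances = {gene: statistics.variance(values) for gene, values in expression.items()}
    # With n=100, quantiles() returns the 1st..99th percentiles;
    # "inclusive" interpolates linearly between the observed variances.
    cutoffs = statistics.quantiles(variances.values(), n=100, method="inclusive")
    cutoff = cutoffs[percentile - 1]
    return sorted(gene for gene, var in variances.items() if var >= cutoff)


# Toy data: gene gK has values [0, K], so its variance is K**2 / 2.
toy = {f"g{k}": [0, k] for k in range(10)}
print(top_variance_genes(toy))  # ['g9']
```

Lowering percentile widens the selection: on the same toy data, percentile=80 returns ['g8', 'g9'].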
It also has its own Docker image and GitHub Actions workflows to aid reproducibility.
To run this analysis you will need git and Docker installed on your computer.
Both are widely used tools for reproducibility, so they will be useful to you far beyond this repository.
To re-run this analysis within its Docker image, open up your Terminal/Command Prompt.
- First, obtain a local copy of this repository by cloning it with git clone:
git clone https://github.com/jhudsl/reproducible-r-example.git
- Navigate to the top-level directory of the repository:
cd reproducible-r-example
- Use the following command to run the analysis:
docker run \
--mount type=bind,target=/home/rstudio,source=$PWD \
jhudsl/reproducible-r \
bash run_analysis.sh
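If you prefer to launch the analysis from a script, the docker run command above can be assembled in Python. This is a hypothetical helper (docker_run_command is not part of the repository); it mirrors the image name, mount target, and script from the command above, and only builds the argument list, leaving the actual launch to you.

```python
import os
import shlex


def docker_run_command(repo_dir, image="jhudsl/reproducible-r", script="run_analysis.sh"):
    """Build the argument list for the docker run call described above."""
    repo_dir = os.path.abspath(repo_dir)
    return [
        "docker", "run",
        # Bind-mount the repository into the container's home directory.
        "--mount", f"type=bind,target=/home/rstudio,source={repo_dir}",
        image,
        "bash", script,
    ]


cmd = docker_run_command(os.getcwd())
print(shlex.join(cmd))
# To actually launch it: subprocess.run(cmd, check=True)
```

Passing the command as a list keeps a repository path that contains spaces as a single argument, which the unquoted $PWD in the shell version does not.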
- run_analysis.sh - This main bash script runs both steps of the analysis by calling the 00 and 01 scripts.
- 00-download-data.py - This script uses the Python client for the refine.bio API to download the processed and normalized data that is used to make the heatmap.
- 01-heatmap.Rmd - This R Markdown notebook takes the data downloaded by 00-download-data.py and creates a heatmap that is saved to plots/aml_heatmap.png.
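The order of operations in run_analysis.sh can be sketched in Python as follows. The two commands in ANALYSIS_STEPS are assumptions about how the scripts might be invoked (python3 for the download script, rmarkdown::render via Rscript for the notebook); check run_analysis.sh for the actual calls. ANALYSIS_STEPS and run_steps are hypothetical names.

```python
import subprocess
import sys

# Hypothetical equivalents of the two steps run_analysis.sh calls; the real
# script may invoke them with different commands or arguments.
ANALYSIS_STEPS = [
    ["python3", "00-download-data.py"],
    ["Rscript", "-e", "rmarkdown::render('01-heatmap.Rmd')"],
]


def run_steps(steps):
    """Run each command in order; stop at the first failure and return its exit code.

    Raises FileNotFoundError if a command (e.g. Rscript) is not installed.
    """
    for step in steps:
        result = subprocess.run(step)
        if result.returncode != 0:
            print(f"Step failed: {' '.join(step)}", file=sys.stderr)
            return result.returncode
    return 0
```

Run from the top of the repository inside the container, sys.exit(run_steps(ANALYSIS_STEPS)) would execute both steps and stop if the download fails, so the notebook never runs on missing data.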
The data used by this analysis is downloaded from refine.bio through its API, already processed and quantile normalized (the dataset can also be downloaded from [this page on refine.bio](https://www.refine.bio/experiments/SRP070849)). It is RNA-seq data from 19 acute myeloid leukemia (AML) mouse models.
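refine.bio performs the quantile normalization for you, so this analysis does not implement it. For intuition only, here is a textbook version in plain Python: every sample's values are replaced, rank by rank, with the average of the sorted samples, so all samples end up with the same distribution. refine.bio's own procedure may differ, for example in the reference distribution it normalizes to; quantile_normalize is a hypothetical name.

```python
def quantile_normalize(samples):
    """Quantile-normalize a list of samples (each a list of values, one per gene).

    Every sample ends up with the same distribution: the mean of the
    sorted samples. Ties are broken by gene order (a simplification).
    """
    n_genes = len(samples[0])
    sorted_samples = [sorted(s) for s in samples]
    # Reference distribution: average of the k-th smallest value across samples.
    reference = [sum(s[k] for s in sorted_samples) / len(samples) for k in range(n_genes)]
    normalized = []
    for s in samples:
        # Gene indices ordered from lowest to highest value in this sample.
        order = sorted(range(n_genes), key=lambda i: s[i])
        out = [0.0] * n_genes
        for rank, gene_idx in enumerate(order):
            out[gene_idx] = reference[rank]
        normalized.append(out)
    return normalized


print(quantile_normalize([[5, 2, 3], [4, 1, 6]]))
# [[5.5, 1.5, 3.5], [3.5, 1.5, 5.5]]
```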
Two directories are created by this analysis and hold the output:
- plots/ - contains the heatmap PNG: aml_heatmap.png
- results/ - contains the TSV file listing the most variable genes: top_90_var_genes.tsv
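After a run, a quick way to confirm that both outputs were produced is to check for the files listed above. The helper below is a hypothetical sketch (missing_outputs is not part of the repository) that treats a missing or empty file as a failed output.

```python
from pathlib import Path

# Output files listed above, relative to the top of the repository.
EXPECTED_OUTPUTS = [
    Path("plots") / "aml_heatmap.png",
    Path("results") / "top_90_var_genes.tsv",
]


def missing_outputs(repo_dir="."):
    """Return the expected output files that do not exist or are empty."""
    root = Path(repo_dir)
    return [
        str(rel) for rel in EXPECTED_OUTPUTS
        if not (root / rel).is_file() or (root / rel).stat().st_size == 0
    ]
```

An empty list from missing_outputs() means both files are present and non-empty.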
Package management for this project is done with renv.
If you don't have renv, you will need to install it first with install.packages("renv").
Follow the workflow described in the renv introduction, keeping in mind that this repository already has an renv project initialized.
So, in the R Console window:
- Use renv::restore() to load the packages from the current renv.lock file.
- Work in the project as normal, installing and removing R packages as they are needed.
- Call renv::snapshot() to save the state of the project library to the lockfile (called renv.lock).
- Continue working on your project, installing and updating R packages as needed.
- Call renv::snapshot() again to save the state of your project library if your attempts to update R packages were successful, or call renv::restore() to revert to the previous state as encoded in the lockfile if your attempts to update packages introduced some new problems.
Be sure to include the renv.lock file in any commits and pull requests, since it is what records the package changes to your environment!
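Because renv.lock is a JSON file whose Packages section records each package's version, you can review what a commit changes before pushing it. This sketch assumes that layout and compares two versions of the file; lock_versions and diff_locks are hypothetical names.

```python
import json


def lock_versions(lock_text):
    """Map package name -> version from the contents of an renv.lock file."""
    lock = json.loads(lock_text)
    return {name: rec.get("Version") for name, rec in lock.get("Packages", {}).items()}


def diff_locks(old_text, new_text):
    """Return (added, removed, changed) package names between two renv.lock versions."""
    old, new = lock_versions(old_text), lock_versions(new_text)
    added = sorted(set(new) - set(old))
    removed = sorted(set(old) - set(new))
    changed = sorted(p for p in set(old) & set(new) if old[p] != new[p])
    return added, removed, changed
```

For example, you could pass the output of git show HEAD:renv.lock as old_text and the current contents of renv.lock as new_text.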
With your current directory being the top of this repository, run this command in your Terminal:
docker run -it -v $PWD:/home/rstudio -e PASSWORD=password -p 8787:8787 jhudsl/reproducible-r
Then, in the browser of your choice, navigate to localhost:8787.
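The RStudio server inside the container can take a few seconds to start, so localhost:8787 may not respond right away. This hypothetical helper (wait_for_port is not part of the repository) polls the port until it accepts TCP connections; note that an open port does not guarantee RStudio has finished loading.

```python
import socket
import time


def wait_for_port(host="localhost", port=8787, timeout=60.0, interval=1.0):
    """Poll until a TCP server accepts connections on host:port; return True if it does."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```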
If you prefer to build the image locally, or have otherwise modified the Dockerfile and want to test if it builds, you can run this command from the top of the repository:
docker build -f docker/Dockerfile . -t jhudsl/reproducible-r
Running docker images should show jhudsl/reproducible-r listed among your images (docker ps lists running containers, not images).
There are two main GitHub Actions workflows in this repository:
- docker-management.yml - Tests building the Docker image when changes to the Dockerfile are added to a pull request.
- run-py-notebook.yml - Re-runs the analysis by running make_heatmap.ipynb within the Docker image (using the command described above).
Both workflows can also be run manually.
The Docker management workflow can also push the rebuilt Docker image to Docker Hub if you set dockerhubpush to true.
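The triggering rules described above can be summarised as plain logic. This Python sketch is an illustration only, not the workflows' actual implementation (which lives in their YAML files). It assumes the Docker workflow builds on manual (workflow_dispatch) runs or on pull requests that change docker/Dockerfile, and pushes only when a manual run sets dockerhubpush to true; should_build and should_push are hypothetical names.

```python
DOCKERFILE = "docker/Dockerfile"


def should_build(event_name, changed_files):
    """Build on manual runs, or on pull requests that touch the Dockerfile."""
    if event_name == "workflow_dispatch":
        return True
    return event_name == "pull_request" and DOCKERFILE in changed_files


def should_push(event_name, dockerhubpush):
    """Push only on manual runs where the dockerhubpush input is true."""
    # Manual-run inputs may arrive as strings, so normalise before comparing.
    return event_name == "workflow_dispatch" and str(dockerhubpush).strip().lower() == "true"
```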