OncoSimul

This README contains comments specific to the frequency-dependent-fitness branch

Code for forward population genetic simulation in asexual populations, with special focus on cancer progression. Fitness can be an arbitrary function of genetic interactions between multiple genes or modules of genes, including epistasis, order restrictions in mutation accumulation, and order effects. Mutation rates can differ between genes, and we can include mutator/antimutator genes (to model mutator phenotypes). Simulations so far use continuous-time models and can include driver and passenger genes and modules. Also included are functions for: simulating random DAGs of the type found in Oncogenetic Trees, Conjunctive Bayesian Networks, and other cancer progression models; plotting and sampling from single or multiple realizations of the simulations, including single-cell sampling; plotting the parent-child relationships of the clones; generating random fitness landscapes (Rough Mount Fuji, House of Cards, and additive models) and plotting them.

New functionality to allow for frequency-dependent fitness has been added.

The /OncoSimulR directory contains the code for the BioConductor package OncoSimulR. The /miscell-files directory contains additional files that, so far, are only related to that package.

A former version of this code has been used in the paper "Identifying restrictions in the order of accumulation of mutations during tumor progression: effects of passengers, evolutionary models, and sampling", BMC Bioinformatics, 2015, 16:41. OncoSimulR has also been used extensively in the simulations reported in the Bioinformatics paper "Cancer Progression Models And Fitness Landscapes: A Many-To-Many Relationship" and the bioRxiv preprint "Every which way? On predicting tumor evolution using cancer progression models".

You can also find OncoSimulR on the Genetic Simulation Resources catalogue.

Installation

To use the most recent code in BioConductor, install the devel version.

if (!require("BiocManager"))
    install.packages("BiocManager")
BiocManager::install("OncoSimulR", version = "devel")

To start using it:

library(OncoSimulR)
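
As a quick sanity check after installing, the following minimal sketch reports whether the package is available and which version was installed. The helper name pkg_status is hypothetical (it is not part of OncoSimulR); it only uses base R functions.

```r
## Hypothetical helper (not part of OncoSimulR): report whether a package
## is installed and, if it is, which version.
pkg_status <- function(pkg) {
    if (requireNamespace(pkg, quietly = TRUE)) {
        paste(pkg, as.character(utils::packageVersion(pkg)))
    } else {
        paste(pkg, "is not installed")
    }
}

print(pkg_status("OncoSimulR"))
```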

The above, however, will not install the version with frequency-dependent fitness. To use that version, read the following.

Installing the frequency-dependent fitness branch

If you use Linux or another Unix (including macOS)

You should install from github as follows:

if (!require("devtools"))
    install.packages("devtools") ## if you don't have it already
library(devtools)
install_github("rdiaz02/OncoSimul/OncoSimulR", ref = "freq-dep-fitness")
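
One way to check which version ended up installed is to look for an argument that only the frequency-dependent-fitness code provides. The sketch below assumes that this branch adds a frequencyDependentFitness argument to allFitnessEffects; treat that as an assumption to verify against the package help. The helper has_argument is hypothetical.

```r
## Hypothetical helper: does function `fun` accept an argument named `arg`?
has_argument <- function(fun, arg) {
    arg %in% names(formals(fun))
}

if (requireNamespace("OncoSimulR", quietly = TRUE)) {
    ## Assumption: the freq-dep-fitness branch adds the argument
    ## 'frequencyDependentFitness' to allFitnessEffects().
    fdf <- has_argument(OncoSimulR::allFitnessEffects,
                        "frequencyDependentFitness")
    message("Frequency-dependent fitness available: ", fdf)
} else {
    message("OncoSimulR is not installed")
}
```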

If you use Windows

We have not uploaded the changes in the freq-dep-fitness branch to BioConductor because there are known problems compiling ExprTk with MinGW (e.g., https://sourceforge.net/p/mingw-w64/discussion/723797/thread/c6b70624/).

Things work with other toolchains and, eventually, Rtools 40 should become the default toolchain, and the problem will get solved. In the meantime, you have two options:

Install from a zip file:

(First, install the current OncoSimulR from BioConductor, to resolve all the dependencies; see above or go to http://www.bioconductor.org/packages/devel/bioc/html/OncoSimulR.html).

Download the OncoSimulR_2.17.xyz.zip file we provide; for example, from your web browser, place the mouse on the link, and right click to select "Save link as", or similar incantation.

Now install that zip file from R (e.g., from the menu, go to "Packages", "Install package(s) from local file(s)", and select the file). This works with R-3.6.0 (and also with R-3.6.0 patched and with R-devel, the future R-3.7.0).
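
If you prefer the R console to the menu, the same step can be sketched as follows. The file name is a placeholder ("xyz" stands for whatever version the downloaded file actually carries), and the file is assumed to be in the current working directory.

```r
## Placeholder name: replace "xyz" with the version of the file you downloaded.
zip_file <- "OncoSimulR_2.17.xyz.zip"

if (file.exists(zip_file)) {
    ## repos = NULL tells install.packages() to install from the local file;
    ## type = "win.binary" is the Windows binary (zip) package type.
    install.packages(zip_file, repos = NULL, type = "win.binary")
} else {
    message("File not found: ", zip_file,
            " (working directory: ", getwd(), ")")
}
```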

Install using Rtools40

(This is more work and takes much more time)

We have verified that OncoSimulR (at least as of 2019-05-24) does install with Rtools40.

How to do it:

  1. Install Rtools40 and its associated R-testing as explained in https://cran.r-project.org/bin/windows/testing/rtools40.html.
  2. Install igraph following these notes: https://github.com/r-windows/checks/issues/2
  3. Now, install OncoSimulR from BioConductor (to resolve all dependencies in one go): https://www.bioconductor.org/packages/devel/bioc/html/OncoSimulR.html This will take more than one hour.
  4. Clone the git repo and move to that directory.
  5. Go to a MINGW shell console, and install. For example, if you have installed R-testing under 'C:\R', you can do
/c/R/R-testing/bin/x64/R CMD INSTALL --no-multiarch OncoSimulR

Alternatively, install from a local file, but you need to specify the tar.gz (the zip file will not work, of course, since the R-testing that ships for/with Rtools40 will not install from zip files)

Installing from source takes a while (more than 5 minutes).

BioConductor github repository

The github repository for this package is this one: https://github.com/rdiaz02/OncoSimul . Since mid-2017 BioConductor is maintained using git, but since this directory contains other files and directories in addition to the OncoSimulR package itself, I have not used option "Sync an existing GitHub repository with Bioconductor". Instead, I continue using this github repo, but then locally update a Bioconductor-only repository of just the OncoSimulR code (as explained in Maintain a Bioconductor-only repository for an existing package).

Documentation

Like any R/BioConductor package, OncoSimulR comes with documentation for its user-visible functions and data sets (using the help is just standard R usage). From OncoSimulR's BioConductor page you have access to the standard documentation: both the manual and the overview (the vignette). The best place to start is the vignette (created from the OncoSimulR/vignettes/OncoSimulR.Rnw file that includes both text and code).

You can view the vignette from R itself doing

browseVignettes("OncoSimulR")

and this gives you access to the HTML, the Rmd file (markdown + R), and the R code.
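
For a console-only session, a minimal sketch using standard R utilities (nothing specific to OncoSimulR) lists the installed vignettes and the index of help pages:

```r
if (requireNamespace("OncoSimulR", quietly = TRUE)) {
    ## List the vignettes shipped with the installed package
    print(vignette(package = "OncoSimulR"))
    ## Show the index of help pages for the package
    print(help(package = "OncoSimulR"))
} else {
    message("OncoSimulR is not installed")
}
```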

Documentation: HTML and PDF for this repo's version

From these two links you can also browse the HTML vignette and get a PDF version.

These files correspond to the most recent (github) version of the package (i.e., they might include changes not yet available from the BioConductor package).

Further documentation

This paper published in Bioinformatics gives a quick overview of OncoSimulR (a former version is available as a bioRxiv preprint). You can also take a look at this poster presented at ECCB 2016.

If you use the package in publications please cite the Bioinformatics paper.

The frequency-dependent fitness functionality is based on Sergio Sanchez-Carrillo's Master's thesis (see also file 'miscell-files/Sergio_Sanchez-Carillo-improvements-post-TFM.pdf' for additional features that were not described in the original thesis).

Licenses and copyright

The R/BioConductor OncoSimulR package is licensed under the GPLv3 license. The code for the OncoSimulR BioConductor package, except for functions plot.stream and plot.stacked, is Copyright 2012-2019 by Ramon Diaz-Uriarte; the code for frequency-dependent fitness is Copyright 2017-2019 Sergio Sanchez-Carrillo. plot.stream and plot.stacked are Copyright 2013-2016 by Marc Taylor (see also https://github.com/marchtaylor/sinkr and http://menugget.blogspot.com.es/2013/12/data-mountains-and-streams-stacked-area.html).

The code under src/FitnessLandscape is from MAGELLAN, Maps of Genetical Landscapes. The authors are S. Brouillet, G. Achaz, S. Matuszewski, H. Annoni, and L. Ferretti. I downloaded the sources on 2019-06-05 from http://wwwabi.snv.jussieu.fr/public/magellan/latest.tgz. The code is under the GPLv3. MAGELLAN is "an integrated tool to visualize and analyze fitness landscapes of small dimension (up to 7-8 loci)". In OncoSimulR we use only a very limited subset of the functionality of MAGELLAN (mostly to generate different types of random fitness landscapes and to compute statistics of epistasis); the Makevars file we use only compiles two of the executables (fl_statistics and fl_generate), although the directory src/FitnessLandscape contains the complete sources. Note also that the look of the fitness landscape plots in OncoSimulR is blatantly copied from MAGELLAN's plots.

The file OncoSimulR/src/exprtk.h is from The C++ Mathematical Expression Toolkit Library (ExprTk). This code is copyright Arash Partow, is licensed under "The MIT License (MIT)" (http://www.opensource.org/licenses/MIT), and is compatible with the GPL (http://directory.fsf.org/wiki/License:Expat). The file was originally downloaded from http://www.partow.net/programming/exprtk/index.html on 2017-05-15. The most recent version was downloaded again on 2019-05-14 (and corresponds to the ExprTk repo at commit https://github.com/ArashPartow/exprtk/commit/9fad72832c70348725c073e369a3321781001766). The file was originally named exprtk.hpp; to conform to R's requirements, it was renamed to exprtk.h.

The code in miscell-files/randutils.h is copyright Melissa E. O'Neill, and is licensed under "The MIT License (MIT)" in the terms explained in the file itself. This is a license that is compatible with the GPL. The file randutils.hpp was downloaded from https://gist.github.com/imneme/540829265469e673d045 on 2015-06-20 and is also referenced from the main article, "Ease of Use without Loss of Power" (http://www.pcg-random.org/posts/ease-of-use-without-loss-of-power.html). I renamed it to randutils.h to conform to R's requirements (and changed the auto exit_func = hash(&_Exit); line to keep R from complaining about the Exit function). I had to disable usage of randutils for now, since I could not get it to compile with gcc-4.6 (since version 3.3 of R, the official Rtools for Windows support C++11, so I might change this in the near future).

The file under gitinfo-hooks is Copyright 2011 Brent Longborough, is part of the gitinfo package, and is under the LaTeX Project Public License 1.3, which is incompatible with the GPL. Note this file is not part of the OncoSimulR BioConductor package.

The files under miscell-files/AParramon_discrete_time are copyright Alberto Parramon, unless otherwise specified. This is an implementation of a discrete-time version of OncoSimulR.

Software status

R CMD check: Bioconductor (multiple platforms), build status for release and devel; Travis CI (Linux), build status; Appveyor (Windows), build status.
Test coverage: coverage status.

(Note: Appveyor can fail for reasons that have nothing to do with the package, such as R not being downloaded correctly, etc. Look at the details of each failure. Similarly, some of the errors in BioConductor, especially in the development branch, can be caused, especially in Windows, by some required packages not being yet available, often "car" and "igraph".

Again, look at the details of each failure.)