/grn-nextflow

Boostdiff workflow

Primary language: R. License: MIT.

DGRN-Benchmark

Installation Instructions

  1. Install nextflow with the following command (requires Java version 11 or higher). The downloaded nextflow executable can be moved to any directory you want:
curl -fsSL get.nextflow.io | bash
  2. Install singularity (https://docs.sylabs.io/guides/3.0/user-guide/installation.html). Note: Singularity images are created when the pipeline is executed. They are automatically stored in a cache in the nextflow work directory and in your home directory. To move the cache from your home directory to another directory, set the environment variable SINGULARITY_CACHEDIR in your .bashrc.
  3. Clone the grn-nextflow repository and navigate to it:
git clone git@github.com:bionetslab/grn-nextflow.git && cd grn-nextflow
  4. Open nextflow.config and change the parameter singularity.runOptions to the folder that contains your nextflow work directory and the folder that will contain your results folder (user-defined at the start of the pipeline). If they are separate folders, use a comma-separated list.
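
The cache relocation mentioned in the Singularity note above could look like the following sketch. The directory `$HOME/singularity_cache` is only a placeholder chosen for illustration; the pipeline does not require this particular location.

```shell
# Placeholder path: pick any directory with enough free space.
# An already exported SINGULARITY_CACHEDIR is kept unchanged.
export SINGULARITY_CACHEDIR="${SINGULARITY_CACHEDIR:-$HOME/singularity_cache}"

# Create the directory up front.
mkdir -p "$SINGULARITY_CACHEDIR"

# To make the setting permanent, add the export line above to ~/.bashrc.
echo "Singularity cache: $SINGULARITY_CACHEDIR"
```

Running this in the current shell only affects that session; new shells pick up the value once the export line is in your .bashrc.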

Now you are set to run the benchmark!

Running the shiny application

To run the resulting shiny app saved in example_pipeline_output, use the commands below (replace $RESULT_DIRECTORIES with all result output directories of the pipeline in output). If the files are symbolic links, they must first be copied as real files, which is what the first command does:

cp -rL example_pipeline_output output && cd output
singularity pull docker://nicolaimeyerhoefer/shiny_app
singularity exec --bind=./app:/app/,./data:/data/,./$RESULT_DIRECTORIES/:/$RESULT_DIRECTORIES/ ./shiny_app_latest.sif 'app/run_shiny.sh'

TODO: Adjust the pipeline so that the initial copying step is not necessary (currently needed because the pipeline only creates symlinks).

Interpretation of computed networks of Inter-Net Xplorer

See the PDF: Network Explanation

Running Inter-Net Xplorer

This section describes piece by piece how to run this pipeline. Examples will be provided along the way and at the end of this section.

1) Running the nextflow pipeline:

To run the nextflow pipeline, use the following command and swap out the parameters to fit your needs. The next sections go over the parameters in detail.

${path_to_nextflow}/nextflow run main.nf --tools=${tools_to_run} --mode=${data_mode} --input=${data_input} -params-file ${config_file} --publish_dir=${output_data_path}

2) Setting the --tools parameter:

The --tools parameter identifies the tools that are used in the pipeline. Currently available tools are:

The parameter must be given as a comma-separated list. For example, to use boostdiff and grnboost2, set --tools=boostdiff,grnboost2

3) Setting the --mode parameter:

The --mode parameter identifies the type of data that you are using. Currently available modes are seurat, tsv and anndata.

4) Setting --input parameter:

The full path has to be set for all input files! The values in the columns that are used for selection in the configuration file must not contain any of the characters ",", "-" or ":".
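
A quick pre-check for the character restriction above might look like this sketch. The function name has_forbidden_chars and the sample values are made up for illustration; the pipeline itself does not provide such a helper.

```shell
# Returns 0 (true) if the value contains ",", "-" or ":".
has_forbidden_chars() {
    case "$1" in
        *[,:-]*) return 0 ;;
        *)       return 1 ;;
    esac
}

# Illustrative values as they might appear in a selection column.
for value in "Spleen" "d10_Arm" "CD8-T" "time:10"; do
    if has_forbidden_chars "$value"; then
        echo "rename before running: $value"
    else
        echo "ok: $value"
    fi
done
```

Values flagged here would need to be renamed in the data (for example by replacing "-" with "_") before they are referenced in the configuration file.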

4.1) If --mode=seurat:

Use the --input parameter to set the path to the seurat file. The file type must be .RDS. If you are using this mode, you need to provide a configuration file with the -params-file parameter that contains information about the grouping/filtering that should be done in the Seurat object for your specific needs. See example_config.yaml for instructions and an example on how to write a config file for your dataset.
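
As an illustration, a seurat-mode call could be assembled as in the sketch below. All paths are placeholders (including the assumed location of the nextflow launcher), and the tool selection is just the example from the --tools section; only the parameter names come from this README.

```shell
# All paths below are placeholders; replace them with your own absolute paths.
nextflow_bin="$HOME/nextflow"                      # assumed location of the nextflow launcher
input_rds="/data/project/seurat_object.RDS"        # Seurat object, file type .RDS
config_yaml="/data/project/my_config.yaml"         # written following example_config.yaml
publish_dir="/data/project/results"                # must already exist

# Assemble the command in the same form as the template shown earlier.
cmd="$nextflow_bin run main.nf --tools=boostdiff,grnboost2 --mode=seurat --input=$input_rds -params-file $config_yaml --publish_dir=$publish_dir"

# Print the command for inspection; run it once the paths look right.
echo "$cmd"
```

Printing the command first makes it easy to spot a relative path before a long pipeline run is started.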

4.2) If --mode=tsv:

Do not set the --input parameter!
Use the --input_file1 parameter to set the path to the first tsv file.
Use the --input_file2 parameter to set the path to the second tsv file.

The first column of the tsv files has to be named Gene and contain all gene names. The following columns represent the samples. If you are using this mode, you need to set the --comparison_id parameter. This needs to be an identifiable string because the results folder will be named after it. You do not need to set the -params-file parameter because no grouping/filtering can be done on the tsv files!
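
The header requirement can be checked before starting the pipeline. The sketch below creates a tiny made-up tsv file, verifies that its first column is named Gene, and prints a tsv-mode call. The file contents, the paths, the comparison id and the assumption that nextflow is on your PATH are all illustrative.

```shell
# Create a tiny example file so the check can be tried out (made-up data).
tsv_file="$(mktemp)"
printf 'Gene\tsample1\tsample2\nGeneA\t1.2\t0.4\nGeneB\t0.0\t3.1\n' > "$tsv_file"

# The first header field must be exactly "Gene".
first_column="$(head -n 1 "$tsv_file" | cut -f 1)"
if [ "$first_column" = "Gene" ]; then
    echo "header ok"
else
    echo "first column is '$first_column', expected 'Gene'" >&2
fi

# Made-up identifier; the results folder will be named after it.
comparison_id="treated_vs_control"

# Print a tsv-mode call with placeholder paths (note: no --input and no -params-file).
echo "nextflow run main.nf --tools=boostdiff --mode=tsv --input_file1=/abs/path/condition1.tsv --input_file2=/abs/path/condition2.tsv --comparison_id=$comparison_id --publish_dir=/abs/path/results"
```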

4.3) If --mode=anndata:

Set the --input parameter to the path to the AnnData object. The file type must be .h5ad. The AnnData object will be converted to a Seurat object in the pipeline. If you are using this mode, you need to provide a configuration file with the -params-file parameter that contains information about the grouping/filtering that should be done in the AnnData object for your specific needs. See example_config.yaml for instructions and an example on how to write a config file for your dataset.

5) Setting the --publish_dir parameter:

This parameter sets the path to the results folder where the results/outputs should be written. This folder must exist!
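
Because the folder must exist beforehand, it can be created right before the pipeline is started. In this sketch the path is a placeholder under the temporary directory; use the absolute path of your own results folder instead.

```shell
# Placeholder path; replace with the absolute path of your results folder.
publish_dir="${TMPDIR:-/tmp}/grn_results_example"

# -p creates missing parent folders and does not fail if the folder already exists.
mkdir -p "$publish_dir"

# Confirm that the folder exists and is writable before passing it to --publish_dir.
if [ -d "$publish_dir" ] && [ -w "$publish_dir" ]; then
    echo "results folder ready: $publish_dir"
else
    echo "results folder missing or not writable: $publish_dir" >&2
fi
```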

6) Optional Parameters

  1. --create_metacells: Default value: TRUE
    Determines whether metacells are created. Without metacells, the runtime of the implemented tools is very long.
  2. --work: Default value: Path to the folder where you start the nextflow pipeline.
    Change this if you want the internal nextflow files to be stored somewhere else. These files can be quite large, so be careful if you have limited disk space.
  3. --n_runs: Default value: 10
    Determines how often the tools that rely on randomization (boostdiff, grnboost2) are run. This is done to improve the robustness of their results.
  4. --use_tf_list: Default value: false
    Determines whether boostdiff uses a transcription factor (TF) list so that only edges from a gene in this list to any other gene are inferred. This reduces the computation time and only works if the underlying organism is human. If this parameter is set to false, all genes are compared to all genes. WIP: Bugfix/extend to other organisms
  5. See nextflow.config for all tool-specific and nextflow-specific parameters.

Further Information

This README is a work in progress. Information to come:

  1. Structure of the pipeline
  2. Instructions/Example on how to extend the pipeline with a tool/analyses