This is a tool designed to be run on a linux system since it uses many standard GNU tools. It might also work on Mac OS X system.
Provides a workflow for detection of circRNA in nanopore data.
For conveince clone the repository to the home directory:
cd ~
git clone https://github.com/omiics-dk/long_read_circRNA.git
Alternatively you can clone the repository somewhere else and make a symbolic link to the repository.
ln -s path/to/long_read_circRNA ~/long_read_circRNA
If you have conda you can setup an environment with all of the required dependencies.
conda env create -f long_read_circRNA.yml
Or you can build an environment by your self.
conda create -n long_read_circRNA -c bioconda bedtools=2.29.2 samtools pblat nanofilt deeptools
If perl is not installed on your system you can include perl in the installation:
conda create -n long_read_circRNA -c bioconda -c conda-forge perl bedtools=2.29.2 samtools pblat nanofilt deeptools
To use the environment you need to activate it afterwards.
conda activate long_read_circRNA
To make the long_read_circRNA script available in the commandline everywhere in the system you can place the file somewhere accessible in the PATH, such as /usr/local/bin or ~/bin
Working with the tools is done through using the long_read_circRNA
python CLI tool. The tool includes three different subcommands can be called:
- run
- download-data
- check-installation
To view the help message you can run the following command ./long_read_circRNA -h
.
_ ___ _ _ ___ _ _ _
| | ___ _ _ __ _ | _ \___ __ _ __| | __(_)_ _ __| _ \ \| | /_\
| |__/ _ \ ' \/ _` | | / -_) _` / _` | / _| | '_/ _| / .` |/ _ \
|____\___/_||_\__, | |_|_\___\__,_\__,_| \__|_|_| \__|_|_\_|\_/_/ \_\
|___/
Version: v2.1
usage: long_read_circRNA <command> [args]
Available subcommands:
run Main command for finding circRNAs in long read nanopore data
download-data Download any required data for references and prepare
check-installation Check if all of the required software is available
Description: CircRNA detection in nanopore data
positional arguments:
command Specify which subcommand should be used
optional arguments:
-h, --help show this help message and exit
Before running the script, it is a good idea to use the check-installation command to see if the required programs are available.
./long_read_circRNA check-installation -h
_ ___ _ _ ___ _ _ _
| | ___ _ _ __ _ | _ \___ __ _ __| | __(_)_ _ __| _ \ \| | /_\
| |__/ _ \ ' \/ _` | | / -_) _` / _` | / _| | '_/ _| / .` |/ _ \
|____\___/_||_\__, | |_|_\___\__,_\__,_| \__|_|_| \__|_|_\_|\_/_/ \_\
|___/
Version: v2.1
usage: long_read_circRNA check-installation [args]
Check if environment for running the script is properly setup by checking if
all of the required software is available
optional arguments:
-h, --help show this help message and exit
When running ./long_read_circRNA check-installation
you will get a summary of all the tools that are check for.
If there is no message for an individual tool then it has been found and there where no problems with running it.
When the installation is properly installed and you run the check-installation subcommand you will find that returns that all of the expected software requirments are present.
_ ___ _ _ ___ _ _ _
| | ___ _ _ __ _ | _ \___ __ _ __| | __(_)_ _ __| _ \ \| | /_\
| |__/ _ \ ' \/ _` | | / -_) _` / _` | / _| | '_/ _| / .` |/ _ \
|____\___/_||_\__, | |_|_\___\__,_\__,_| \__|_|_| \__|_|_\_|\_/_/ \_\
|___/
Version: v2.1
Checking for bedtools
Checking for NanoFilt
Checking for pblat
Checking for perl
Checking for samtools
All of the expected software requirements are present!
Running the ./long_read_circRNA check-installation
when some of the required tools are missing then the script will return and error and list which tools are missing. If the program returns an error when called then will return a warning that there might be an issue with the program that should be checked.
_ ___ _ _ ___ _ _ _
| | ___ _ _ __ _ | _ \___ __ _ __| | __(_)_ _ __| _ \ \| | /_\
| |__/ _ \ ' \/ _` | | / -_) _` / _` | / _| | '_/ _| / .` |/ _ \
|____\___/_||_\__, | |_|_\___\__,_\__,_| \__|_|_| \__|_|_\_|\_/_/ \_\
|___/
Version: v2.1
Checking for bedtools
Unable to find bedtools!
Checking for NanoFilt
Unable to find NanoFilt!
Checking for pblat
Unable to find pblat!
Checking for perl
Checking for samtools
Unable to find samtools!
ERROR: Some of the requirement software is either missing or has potential problems!
For conveince using the download-data subcommand provides with necessary reference files to run the analysis on human and on mice. It also provides a small test dataset to try out and run the software with.
Running ./long_read_circRNA download-data -h
shows different arguments that can be provided.
_ ___ _ _ ___ _ _ _
| | ___ _ _ __ _ | _ \___ __ _ __| | __(_)_ _ __| _ \ \| | /_\
| |__/ _ \ ' \/ _` | | / -_) _` / _` | / _| | '_/ _| / .` |/ _ \
|____\___/_||_\__, | |_|_\___\__,_\__,_| \__|_|_| \__|_|_\_|\_/_/ \_\
|___/
Version: v2.1
usage: long_read_circRNA download-data --data-output ./data --test-output ./test_fastq [args]
Download some test data and required reference data for human and mouse
optional arguments:
-h, --help show this help message and exit
--data-output DATA_OUTPUT
Where the reference data should be stored
--test-output TEST_OUTPUT
Location for the test data
--skip-human Skip downloading the human reference files
--skip-mouse Skip downloading the mouse reference files
--skip-testdata Skip downloading the test data
--installation-path INSTALLATION_PATH
Provide a path for where the script is installed.
Default is ~/long_read_circRNA
By default the tool will automatically download references to ./data
and the test data to ./test_fastq
. These can be changed by using --data-output
and --test-output
flags.
Different datasets can be skipped if they should not be downloaded, by using the skip flags.
For running the software you use the run subcommand, and provide a sample .fq.gz
file. There are serval options for changing the behavoiur of the run command such as:
- Specifying which species you are using, currently there are only options for human and mouse. Default is human.
- Defining a path to where the reference files are location. Default is ./data.
- It should be noted that it automatically searchs for ./data/human if you choose human as the species, and therefore you should only provide the ./data directory
- Providing the path to where the scripts are located. Defaults to the home directory
~/long_read_circRNA/scripts
- Providing where the outputs will be written. Default is the current directory it is running it, where it will automatically create directory with the sample name
_ ___ _ _ ___ _ _ _
| | ___ _ _ __ _ | _ \___ __ _ __| | __(_)_ _ __| _ \ \| | /_\
| |__/ _ \ ' \/ _` | | / -_) _` / _` | / _| | '_/ _| / .` |/ _ \
|____\___/_||_\__, | |_|_\___\__,_\__,_| \__|_|_| \__|_|_\_|\_/_/ \_\
|___/
Version: v2.1
usage: long_read_circRNA [-h] [--reference-path REFERENCE_PATH]
[--species {human,mouse}] [--script-path SCRIPT_PATH]
[--output-path OUTPUT_PATH] [--dry-run]
sample
Run the program for detecting circRNAs in nanopore data
positional arguments:
sample Provide a sample input .fq.gz file that should be
processed by the tool
optional arguments:
-h, --help show this help message and exit
--reference-path REFERENCE_PATH
Provide a path for where the reference data is
located. Default is './data'.
--species {human,mouse}
Select which species the sample that is from, and
specify which species reference files should be used.
Default is set to human.
--script-path SCRIPT_PATH
Specify where the long_read_circRNA scripts are
located. By default it assumes that they are located
at '~/long_read_circRNA/scripts'.
--output-path OUTPUT_PATH, -o OUTPUT_PATH
Provide a path for where the output should be saved.
Default is the current directory. This will use the
output-path and create a directory in that path based
on the sample name provided.
--dry-run Perform all of the input checks without starting the
detection scripts
The script has numerous checks and error messages if it finds any errors in the provided inputs so the scripts do not start without everything being in place.
To start the circRNA quantification on a sample you can simply write:
./long_read_circRNA run path_to_sample/sample_name.fq.gz
If you need to change any of the settings in the run then you can use the flags presented above.
When the repository is installed or available from the home directory, and the long_read_circRNA script is avaible in the PATH then a simple workflow for starting sample is:
cd path_to_where_analysis_should_be_run
long_read_circRNA download-data
long_read_circRNA run sample_name.fq.gz
To run multiple samples you can easily use a for loop to run the script on multiple samples:
cd path_to_where_analysis_should_be_run
long_read_circRNA download-data
samples = "sample_data/sample1.fq.gz sample_data/sample2.fq.gz sample_data/sample3.fq.gz"
for sample in $samples
do
long_read_circRNA run sample_name.fq.gz
done
The scripts output a number of files, the primary one being:
[sample].circRNA_candidates.annotated.txt
This file shows all circRNAs detected in the nanopore data, listed with the highest expressed at the top.