Snakemake workflow for the detection of lncRNAs

This is a Snakemake pipeline for distinguishing long non-coding RNAs (lncRNAs) from messenger RNAs (mRNAs).

The pipeline accepts samples in either FASTA or FASTQ format as input.

For paired-end reads, the pipeline expects files with the suffixes 'r_1.fq.gz' and 'r_2.fq.gz'. For single-end reads, it expects the suffix 'fq.gz'. It also accepts FASTA reads with the suffix '.fa'. Whether your samples are paired-end, single-end or FASTA, the sample names listed in samples.tsv must not include the suffix.

You can change the name of the sample sheet (samples.tsv) by editing the config file. You also need to set the PAIRED variable in the config file to either TRUE or FALSE.
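Before launching a long run, it can help to confirm that every sample in samples.tsv has matching input files. The following is a minimal sketch, not part of the pipeline, and the function name check_sample_inputs is hypothetical. It assumes the sample name is the first tab-separated column of samples.tsv with no header row, and that each file name is the sample name followed directly by one of the suffixes quoted above. If your files put a separator such as '_' or '.' between the name and the suffix, edit the suffix variables. It prints the layout it finds for each sample so you can compare it with your PAIRED setting.

```shell
#!/usr/bin/env sh
# Hypothetical pre-flight check, not part of the pipeline.
# ASSUMPTIONS: column 1 of samples.tsv holds the sample name, there is no
# header row, and each input file is named <sample><suffix>.
R1_SUFFIX="r_1.fq.gz"   # paired-end, read 1
R2_SUFFIX="r_2.fq.gz"   # paired-end, read 2
SE_SUFFIX="fq.gz"       # single-end
FA_SUFFIX=".fa"         # FASTA

# Print "<sample><TAB><layout>" for each sample; exit 1 if any has no inputs.
# The body runs in a subshell ( ... ) so its variables do not leak.
check_sample_inputs() (
  samples_tsv="${1:-samples.tsv}"
  data_dir="${2:-.}"
  status=0
  tab="$(printf '\t')"
  # "rest" receives any remaining columns; the "|| [ -n ... ]" test keeps
  # a final line that has no trailing newline.
  while IFS="$tab" read -r name rest || [ -n "$name" ]; do
    case "$name" in
      ''|'#'*) continue ;;  # skip blank lines and comment lines
    esac
    base="${data_dir}/${name}"
    if [ -f "${base}${R1_SUFFIX}" ] && [ -f "${base}${R2_SUFFIX}" ]; then
      printf '%s\tpaired\n' "$name"
    elif [ -f "${base}${SE_SUFFIX}" ]; then
      printf '%s\tsingle\n' "$name"
    elif [ -f "${base}${FA_SUFFIX}" ]; then
      printf '%s\tfasta\n' "$name"
    else
      printf '%s\tMISSING\n' "$name"
      status=1
    fi
  done < "$samples_tsv"
  exit "$status"
)
```

For example, check_sample_inputs samples.tsv path/to/reads returns a non-zero status if any listed sample has no matching files.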

Run the pipeline

snakemake -jn

where n is the number of cores. For example, for 10 cores use:

snakemake -j10
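If you simply want to use every core on the machine, n can be filled in automatically. This is a minimal sketch with a hypothetical helper name, detect_cores. On a shared server you may prefer a smaller, fixed value.

```shell
# Sketch: pick n for "snakemake -jn" from the CPUs visible to this shell.
# nproc comes from GNU coreutils; getconf _NPROCESSORS_ONLN also works on
# macOS. Fall back to 1 core if neither command is available.
detect_cores() {
  nproc 2>/dev/null || getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1
}

# Example (commented out so that sourcing this file does not start a run):
# snakemake -j"$(detect_cores)"
```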

Use conda

To make setup easier, run with conda:

snakemake -jn --use-conda

For example, for 10 cores use:

snakemake -j10 --use-conda

This automatically pulls the same tool versions we used. Conda must be installed on the system in addition to Snakemake.
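Because --use-conda depends on both tools being installed, a quick check can catch a missing installation before Snakemake starts. The helper name require_tools is hypothetical; it relies only on the POSIX command -v builtin.

```shell
# Sketch: report every named tool that is not on PATH; exit 1 if any is missing.
# The body runs in a subshell ( ... ) so "status" does not leak.
require_tools() (
  status=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing from PATH: $tool" >&2
      status=1
    fi
  done
  exit "$status"
)

# Example:
# require_tools snakemake conda && snakemake -j10 --use-conda
```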

Dry Run

For a dry run use:

snakemake -j1 -n

To also print the commands that would be run during the dry run, use:

snakemake -j1 -n -p
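A dry run is most useful as a gate in front of the real run. The sketch below, with the hypothetical name dry_run_then_run, performs the dry run with -n -p and starts the real run only if the dry run succeeds. Any extra options, such as --use-conda or --configfile, are passed to both runs.

```shell
# Sketch: dry-run first, then run for real only if the dry run succeeded.
# Usage: dry_run_then_run CORES [snakemake options...]
dry_run_then_run() (
  if [ "$#" -lt 1 ]; then
    echo "usage: dry_run_then_run CORES [snakemake options...]" >&2
    exit 2
  fi
  cores="$1"
  shift
  # -n: dry run, -p: print the shell commands; stop if the dry run fails.
  snakemake -j1 -n -p "$@" || exit
  snakemake -j"$cores" "$@"
)

# Example:
# dry_run_then_run 10 --use-conda
```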

Use the corresponding config file

Update your config file to include all your sample names, your paths and any other settings, and edit your interval.list file to include your intervals of interest. Then pass the config file to Snakemake, for example:

snakemake -j1 --configfile config-WES.yaml

or:

snakemake -j1 --configfile config-WGS.yaml
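When you keep several config files, such as the config-WES.yaml and config-WGS.yaml examples above, one dry run per file is a cheap way to catch a wrong path or a missing sample before a real run. This is a hypothetical helper, check_configfiles, built only from the -n and --configfile options shown above. It hides Snakemake's own output, so re-run a failing dry run by hand to see the error message.

```shell
# Sketch: dry-run the workflow once per config file and report the result.
# Exits 1 if any file is missing or its dry run fails.
check_configfiles() (
  status=0
  for cfg in "$@"; do
    if [ ! -f "$cfg" ]; then
      echo "$cfg: not found" >&2
      status=1
      continue
    fi
    if snakemake -j1 -n --configfile "$cfg" >/dev/null 2>&1; then
      echo "$cfg: OK"
    else
      echo "$cfg: dry run failed" >&2
      status=1
    fi
  done
  exit "$status"
)

# Example:
# check_configfiles config-WES.yaml config-WGS.yaml
```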

TODO

More tools will be added.

References

Li, A., Zhang, J., & Zhou, Z. (2014). PLEK: a tool for predicting long non-coding RNAs and messenger RNAs based on an improved k-mer scheme. BMC Bioinformatics, 15(1), 1-10.