CRG-CNAG/CalliNGS-NF

Using Single end data

benkraj opened this issue · 5 comments

Hi-

I have a single-end RNA-seq data set that I would like to run the pipeline on. I've tried, but it only completes processes 1A-1D and doesn't begin any of the others. I'm guessing this is due to only having one fastq, but I'm not 100% sure that's the issue.

Is there a way to specify single-end data, or could you point me in the right direction to update the pipeline for this purpose?

Any help would be appreciated.

Thanks,
Ben

Sorry, I was wrong: the problem isn't about single-end data (though that may be a separate problem).

I tried again with paired-end reads I downloaded online, and it still stops after the first 4 steps.

krajacichbj@cn0983 pipeline$ bash rna.variant.test.mosq.sh
Loading singularity 3.5.2 on cn83
N E X T F L O W ~ version 19.10.0
Pulling CRG-CNAG/CalliNGS-NF ...
downloaded from https://github.com/CRG-CNAG/CalliNGS-NF.git
Launching CRG-CNAG/CalliNGS-NF astonishing_fermat - revision: 8416386 [master]
C A L L I N G S - N F v 1.0
genome : rna.variant/Anopheles-gambiae-PEST_CHROMOSOMES_AgamP4.fa
reads : rna.variants/test.mosq/{1,2}.fastq.gz
variants : rna.variant/ag1000g.phase2.ar1.variants.pass.X.vcf.gz
blacklist: rna.variant/Ag.sorted.bed
results : rna.variant/results/
gatk : rna.variant/gatk-3.7.0/GenomeAnalysisTK.jar
executor > slurm (4)
[ac/60f52d] process > 1A_prepare_genome_samtools (Anopheles-gambiae-PEST_CHROMOSOMES_AgamP4) [100%] 1 of 1 ✔
[30/b914a0] process > 1B_prepare_genome_picard (Anopheles-gambiae-PEST_CHROMOSOMES_AgamP4) [100%] 1 of 1 ✔
[eb/ef49b2] process > 1C_prepare_star_genome_index (Anopheles-gambiae-PEST_CHROMOSOMES_AgamP4) [100%] 1 of 1 ✔
[91/7d38e3] process > 1D_prepare_vcf_file (ag1000g.phase2.ar1.variants.pass.X.vcf) [100%] 1 of 1 ✔
[- ] process > 2_rnaseq_mapping_star -
[- ] process > 3_rnaseq_gatk_splitNcigar -
[- ] process > 4_rnaseq_gatk_recalibrate -
[- ] process > 5_rnaseq_call_variants -
[- ] process > 6A_post_process_vcf -
[- ] process > 6B_prepare_vcf_for_ase -
[- ] process > 6C_ASE_knownSNPs -
Completed at: 15-Jan-2020 16:04:15
Duration : 10m 1s
CPU hours : 0.1
Succeeded : 4

I am initializing the pipeline with:
rna.variant/nextflow run CRG-CNAG/CalliNGS-NF \
-c rna.variant/pipeline/CalliNGS-NF/biowulf.config2 \
--reads 'rna.variants/test.mosq/{1,2}.fastq.gz' \
--genome rna.variant/Anopheles-gambiae-PEST_CHROMOSOMES_AgamP4.fa \
--variants rna.variant/ag1000g.phase2.ar1.variants.pass.X.vcf.gz \
--blacklist rna.variant/Ag.sorted.bed \
--results rna.variant/results/ \
--gatk rna.variant/gatk-3.7.0/GenomeAnalysisTK.jar \
--max_memory '128.GB'

Any thoughts as to why it stops after the first 4 processes would be helpful. I don't see any errors.

Thanks,
Ben

I am having exactly the same problem here... the first 4 processes run but the remainder of the pipeline 'fails' ...

OK, I've solved it on my end. I had forgotten to add an asterisk (*) to the reads pattern, so Nextflow was looking in that directory and finding no files to add to the channel. With an empty reads channel, the processes that depend on the reads never start, and no error is reported.

So in your case you have
reads : rna.variants/test.mosq/{1,2}.fastq.gz

Try this instead
reads : rna.variants/test.mosq/*{1,2}.fastq.gz
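
The difference between the two patterns can be checked in bash before launching the pipeline. This is only a rough sketch: the file names sample_1.fastq.gz and sample_2.fastq.gz and the helper count_matches are hypothetical, and bash brace expansion is not the same mechanism as Nextflow's glob matching, although both should treat these two patterns the same way.

```shell
#!/usr/bin/env bash
# Hypothetical file names, created in a scratch directory for illustration.
workdir=$(mktemp -d)
touch "$workdir/sample_1.fastq.gz" "$workdir/sample_2.fastq.gz"

# Hypothetical helper: count how many of its arguments are existing files.
count_matches() {
    local n=0 f
    for f in "$@"; do
        if [ -e "$f" ]; then
            n=$((n + 1))
        fi
    done
    echo "$n"
}

# Without the asterisk, the pattern only names files called exactly
# 1.fastq.gz and 2.fastq.gz, which do not exist in this directory.
without_star=$(count_matches "$workdir"/{1,2}.fastq.gz)

# With the asterisk, any prefix before the trailing 1 or 2 is accepted.
with_star=$(count_matches "$workdir"/*{1,2}.fastq.gz)

echo "without asterisk: $without_star file(s)"
echo "with asterisk:    $with_star file(s)"

rm -rf "$workdir"
```

If the pattern without the asterisk reports no files for your own reads directory, the reads channel is most likely empty as well, which would explain why only the genome and VCF preparation steps run.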

Ah yes, now it continues with my test paired-end reads. Thanks a lot for catching that @maltesemike !!

So, back to the original problem: is there a way to adapt the pipeline for single-end RNA-seq? @pditommaso, do you have any suggestions?

I'm struggling with GATK issues, but correcting my file names allowed the run to proceed, even with single-end data, so I will close this issue.