Using Single end data

Question

benkraj opened this issue 5 years ago · 5 comments

Hi-

I have a single-end RNA-seq data set that I would like to run through the pipeline. I've tried, but it only completes processes 1A-1D and never starts any of the others. I'm guessing this is because I only have one FASTQ file, but I'm not 100% sure that's the issue.

Is there a way to specify that the input is single-end data, or could you point me in the right direction for updating the pipeline for this purpose?

Any help would be appreciated.

Thanks,
Ben

Answer 1 · 2020-01-15T21:12:32.000Z

Sorry, I was wrong: it isn't about single-end data (though that may be a separate problem).

I tried again with paired-end reads I downloaded, and it still stops after the first 4 steps.

krajacichbj@cn0983 pipeline$ bash rna.variant.test.mosq.sh
Loading singularity 3.5.2 on cn83
N E X T F L O W ~ version 19.10.0
Pulling CRG-CNAG/CalliNGS-NF ...
downloaded from https://github.com/CRG-CNAG/CalliNGS-NF.git
Launching CRG-CNAG/CalliNGS-NF astonishing_fermat - revision: 8416386 [master]
C A L L I N G S - N F v 1.0
genome : rna.variant/Anopheles-gambiae-PEST_CHROMOSOMES_AgamP4.fa
reads : rna.variants/test.mosq/{1,2}.fastq.gz
variants : rna.variant/ag1000g.phase2.ar1.variants.pass.X.vcf.gz
blacklist: rna.variant/Ag.sorted.bed
results : rna.variant/results/
gatk : rna.variant/gatk-3.7.0/GenomeAnalysisTK.jar
executor > slurm (4)
[ac/60f52d] process > 1A_prepare_genome_samtools (Anopheles-gambiae-PEST_CHROMOSOMES_AgamP4) [100%] 1 of 1 ✔
[30/b914a0] process > 1B_prepare_genome_picard (Anopheles-gambiae-PEST_CHROMOSOMES_AgamP4) [100%] 1 of 1 ✔
[eb/ef49b2] process > 1C_prepare_star_genome_index (Anopheles-gambiae-PEST_CHROMOSOMES_AgamP4) [100%] 1 of 1 ✔
[91/7d38e3] process > 1D_prepare_vcf_file (ag1000g.phase2.ar1.variants.pass.X.vcf) [100%] 1 of 1 ✔
[- ] process > 2_rnaseq_mapping_star -
[- ] process > 3_rnaseq_gatk_splitNcigar -
[- ] process > 4_rnaseq_gatk_recalibrate -
[- ] process > 5_rnaseq_call_variants -
[- ] process > 6A_post_process_vcf -
[- ] process > 6B_prepare_vcf_for_ase -
[- ] process > 6C_ASE_knownSNPs -
Completed at: 15-Jan-2020 16:04:15
Duration : 10m 1s
CPU hours : 0.1
Succeeded : 4

I am initializing the pipeline with:
rna.variant/nextflow run CRG-CNAG/CalliNGS-NF \
  -c rna.variant/pipeline/CalliNGS-NF/biowulf.config2 \
  --reads 'rna.variants/test.mosq/{1,2}.fastq.gz' \
  --genome rna.variant/Anopheles-gambiae-PEST_CHROMOSOMES_AgamP4.fa \
  --variants rna.variant/ag1000g.phase2.ar1.variants.pass.X.vcf.gz \
  --blacklist rna.variant/Ag.sorted.bed \
  --results rna.variant/results/ \
  --gatk rna.variant/gatk-3.7.0/GenomeAnalysisTK.jar \
  --max_memory '128.GB'
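One cheap way to confirm that a --reads pattern matches any files at all, before submitting the run, is a pre-flight check in the launcher script. The helper below, `check_reads`, is hypothetical and not part of CalliNGS-NF. It relies on bash brace expansion and globbing, which only approximates the glob matching Nextflow applies to the quoted --reads value.

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight helper (not part of CalliNGS-NF): count how many of its
# arguments name existing files, and fail loudly if there are none.
check_reads() {
  local n=0 f
  for f in "$@"; do
    # Brace expansion yields literal paths (e.g. dir/1.fastq.gz) whether or not
    # they exist, and an unmatched glob is passed through as-is, so test each one.
    if [[ -e $f ]]; then n=$((n + 1)); fi
  done
  if (( n == 0 )); then
    echo "check_reads: no files match the reads pattern" >&2
    return 1
  fi
  echo "check_reads: $n file(s) found"
}
```

Call it with the pattern unquoted so that bash expands it, e.g. `check_reads rna.variants/test.mosq/*{1,2}.fastq.gz || exit 1` just before the `nextflow run` line.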

Any thoughts as to why it stops after the first 4 processes would be helpful. I don't get any errors that I see.

Thanks,
Ben

Answer 2 · 2020-01-17T13:12:12.000Z

I'm having exactly the same problem: the first 4 processes run, but the remainder of the pipeline 'fails'.

Answer 3 · 2020-01-17T13:25:08.000Z

> Any thoughts as to why it stops after the first 4 processes would be helpful. I don't get any errors that I see.

OK, I've solved it on my end. I had forgotten to add an asterisk (*) in the reads parameter, so Nextflow was looking in that directory and not finding any files to add to the reads channel.

So in your case you have:
reads : rna.variants/test.mosq/{1,2}.fastq.gz

Try this instead:
reads : rna.variants/test.mosq/*{1,2}.fastq.gz
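The reason the `*` matters is that `{1,2}.fastq.gz` only matches files named exactly `1.fastq.gz` or `2.fastq.gz`, whereas `*{1,2}.fastq.gz` matches any name ending in `1.fastq.gz` or `2.fastq.gz`. The sketch below reproduces this with bash brace expansion and globbing. That is only an approximation of the glob matching Nextflow applies to the quoted --reads value. The file names `sample_1.fastq.gz` and `sample_2.fastq.gz` are hypothetical, since the thread never shows the real ones.

```bash
#!/usr/bin/env bash
# Sketch: why 'dir/{1,2}.fastq.gz' can match nothing while 'dir/*{1,2}.fastq.gz' works.
# Bash expansion stands in for Nextflow's own glob matching; file names are hypothetical.
set -euo pipefail
shopt -s nullglob   # a glob that matches nothing expands to nothing

dir="$(mktemp -d)"
touch "$dir/sample_1.fastq.gz" "$dir/sample_2.fastq.gz"

# Print the basename of every argument that names an existing file. This is needed
# because brace expansion alone yields literal paths such as "$dir/1.fastq.gz"
# whether or not those files exist.
existing() {
  local f
  for f in "$@"; do
    if [[ -e $f ]]; then printf '%s\n' "${f##*/}"; fi
  done
}

without_star="$(existing "$dir"/{1,2}.fastq.gz)"   # -> $dir/1.fastq.gz, $dir/2.fastq.gz
with_star="$(existing "$dir"/*{1,2}.fastq.gz)"     # -> $dir/*1.fastq.gz, $dir/*2.fastq.gz

nl=$'\n'
echo "without '*': ${without_star:-<no files>}"
echo "with '*':    ${with_star//$nl/ }"

rm -r "$dir"
```

Running it prints `without '*': <no files>` and then `with '*':    sample_1.fastq.gz sample_2.fastq.gz`. An empty match like the first one would leave the reads channel empty, which fits the symptom of the mapping and later steps never starting.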

Answer 4 · 2020-01-17T14:24:53.000Z

Ah yes, now it does continue with my paired-end test reads. Thanks a lot for catching that, @maltesemike!

So, back to the original problem: is there a way to adapt the pipeline for single-end RNA-seq? @pditommaso, do you have any suggestions?

Answer 5 · 2020-01-30T21:00:36.000Z

I'm struggling with GATK issues, but correcting my file names (even with single-end data) allowed the run to proceed, so I'll close this issue.