Oshlack/JAFFA

Help with using bioconda version of jaffa

Dwooi7417 opened this issue · 24 comments

Hi

So far I have had no success installing JAFFA directly from the instructions, since I am on a shared server and do not have root access. So I tried installing it through Anaconda, but after the installation the files appear to be named differently than if you download the package from the GitHub wiki.

So, taking a wild guess, I thought the jaffa-assembly script would do what I wanted. It turns out the main problem I am having is that it can't find the reference annotation files. Since this is Anaconda, I am not sure which directory to download these files into. Could you guide me on the usage of jaffa-assembly?

Here is the error message I get:
echo "Running JAFFA version 2.1" ; echo "Checking for required data files..." ; for i in null/hg38_genCode22.fa null/hg38_genCode22.tab /scratch_local/708254.1.short.q/tmp.pLyPvYR68t/known_fusions.txt null/hg38.fa null/Masked_hg38.1.bt2 null/hg38_genCode22.1.bt2 ; do ls $i 2>/dev/null || { echo "CAN'T FIND $i..." ; echo "PLEASE DOWNLOAD and/or FIX PATH... STOPPING NOW" ; exit 1 ; } ; done ; echo "All looking good" ; echo "running JAFFA version 2.1.. checks passed" > checks

Hi,
You can set the directory of where the reference files are with:
export JAFFA_REF_BASE=
prior to running jaffa.
However, the bioconda recipe hasn't been updated for the dependencies in version 2.1, so I don't think the job will run correctly anyway. I haven't been responsible for the bioconda recipe, but I'll look into how this can be updated. You could try running version 1.09 which seems to be available as well in bioconda, or I'm happy to advise on getting it installed outside bioconda. You shouldn't need root access as long as R is installed on your server.
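As a minimal sketch of the step above, the following checks that a reference directory contains the files the pipeline looked for. The file names are taken from the data-file check in the error message (the entries prefixed with `null/`, which suggests the reference base was unset); the function name `check_jaffa_refs` and the example path are hypothetical, not part of JAFFA.

```bash
# Check that the reference files named in the error message exist in a directory.
# check_jaffa_refs is a hypothetical helper, not something JAFFA provides.
check_jaffa_refs() {
    local ref_base="${1:-${JAFFA_REF_BASE:-}}"
    local missing=0
    local f
    if [ -z "$ref_base" ]; then
        echo "JAFFA_REF_BASE is not set" >&2
        return 2
    fi
    # File names copied from the data-file check in the error message.
    for f in hg38_genCode22.fa hg38_genCode22.tab hg38.fa \
             Masked_hg38.1.bt2 hg38_genCode22.1.bt2; do
        if [ ! -e "$ref_base/$f" ]; then
            echo "missing: $ref_base/$f" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Example (the path is a placeholder for wherever the references were downloaded):
#   export JAFFA_REF_BASE=/path/to/jaffa_references
#   check_jaffa_refs && echo "reference files found"
```

Running the check before starting the pipeline gives a quicker answer than waiting for the first stage to stop.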

If you let me know how long your reads are, I'm also happy to advise on which "mode" of jaffa to use. Assembly is usually not the best option for most data.

Hope some of this is helpful.

Cheers,
Nadia.

Thank you Nadia

I'll let you know how your suggestions go. If you don't mind, which mode of jaffa should I use for RNA-seq from FFPE? The reads are normally 125 bp long, but after trimming they might average around 100, with some being shorter.

I think the direct pipeline should work best for your data. You could also try it without trimming as the pipeline is pretty robust to bad sequence at the ends.
Let us know how you go.

Cheers,
Nadia.

Hi Nadia

I am trying to install JAFFA directly, but it's having trouble installing the following tools:

WARNING: extract_seq_from_fasta could not be found!!!! You will need to download and install extract_seq_from_fasta manually, then add its path to tools.groovy

WARNING: minimap2 could not be found!!!! You will need to download and install minimap2 manually, then add its path to tools.groovy

Actually, I just ran the program anyway despite the warnings, and it looks like there is some output, but I can't see the csv file. I only see bam files, fasta files and paf files. While these do show which reads are discordant pairs, it is difficult for me to see which fusions were found. Am I supposed to get a summary csv file?

These were the error messages:

=========================== Stage align_reads_to_annotation (CC65CANXX) ============================
Warning: [blastn] Query_21329 HWI-D00119:24.. : Could not calculate ungapped Karlin-Altschul parameters due to an invalid query sequence or its translation. Please verify the query sequence(s) and/or filtering options

real 17m52.671s
user 28m16.617s
sys 2m28.684s

=============================== Stage filter_transcripts (CC65CANXX) ===============================
Done reading in transcript IDs
Reading the input alignment file, CC65CANXX/CC65CANXX.paf
0
terminate called after throwing an instance of 'std::regex_error'
what(): regex_error
bash: line 1: 48280 Aborted (core dumped) /home/danwoo/bin/JAFFA-version-2.1/tools/bin/process_transcriptome_align_table CC65CANXX/CC65CANXX.paf 1000 /home/danwoo/bin/JAFFA-version-2.1/hg38_genCode22.tab > CC65CANXX/CC65CANXX.txt
Cleaned up file CC65CANXX/CC65CANXX.txt to .bpipe/trash/CC65CANXX.txt
ERROR: stage filter_transcripts failed: Command in stage filter_transcripts failed with exit status = 134 :

/home/danwoo/bin/JAFFA-version-2.1/tools/bin/process_transcriptome_align_table CC65CANXX/CC65CANXX.paf 1000 /home/danwoo/bin/JAFFA-version-2.1/hg38_genCode22.tab > CC65CANXX/CC65CANXX.txt

========================================= Pipeline Failed ==========================================

One or more parallel stages aborted. The following messages were reported:

-------------------------------- filter_transcripts ( CC65CANXX ) --------------------------------

Command in stage filter_transcripts failed with exit status = 134 :

/home/danwoo/bin/JAFFA-version-2.1/tools/bin/process_transcriptome_align_table CC65CANXX/CC65CANXX.paf 1000 /home/danwoo/bin/JAFFA-version-2.1/hg38_genCode22.tab > CC65CANXX/CC65CANXX.txt

Hi,
I haven't seen an error like this before, but it looks like blast had an issue with one of the reads, and then the rest of the pipeline hasn't dealt properly with the output. Are you able to send me the result of:
grep -C4 "21329 HWI-D00119:24" CC65CANXX/CC65CANXX.fasta
Hopefully this will print the read which is causing a problem and then I can make a reproducible example to see what's happened downstream.

Cheers,
Nadia.

Hi Nadia

Unfortunately I'm not getting any hits with the command you sent.

Regards,
Danson

Oh, perhaps it's
grep -C4 "HWI-D00119:24" CC65CANXX/CC65CANXX.fasta
But I have a feeling blast might have cut off the full read ID.
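If the read ID in the warning is truncated, one alternative is to pull the record out by position instead of by name. The sketch below assumes that `Query_21329` means the 21329th record in the FASTA file, which is a guess about how blast numbers its queries and not something confirmed in this thread; `fasta_record` is a hypothetical helper.

```bash
# Print the N-th record (header plus all of its sequence lines) of a FASTA file.
# fasta_record is a hypothetical helper name.
fasta_record() {
    local n="$1"
    local file="$2"
    awk -v n="$n" '
        /^>/ { count++ }        # every header line starts a new record
        count > n { exit }      # stop reading once the wanted record is past
        count == n { print }    # print each line of the wanted record
    ' "$file"
}

# Example (assumes Query_21329 is the 21329th query, counting from 1):
#   fasta_record 21329 CC65CANXX/CC65CANXX.fasta
```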

Can you also paste the result of
head CC65CANXX/CC65CANXX.paf
and
tail CC65CANXX/CC65CANXX.paf

Cheers,
Nadia.

head CC65CANXX/CC65CANXX.paf

HWI-D00119:248:CC65CANXX:3:1101:10000:74092/1 125 24 113 minus hg38_wgEncodeGencodeCompV22_ENST00000469930.1__range=chr7:140834061-140924709__5'pad=0__3'pad=0__strand=-__repeatMasking=none 1058 1058 969 90 90 167
HWI-D00119:248:CC65CANXX:3:1101:10000:74092/2 125 1 90 plus hg38_wgEncodeGencodeCompV22_ENST00000469930.1__range=chr7:140834061-140924709__5'pad=0__3'pad=0__strand=-__repeatMasking=none 1058 969 1058 90 90 167
HWI-D00119:248:CC65CANXX:3:1101:10001:66444/1 125 1 96 minus hg38_wgEncodeGencodeCompV22_ENST00000397579.5__range=chr9:14087603-14314582__5'pad=0__3'pad=0__strand=-__repeatMasking=none 3129 200 105 96 96 178
HWI-D00119:248:CC65CANXX:3:1101:10001:66444/1 125 1 96 minus hg38_wgEncodeGencodeCompV22_ENST00000397581.5__range=chr9:14087600-14314519__5'pad=0__3'pad=0__strand=-__repeatMasking=none 3318 137 42 96 96 178
HWI-D00119:248:CC65CANXX:3:1101:10001:66444/2 125 1 97 plus hg38_wgEncodeGencodeCompV22_ENST00000397579.5__range=chr9:14087603-14314582__5'pad=0__3'pad=0__strand=-__repeatMasking=none 3129 105 201 97 97 180
HWI-D00119:248:CC65CANXX:3:1101:10001:66444/2 125 1 97 plus hg38_wgEncodeGencodeCompV22_ENST00000397581.5__range=chr9:14087600-14314519__5'pad=0__3'pad=0__strand=-__repeatMasking=none 3318 42 138 97 97 180
HWI-D00119:248:CC65CANXX:3:1101:10001:94578/1 125 1 46 minus hg38_wgEncodeGencodeCompV22_ENST00000490044.4__range=chrX:71291995-71301168__5'pad=0__3'pad=0__strand=+__repeatMasking=none 3215 2434 2389 46 46 86.1
HWI-D00119:248:CC65CANXX:3:1101:10001:94578/1 125 1 46 minus hg38_wgEncodeGencodeCompV22_ENST00000373841.4__range=chrX:71283633-71301168__5'pad=0__3'pad=0__strand=+__repeatMasking=none 2606 1825 1780 46 46 86.1
HWI-D00119:248:CC65CANXX:3:1101:10001:94578/1 125 1 46 minus hg38_wgEncodeGencodeCompV22_ENST00000276079.11__range=chrX:71283583-71301168__5'pad=0__3'pad=0__strand=+__repeatMasking=none 2713 1932 1887 46 46 86.1
HWI-D00119:248:CC65CANXX:3:1101:10001:94578/1 125 1 46 minus hg38_wgEncodeGencodeCompV22_ENST00000535149.4__range=chrX:71283192-71301166__5'pad=0__3'pad=0__strand=+__repeatMasking=none 2882 2103 2058 46 46 86.1

tail CC65CANXX/CC65CANXX.paf

HWI-D00119:248:CC65CANXX:3:2316:9999:41567/1 125 11 89 minus hg38_wgEncodeGencodeCompV22_ENST00000494445.1__range=chr3:186787314-186789291__5'pad=0__3'pad=0__strand=+__repeatMasking=none 629 415 337 79 79 147
HWI-D00119:248:CC65CANXX:3:2316:9999:41567/1 125 11 89 minus hg38_wgEncodeGencodeCompV22_ENST00000425053.4__range=chr3:186783577-186789881__5'pad=0__3'pad=0__strand=+__repeatMasking=none 1977 1177 1099 79 79 147
HWI-D00119:248:CC65CANXX:3:2316:9999:41567/1 125 11 78 minus hg38_wgEncodeGencodeCompV22_ENST00000497177.1__range=chr3:186786640-186788507__5'pad=0__3'pad=0__strand=+__repeatMasking=none 849 715 648 68 68 126
HWI-D00119:248:CC65CANXX:3:2316:9999:41567/1 125 11 78 minus hg38_wgEncodeGencodeCompV22_ENST00000485101.4__range=chr3:186783578-186789894__5'pad=0__3'pad=0__strand=+__repeatMasking=none 5327 3806 3739 68 68 126
HWI-D00119:248:CC65CANXX:3:2316:9999:41567/1 125 16 78 minus hg38_wgEncodeGencodeCompV22_ENST00000468362.4__range=chr3:186785962-186788368__5'pad=0__3'pad=0__strand=+__repeatMasking=none 1356 1356 1294 63 63 117
HWI-D00119:248:CC65CANXX:3:2316:9999:41567/2 125 1 79 plus hg38_wgEncodeGencodeCompV22_ENST00000494445.1__range=chr3:186787314-186789291__5'pad=0__3'pad=0__strand=+__repeatMasking=none 629 337 415 79 79 147
HWI-D00119:248:CC65CANXX:3:2316:9999:41567/2 125 1 79 plus hg38_wgEncodeGencodeCompV22_ENST00000425053.4__range=chr3:186783577-186789881__5'pad=0__3'pad=0__strand=+__repeatMasking=none 1977 1099 1177 79 79 147
HWI-D00119:248:CC65CANXX:3:2316:9999:41567/2 125 12 79 plus hg38_wgEncodeGencodeCompV22_ENST00000497177.1__range=chr3:186786640-186788507__5'pad=0__3'pad=0__strand=+__repeatMasking=none 849 648 715 68 68 126
HWI-D00119:248:CC65CANXX:3:2316:9999:41567/2 125 12 79 plus hg38_wgEncodeGencodeCompV22_ENST00000485101.4__range=chr3:186783578-186789894__5'pad=0__3'pad=0__strand=+__repeatMasking=none 5327 3739 3806 68 68 126
HWI-D00119:248:CC65CANXX:3:2316:9999:41567/2 125 12 74 plus hg38_wgEncodeGencodeCompV22_ENST00000468362.4__range=chr3:186785962-186788368__5'pad=0__3'pad=0__strand=+__repeatMasking=none 1356 1294 1356 63 63 117

grep -C4 "HWI-D00119:24" CC65CANXX/CC65CANXX.fasta

results in many lines of sequence reads so I think it would be too long to copy here.

Thanks, these look okay to me. Can you do a little experiment for me?
Write the first few lines of the alignment file into a new file:
head CC65CANXX/CC65CANXX.paf > temp.paf
Then run the command that failed on this file:
/home/danwoo/bin/JAFFA-version-2.1/tools/bin/process_transcriptome_align_table temp.paf 1000 /home/danwoo/bin/JAFFA-version-2.1/hg38_genCode22.tab > temp.txt
Do you get the same error?
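If a small subset runs cleanly while the full file fails, the same experiment can be automated with a binary search over prefixes of the file. This is only a sketch: `first_failing_line` and the wrapper in the example are hypothetical names, and it assumes that once a prefix contains the problematic line, every longer prefix fails too.

```bash
# Find the first line of FILE at which CMD starts to fail.
# Usage: first_failing_line FILE CMD [ARGS...]
# CMD is called with the path of a temporary prefix file as its last argument.
# Prints the line number, or returns 1 if CMD succeeds on the whole file.
first_failing_line() {
    local file="$1"
    shift
    local lo=1
    local hi mid tmp
    hi=$(awk 'END { print NR }' "$file")   # total number of lines
    tmp=$(mktemp)
    head -n "$hi" "$file" > "$tmp"
    if "$@" "$tmp" >/dev/null 2>&1; then
        rm -f "$tmp"
        return 1                           # the whole file passes
    fi
    while [ "$lo" -lt "$hi" ]; do
        mid=$(( (lo + hi) / 2 ))
        head -n "$mid" "$file" > "$tmp"
        if "$@" "$tmp" >/dev/null 2>&1; then
            lo=$(( mid + 1 ))              # prefix is fine, look further down
        else
            hi=$mid                        # prefix already fails, look higher up
        fi
    done
    rm -f "$tmp"
    echo "$lo"
}

# Example, wrapping the failing command so the file comes last (paths as in the log):
#   run_filter() { /home/danwoo/bin/JAFFA-version-2.1/tools/bin/process_transcriptome_align_table "$1" 1000 /home/danwoo/bin/JAFFA-version-2.1/hg38_genCode22.tab; }
#   first_failing_line CC65CANXX/CC65CANXX.paf run_filter
```

The reported line can then be inspected with `sed -n 'Np' CC65CANXX/CC65CANXX.paf`, replacing N with the number printed.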

Also, which version of gcc do you have (gcc --version)?
Cheers,
Nadia.

From running the command I get:
Done reading in transcript IDs
Reading the input alignment file, temp.paf
0
5 reads processed. Finished.

Also, the gcc version is 4.8.5.

Thanks for your quick replies. How big is your whole CC65CANXX/CC65CANXX.paf file? Would it be too big to email?

Also, were you able to run the demo data okay?
https://github.com/Oshlack/JAFFA/wiki/Example
I'm just trying to work out if there's something unusual about your data or it's your environment.

Thanks for your patience with all the questions!

It doesn't look too big to email so if you provide me with an address I can send it to you.

I have yet to actually run the demo data

Thanks, I was able to reproduce the error with your file. It looks like one of the libraries that JAFFA uses fails with gcc version 4.8. If you can switch to a newer gcc version, I think this will resolve the problem. I’ve tested with gcc 6.3 and 8.2 and it works okay with both. You would need to remove the “bin” directory under “tools” where you installed JAFFA and then rerun the install script, install_linux64.sh.
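The check and rebuild described above could be scripted roughly as follows. This is a sketch under assumptions: the function names are hypothetical, the threshold of 6 simply mirrors the oldest version reported as tested here, and install_linux64.sh is assumed to sit in the top-level JAFFA directory. For background, GCC's `std::regex` support was incomplete before the 4.9 release, which is consistent with a `regex_error` under 4.8.

```bash
# Succeed if a gcc version string has a major version of at least MIN.
# gcc_major_at_least and rebuild_jaffa_tools are hypothetical helper names.
gcc_major_at_least() {
    local version="$1"
    local min="$2"
    local major="${version%%.*}"    # "4.8.5" -> "4", "9" -> "9"
    [ "$major" -ge "$min" ]
}

# Remove the compiled tools and rerun the installer inside a JAFFA directory.
# Assumes install_linux64.sh is in the top level of that directory.
rebuild_jaffa_tools() {
    local jaffa_dir="$1"
    ( cd "$jaffa_dir" && rm -rf tools/bin && ./install_linux64.sh )
}

# Example (not run automatically; the path is a placeholder):
#   gcc_major_at_least "$(gcc -dumpversion)" 6 && rebuild_jaffa_tools /path/to/JAFFA
```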

Let me know how you go with it. Hopefully you have a newer version of gcc available?

Hi JAFFA team, thanks for the outstanding work.
I have the same problem after running JAFFA. I thought the problem was the gcc version, but after installing gcc version 9.3 I still get the same error. I have attached the output report and hope you can help me.

======================= Stage filter_transcripts (merged_enrichment_sAML1b) ========================
Done reading in transcript IDs
Reading the input alignment file, /hpcnfs/scratch/PGP/niman/Chiara/FLAMES/sAML1_B/Long/enrichment_analysis/merged/jaffa/merged_enrichment_sAML1b.fastq/merged_enrichment_sAML1b.fastq.paf
0
terminate called after throwing an instance of 'std::regex_error'
what(): regex_error

Running which gcc:
alias gcc='/.conda/envs/FLAMES/bin/x86_64-conda-linux-gnu-gcc'
~/.conda/envs/FLAMES/bin/x86_64-conda-linux-gnu-gcc

Running gcc -v:
gcc version 9.3.0 (crosstool-NG 1.24.0.131_87df0e6_dirty)

Thanks in advance.

Thanks for reporting this.

Do you also get this error on other datasets (e.g. the example data)? Other versions of gcc (e.g. 5 or 6)? Did you reinstall after upgrading gcc? To reinstall you'd need to remove /tools/bin/process_transcriptome_align_table and then rerun /install_linux64.sh

Cheers,
Nadia.

@nadiadavidson
I ran it with different versions of gcc (4.8, 4.9, 9.3), and for each one I removed the bin folder and ran /install_linux64.sh again to rebuild it, but I got the same error every time.
I didn't try another sample or dataset. Do you think the data might be the problem?

Thanks

Hi,

I suspect you'd get the same error with the example dataset, but it could be useful to know that for sure, otherwise it would be difficult for me to reproduce the error. If you are running on a server/cluster could you use the gcc version installed on there rather than conda's version?
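To see which compilers are visible and which one actually built the installed binary, something like the sketch below may help. Both function names are hypothetical. The second relies on `readelf` from binutils and on the compiler having recorded its version in the ELF `.comment` section, which GCC normally does but which can be stripped.

```bash
# List every gcc executable on PATH, in lookup order.
# list_gcc_on_path and binary_compiler_info are hypothetical helper names.
list_gcc_on_path() {
    local dir
    local found=1
    local old_ifs="$IFS"
    IFS=:
    for dir in $PATH; do
        if [ -x "$dir/gcc" ]; then
            echo "$dir/gcc"
            found=0
        fi
    done
    IFS="$old_ifs"
    [ "$found" -eq 0 ] || echo "no gcc found on PATH"
}

# Print the compiler strings recorded in an ELF binary's .comment section.
binary_compiler_info() {
    readelf -p .comment "$1" 2>/dev/null | grep -i 'GCC' \
        || echo "no compiler information found in $1"
}

# Example (the path is a placeholder for the JAFFA install directory):
#   list_gcc_on_path
#   binary_compiler_info /path/to/JAFFA/tools/bin/process_transcriptome_align_table
```

If the version reported for the binary does not match the compiler that was meant to be used, the rebuild probably picked up a different gcc than expected.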

If all else fails I can probably add a static binary to the repository/next release to avoid compilation.

Cheers,
Nadia.

Hi,
I have just installed jaffa (v2.2) with mamba, but it seems JAFFAL (for ONT reads) is missing: I found only the jaffa-hybrid, jaffa-direct and jaffa-assembly commands.
Could you help me? Thanks.

This overlaps with issue #74, so I will close this one.