ewels/sra-explorer

SRA Explorer not returning results

tamuanand opened this issue · 4 comments

Hi @ewels

https://sra-explorer.info/# is not returning any results (09-May-2020, 8:22 PM London time)

Probably the EBI/ENA ftp site is down

An update on the above: if you know the ascp command line for a particular record, the Aspera download does seem to work

ascp -QT -l 300m -P33001 -i <path>/asperaweb_id_dsa.openssh era-fasp@fasp.sra.ebi.ac.uk:vol1/fastq/ERR036/ERR036000/ERR036000_1.fastq.gz .

ewels commented

Tested ERR036000 just now and it seemed to work fine, so I guess that this was just a temporary glitch in the matrix..

Let me know if it keeps happening 🤞

tamuanand commented

Thanks @ewels - yes, it was a temporary glitch.

I was using the fromSRA channel factory and I believe it is based off your code - https://www.nextflow.io/blog/2019/release-19.03.0-edge.html

One question and one suggestion:

  • question: Does SRA Explorer query NCBI or EBI to get the individual fastq runs? I believe NCBI, judging by the fromSRA error messages.

  • suggestion: fromSRA was returning error messages like "can't do nulls on uids". It would be nice to see a similar error reported in SRA Explorer when someone searches for an SRA ID (or anything else) and the underlying system (NCBI or EBI) has a glitch. In my case, I kept hitting submit with an ID and did not see the bottom of the page change, so I worried that something was wrong with my browser.

Again, just a suggestion.

Needless to say, it is a great great tool.

On a side note, I have suggested to Paolo/Evan that NF should develop a method to return ascp-compatible URLs when querying for SRA.

Right now I use fromSRA and then this ugly-looking chain of perl regexes to ultimately get an Aspera-compatible download command, followed by a pipe to bash. I would like to know your thoughts/ideas on the below

echo "ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR279/SRR279588/SRR279588_1.fastq.gz 
ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR279/SRR279588/SRR279588_2.fastq.gz" \
| perl -pe 's#\.gz\s*$#.gz .\n#' | perl -pe 's#\.gz \.$#.gz . &&# unless eof' \
| perl -pe 's#ftp://ftp\.sra\.ebi\.ac\.uk/vol#ascp -QT -l 300m -P33001 -i <path_to>/asperaweb_id_dsa.openssh  era-fasp\x40fasp.sra.ebi.ac.uk:vol#g'  > SRR279588.txt

cat SRR279588.txt | bash

What that ultimately translates to is this command on the shell

ascp -QT -l 300m -P33001 -i <path_to>/asperaweb_id_dsa.openssh  era-fasp@fasp.sra.ebi.ac.uk:vol1/fastq/SRR279/SRR279588/SRR279588_1.fastq.gz . 
&&
ascp -QT -l 300m -P33001 -i <path_to>/asperaweb_id_dsa.openssh  era-fasp@fasp.sra.ebi.ac.uk:vol1/fastq/SRR279/SRR279588/SRR279588_2.fastq.gz . 

Hence, it would be nice to have a new channel factory or a method to get Aspera-compatible URLs with NF.
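
Until something like that exists, the regex chain above could probably be collapsed into one small shell function. This is only a minimal sketch, assuming the ENA FTP prefix and the ascp flags shown in this thread. The function name `ftp_to_ascp` and its key-path argument are hypothetical, not part of Nextflow or SRA Explorer.

```shell
# Hypothetical helper: read ENA FTP URLs (one per line) on stdin and print
# one ascp command per URL. $1 is the path to asperaweb_id_dsa.openssh.
# The body runs in a subshell "( ... )" so its variables do not leak.
ftp_to_ascp() (
  key=$1
  prefix='ftp://ftp.sra.ebi.ac.uk/'   # assumed prefix, taken from the URLs above
  # "|| [ -n ... ]" also handles a final line that lacks a trailing newline
  while read -r url || [ -n "$url" ]; do
    case $url in
      "$prefix"*) ;;        # looks like an ENA FTP URL, keep going
      *) continue ;;        # skip blank or unexpected lines
    esac
    # Strip the FTP prefix so that only vol1/fastq/... remains
    printf "ascp -QT -l 300m -P33001 -i '%s' era-fasp@fasp.sra.ebi.ac.uk:%s .\n" \
      "$key" "${url#"$prefix"}"
  done
)
```

Something like `ftp_to_ascp "$KEY" < urls.txt > SRR279588.txt` would then write one command per line. Unlike the `&&` chain, the lines are independent, so a failed download would not stop the next one. The key path is wrapped in single quotes in the output, so a path that itself contains a single quote would break.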

ewels commented

Yeah, I know I should catch errors. Kind of mentioned in #7 (comment) and it's been in the back of my mind for a while. It's a bit crap to just silently die when it hits unexpected errors.

This tool needs quite a lot of work at the moment though, as the SRA is totally replacing their infrastructure, so all of the SRA links are going to stop working. Unfortunately it's a fairly low priority project for me, so it'll probably be a while before I can find time to invest here.

> Does SRA Explorer query NCBI or EBI to get the individual fastq runs?

It queries NCBI first to find the runs and get SRA accessions. Once it has these for individual runs, it queries the EBI to get the FastQ download paths.
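
As a rough illustration of that two-step lookup, here is a sketch that only builds the two request URLs. It assumes the public NCBI E-utilities `esearch` endpoint and the ENA Portal API `filereport` endpoint. The thread does not say which endpoints or parameters SRA Explorer actually calls, and the function names are made up for this example.

```shell
# Step 1 (assumed endpoint): search the NCBI SRA database for a term.
# The term is inserted as-is, so it must already be URL-safe.
ncbi_esearch_url() {
  printf 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=sra&term=%s&retmode=json\n' "$1"
}

# Step 2 (assumed endpoint): ask ENA for the FastQ locations of one run.
# fastq_ftp holds the FTP paths; fastq_aspera holds the Aspera equivalents.
ena_filereport_url() {
  printf 'https://www.ebi.ac.uk/ena/portal/api/filereport?accession=%s&result=read_run&fields=run_accession,fastq_ftp,fastq_aspera\n' "$1"
}
```

For example, `curl -s "$(ena_filereport_url SRR279588)"` should return a small tab-separated table for that run. The `fastq_aspera` column may also be useful for the ascp question raised earlier in this thread.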

The ascp Nextflow factory sounds like a sensible idea. It might complicate things though, as it requires custom software, whereas the simple URLs presumably work by default with Nextflow's built-in staging mechanisms (but this is a topic for the Nextflow repo, not here 😉)

Phil