Test data URL will go down soon
ewels opened this issue · 12 comments
Currently, the test data for this and our other pipelines is hosted on the UPPMAX milou webexport resource. This is going to disappear pretty soon. UPPMAX have no plans to replace this service, so we need to come up with an alternative solution.
Using a project on the SNIC Science Cloud could work (we have a project there). We could also set up a mini sandboxed web server on one of our local servers. Or we could try to use some other kind of public data hosting service (such as GitHub if the data isn't too big?).
Phil
Is your test data that big?
It's 625 MB, I just checked.
OK, so yeah, that's quite big...
Which files do you need that add up to that much?
Seems like the limit for ordinary repos is 100 MB, but you can apparently use LFS with files up to 2 GB: https://git-lfs.github.com/
But then you'll have to install LFS in Travis too ;-)
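To see which files would actually hit that limit, something like the following could be run over the test data directory. This is only a rough sketch: the function name `files_over_limit` is made up for illustration, and the 100 MB figure quoted above is assumed here to mean 100 * 1024 * 1024 bytes.

```python
from pathlib import Path

# Assumption: the 100 MB limit mentioned above, taken as 100 * 1024 * 1024 bytes.
LIMIT_BYTES = 100 * 1024 * 1024


def files_over_limit(root, limit_bytes=LIMIT_BYTES):
    """Return (path, size_in_bytes) pairs for regular files under root
    that are strictly larger than limit_bytes."""
    oversized = []
    for path in sorted(Path(root).rglob("*")):
        if path.is_file():
            size = path.stat().st_size
            if size > limit_bytes:
                oversized.append((str(path), size))
    return oversized


if __name__ == "__main__":
    # Scan the current directory and report anything above the limit.
    for name, size in files_over_limit("."):
        print(f"{size / 1024 / 1024:.1f} MB  {name}")
```

Anything it prints would need LFS or some other hosting; everything else could go in an ordinary repo.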
True. How easy is the SNIC cloud to work with?
For CAW, we have a very small data set for testing, and corresponding small references (that we use to build the indexes, and everything).
In the end, it's not that big.
Is an approach like that possible for the RNAseq pipeline?
I think most of the filesize is in the STAR index. So if we build that as part of the tests, we can probably make it a lot smaller...
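Building the index during the tests could look roughly like the sketch below, which only assembles the STAR `genomeGenerate` command. The helper name `star_index_command`, the file paths and the genome length of about 12 million bases are assumptions for illustration. The STAR manual recommends scaling `--genomeSAindexNbases` down for small genomes to min(14, log2(GenomeLength)/2 - 1), which is what the helper computes.

```python
import math


def star_index_command(genome_fasta, gtf, out_dir, genome_length, threads=1):
    """Build the argument list for a STAR genomeGenerate run.

    genome_length is the approximate number of bases in the reference
    (an assumption supplied by the caller, not read from the FASTA).
    """
    # Small genomes need a smaller suffix array pre-index:
    # min(14, log2(GenomeLength) / 2 - 1), rounded down.
    sa_index_nbases = min(14, int(math.log2(genome_length) / 2 - 1))
    return [
        "STAR",
        "--runMode", "genomeGenerate",
        "--runThreadN", str(threads),
        "--genomeDir", out_dir,
        "--genomeFastaFiles", genome_fasta,
        "--sjdbGTFfile", gtf,
        "--genomeSAindexNbases", str(sa_index_nbases),
    ]


if __name__ == "__main__":
    # Hypothetical paths and genome size; adjust to the real test data.
    cmd = star_index_command("genome.fa", "genes.gtf", "star", 12_000_000)
    print(" ".join(cmd))
    # To actually build the index, pass cmd to subprocess.run(cmd, check=True).
```

With that, only the FASTA and GTF would need to be hosted, and the index would be regenerated on each test run at the cost of some extra CI time.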
There is also the possibility of making a container with the references.
Yeah, the STAR index is most of the size actually:
2.9M Nov 22 2016 SRR4238351_subsamp.fastq.gz
4.4M Nov 22 2016 SRR4238355_subsamp.fastq.gz
4.3M Nov 22 2016 SRR4238359_subsamp.fastq.gz
3.4M Nov 22 2016 SRR4238379_subsamp.fastq.gz
391K Nov 22 2016 genes.bed
11M Nov 22 2016 genes.gtf
12M Nov 22 2016 genome.fa
320B Dec 9 2016 r64
576B Nov 22 2016 star
That's indeed quite a lot...
This is fixed and done.