question about cloud budget
Hello, I noticed the cloud budget figures recorded in your wiki:
"Starting late in the evening on January 11th and running Serratus casually for the next 11 days (non continuous use, at ~80% of it's maximum capacity to favour stability over performance) we complete a search of 5,686,715 sequencing libraries (10.2 petabases). The total cost of a full ground-up re-analysis was $23,980 or $0.0042 per library. This value reflects the current state-of-the-art for Serratus, and to the best of our knowledge any means of ultra-rapid access to petabases of sequencing data."
I have built a test cluster with 5 EC2 instances and finished a test run of 10,000 libraries. For the initial download phase alone, my cost has reached $0.03 per library, which is roughly seven times your reported $0.0042 per library. Could you share the details of your cluster configuration? Specifically, I am interested in the EC2 instance types, the number of instances, the network bandwidth, and the degree of task parallelism per instance.
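For reference, here is a minimal sketch of the arithmetic behind that comparison. It uses only the figures quoted above; the variable and function names are my own, and the projection assumes (hypothetically) that my download-only cost would scale linearly to the full set of libraries.

```python
# Figures quoted from the Serratus wiki and from the test run described above.
SERRATUS_TOTAL_COST_USD = 23_980
SERRATUS_LIBRARIES = 5_686_715
TEST_COST_PER_LIBRARY_USD = 0.03  # download phase only, 10,000-library test


def cost_per_library(total_cost_usd: float, n_libraries: int) -> float:
    """Average cost in USD of processing one sequencing library."""
    return total_cost_usd / n_libraries


# Reported Serratus cost per library (the wiki rounds this to $0.0042).
reported = cost_per_library(SERRATUS_TOTAL_COST_USD, SERRATUS_LIBRARIES)

# How many times more expensive the test cluster is per library.
ratio = TEST_COST_PER_LIBRARY_USD / reported

# Assumption: linear scaling of the test cost to the full library count.
projected_total = TEST_COST_PER_LIBRARY_USD * SERRATUS_LIBRARIES

print(f"Reported cost per library: ${reported:.4f}")
print(f"Test cost relative to reported: {ratio:.1f}x")
print(f"Projected download cost at test rate: ${projected_total:,.0f}")
```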
Yeah, it's all in the Terraform config: https://github.com/ababaian/serratus/blob/master/terraform/main/main.tf and we report the specifics of the cluster in the paper.
Also, depending on your query FASTA file and the type of libraries you're analyzing, cost is going to vary substantially based on the hit rate of your seed k-mers in bowtie/diamond.