optional fa file
simonelsasser opened this issue · 2 comments
I wonder whether genome.fa is used for anything if a bowtie2 index is already present. If it is not, I think the pipeline shouldn't terminate when the .fa file is missing but should simply emit a warning message. The .fa file is really large and shouldn't be required if it is not essential to the pipeline. Most people will have a bowtie2 index but not a .fa file.
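To make the request concrete, here is a minimal sketch of the "warn instead of terminate" behaviour. It is not the pipeline's actual code: the function names `has_bowtie2_index` and `check_reference` are hypothetical, and the only thing assumed about bowtie2 is its standard index file naming (`.1.bt2` ... `.rev.2.bt2`, or the same names ending in `.bt2l` for large indexes).

```python
import warnings
from pathlib import Path

# Standard bowtie2 index file suffixes; large indexes use the same names plus "l".
BT2_SUFFIXES = (".1.bt2", ".2.bt2", ".3.bt2", ".4.bt2", ".rev.1.bt2", ".rev.2.bt2")


def has_bowtie2_index(prefix):
    """Return True if a complete small or large bowtie2 index exists at prefix."""
    prefix = str(prefix)
    small = all(Path(prefix + s).is_file() for s in BT2_SUFFIXES)
    large = all(Path(prefix + s + "l").is_file() for s in BT2_SUFFIXES)
    return small or large


def check_reference(fasta, index_prefix):
    """Hypothetical check: True if the FASTA exists, False (with a warning)
    if only the index exists, FileNotFoundError if neither is present."""
    if Path(fasta).is_file():
        return True
    if has_bowtie2_index(index_prefix):
        warnings.warn(
            f"{fasta} not found; continuing with the bowtie2 index only. "
            "Steps that need the FASTA itself cannot run."
        )
        return False
    raise FileNotFoundError(f"Neither {fasta} nor a bowtie2 index at {index_prefix} was found")
```

A caller would still need another source for anything derived from the FASTA itself when `check_reference` returns False.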
The .fa file is used to calculate the effective genome size. The pipeline can use the compressed version of the .fa file, which is smaller (< 1GB for hg38). I am not sure if this can be calculated from the bowtie indexes, but I would assume that it comes with a cost.
This value could be precalculated and used, as we had in previous versions of the pipeline (it was a config.yaml value). However, I am not against having the .fa (or .fa.gz) together with the bowtie indexes: it offers more complete information on what the pipeline was run on, and we only need to have a single copy of that file.
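For illustration, the kind of calculation described above might look like the sketch below. It assumes that "effective genome size" means the number of non-N bases, which is one common definition; the thread does not say which definition the pipeline uses, and the function names here are hypothetical. It reads either a plain .fa or a gzip-compressed .fa.gz.

```python
import gzip
from pathlib import Path


def open_fasta(path):
    """Open a FASTA file as text, transparently handling gzip compression."""
    path = Path(path)
    if path.suffix == ".gz":
        return gzip.open(path, "rt")
    return open(path, "rt")


def effective_genome_size(path):
    """Count non-N bases over all sequences (assumed definition, see above)."""
    total = 0
    with open_fasta(path) as handle:
        for line in handle:
            if line.startswith(">"):
                continue  # skip sequence header lines
            seq = line.strip().upper()
            total += len(seq) - seq.count("N")
    return total
```

The result could be computed once per genome and stored, for example as a config value, so that later runs do not need to re-read the file.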
In an effort to clean up the issue list a bit, I’m closing this now because I agree with Carmen. I would even go so far as to say that the reduced space usage would not be worth the added code and configuration complexity of the pipeline. As she said, we fortunately need only one copy of the FASTA file to be stored anywhere on the system.
We can revisit this when external users complain.