kblin/ncbi-acc-download

Write to a specified file name without implicit extension

Closed this issue · 9 comments

Please provide an option to write to a specified filename that does not add an implicit file extension.
Two reasons for this:

  1. Allow using an extension other than .fa, such as .fasta.
  2. Allow writing to /dev/stdout with -o /dev/stdout, so the output can be piped into another tool, such as gzip.

Why does the implicit extension _0.fa include _0?

kblin commented

What command line did you use that gave you the _0?

kblin commented

Ah, you were using --out. The reason for this is that you can download multiple files, and all of those will be called yourprefix_N.filetype to stop them from overwriting each other.
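To illustrate the naming scheme described above, here is a minimal sketch. It is not the project's actual code: the function name `numbered_filenames`, the default `"fa"` extension, and the accession strings are all placeholders chosen for the example.

```python
def numbered_filenames(prefix, accessions, extension="fa"):
    """Return one output name per accession: prefix_0.fa, prefix_1.fa, ...

    Hypothetical helper: it only mirrors the yourprefix_N.filetype
    pattern from the comment above, not the tool's real implementation.
    """
    return [f"{prefix}_{index}.{extension}" for index, _ in enumerate(accessions)]


# Placeholder accession strings, used only to show the numbering.
names = numbered_filenames("yourprefix", ["ACCESSION_A", "ACCESSION_B"])
print(names)
```

With a single accession the list has one entry ending in _0.fa, which is where the _0 in the question comes from.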

kblin commented

An alternative solution I'd see would be to write all records to the file specified with --out, possibly even skipping the "add an extension" logic. Would that work for you?

Yeah, writing them all to the one file specified by --out would work for me. The existing behaviour could be retained as a --prefix option, if you liked.

kblin commented

It's not super-trivial, because right now we run the download_to_file function once per NCBI accession. The reason we do this is that in my experience, larger download batches increase the chance that the file will break off in mid-transfer, and detecting that is hard.

But this means that with the current code and using --out, you'd keep overwriting your file contents because we open with open(filename, "w"). This could be fixed using open(filename, "a") instead, but that would break the default case: when a file was already downloaded, running the download again would append a second copy of the contents.
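The trade-off between the two modes can be reproduced with a small stand-alone sketch. This is not the real download_to_file function: `write_records` is a hypothetical stand-in that opens the target file once per record, and the FASTA records are made up.

```python
import os
import tempfile


def write_records(filename, records, mode):
    """Open the file once per record, like a per-accession download loop."""
    for record in records:
        with open(filename, mode) as handle:
            handle.write(record)


records = [">rec1\nACGT\n", ">rec2\nTTGA\n"]  # made-up FASTA records

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "out.fa")

    # Mode "w" truncates on every open, so only the last record survives.
    write_records(path, records, "w")
    with open(path) as handle:
        overwritten = handle.read()

    # Mode "a" keeps every record when starting from a missing file...
    os.remove(path)
    write_records(path, records, "a")
    with open(path) as handle:
        appended = handle.read()

    # ...but a second run appends a duplicate copy of everything.
    write_records(path, records, "a")
    with open(path) as handle:
        rerun = handle.read()

print(overwritten == records[1])          # "w": last record only
print(appended == "".join(records))       # "a": all records, once
print(rerun == "".join(records) * 2)      # "a" after a re-run: duplicated
```

One way around both problems, under the same assumptions, would be to open the output once before the loop and pass the handle down, but that changes how the per-accession function is called.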

An option to write to /dev/stdout would be good enough for me.

kblin commented

This is now implemented in version 0.2.0.

Thanks, Kai!