bionode/bionode-ncbi

Download example prints too much info


I used the example code for the download option and the download completes properly, but a lot of status objects are printed to the terminal, like this one:

{ uid: '244018',
  url: 'http://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/188/075/GCF_000188075.1_Si_gnG/GCF_000188075.1_Si_gnG_genomic.fna.gz',
  path: '244018/GCF_000188075.1_Si_gnG_genomic.fna.gz',
  status: 'downloading',
  total: 116654441,
  progress: 100,
  speed: 2046569.1403508773 }

It would be nicer to have a progress bar, like the one shown in the examples, instead of one printed object for every state update.
Also, after the download completes, the console freezes on the following output:

{ uid: '244018',
  url: 'http://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/188/075/GCF_000188075.1_Si_gnG/GCF_000188075.1_Si_gnG_genomic.fna.gz',
  path: '244018/GCF_000188075.1_Si_gnG_genomic.fna.gz',
  status: 'completed',
  total: 116654441,
  progress: 100,
  speed: 'NA',
  size: '111 MB' }

To get out of this, I had to press Ctrl+C before I could continue entering new code.

This is also something I've encountered, but never looked into "fixing".

It shouldn't be too hard to turn this JSON object stream of status data into a nice CLI spinner plus progress status/bar, using \r (carriage return) to overwrite the current line:

~/Desktop
▶ cat test.js
process.stdout.write('one\r')
process.stdout.write('two\n')

~/Desktop
▶ node test.js
two
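A minimal sketch of that idea in Node, assuming the status objects have the fields shown above (uid, status, progress, speed). formatStatus and renderStatus are hypothetical helpers, not part of bionode-ncbi, and the bar layout is arbitrary. It also writes the ANSI "erase to end of line" sequence (\x1b[K) so a shorter line fully replaces a longer one.

```javascript
// Format one status object as a single-line progress bar.
// Field names come from the objects printed by the download stream.
function formatStatus(s, width = 20) {
  const pct = Math.max(0, Math.min(100, Math.round(Number(s.progress) || 0)));
  const filled = Math.round((pct / 100) * width);
  const bar = '#'.repeat(filled) + '-'.repeat(width - filled);
  // speed is a number (bytes/s) while downloading and 'NA' once completed
  const speed = typeof s.speed === 'number'
    ? `${(s.speed / 1024 / 1024).toFixed(1)} MB/s`
    : '';
  return `${s.uid} [${bar}] ${String(pct).padStart(3)}% ${s.status} ${speed}`.trimEnd();
}

// Redraw the current terminal line: '\r' moves to column 0, '\x1b[K' clears
// leftovers, and a newline is printed only when the download is completed.
function renderStatus(s, out = process.stdout) {
  const end = s.status === 'completed' ? '\n' : '';
  out.write('\r' + formatStatus(s) + '\x1b[K' + end);
}
```

Calling renderStatus for every object in the stream leaves one updating line per download instead of a new object per update.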

However, the lower-level object stream is still required for use within Node, and \r only makes sense when bionode programs are used interactively on the CLI. So there needs to be some way to turn on the "user-friendly status" form only when used from the CLI. Perhaps an option in the API that the CLI code turns on by default.
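One way such an option could look, as a hypothetical sketch: the names createReporter and pretty are illustrative, not bionode-ncbi API. The API default keeps writing raw NDJSON, and only the CLI would opt in, for example with pretty: Boolean(process.stdout.isTTY) so the pretty form is used only when writing to a terminal (the TTY check is an added assumption).

```javascript
// Returns a function that reports one status object.
// pretty = false (API default): write the object as an NDJSON line.
// pretty = true (CLI opt-in): redraw a single status line with '\r'.
function createReporter({ pretty = false, out = process.stdout } = {}) {
  return function report(status) {
    if (!pretty) {
      out.write(JSON.stringify(status) + '\n');
      return;
    }
    const pct = Math.round(Number(status.progress) || 0);
    const end = status.status === 'completed' ? '\n' : '';
    out.write(`\r${status.uid} ${status.status} ${pct}%\x1b[K${end}`);
  };
}
```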

I would suggest we put together a standard form (i.e., a JSON Schema) for "in progress JSON objects", and build a "logger" on top of it with a nice spinner/progress indicator.
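A possible starting point for that standard form, as a sketch only: the schema below is derived solely from the fields visible in the download output above, and the choice of required fields is an assumption, not an existing bionode convention. The small isProgressStatus check mirrors the required fields so the example runs without a validator library such as ajv.

```javascript
// Hypothetical JSON Schema for "in progress" status objects.
const progressSchema = {
  $schema: 'http://json-schema.org/draft-07/schema#',
  title: 'ProgressStatus',
  type: 'object',
  required: ['uid', 'status', 'progress'],
  properties: {
    uid: { type: 'string' },
    url: { type: 'string' },
    path: { type: 'string' },
    status: { type: 'string' },            // e.g. 'downloading', 'completed'
    total: { type: 'number' },             // total size in bytes
    progress: { type: 'number', minimum: 0, maximum: 100 },
    speed: { type: ['number', 'string'] }, // bytes/s, or 'NA' when done
    size: { type: 'string' }               // human-readable, e.g. '111 MB'
  }
};

// Minimal hand-written check of the required fields and their types.
function isProgressStatus(obj) {
  if (obj === null || typeof obj !== 'object') return false;
  return progressSchema.required.every(k => k in obj) &&
    typeof obj.uid === 'string' &&
    typeof obj.status === 'string' &&
    typeof obj.progress === 'number' &&
    obj.progress >= 0 && obj.progress <= 100;
}
```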

Another option is to have the "logger" tool read the objects from stdin, so you could do something like:

bionode-ncbi download assembly guillardia theta | thelogger
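A rough sketch of what "thelogger" could do, assuming it receives NDJSON on stdin: read line by line, parse each object, and redraw one status line. The name run is hypothetical; lines that are not JSON are passed through unchanged.

```javascript
const readline = require('readline');

function formatLine(s) {
  const pct = Math.round(Number(s.progress) || 0);
  return `${s.uid || '?'} ${s.status || ''} ${pct}%`;
}

// Read NDJSON from `input` and write a single, continuously redrawn
// status line to `output`. Resolves when the input ends.
async function run(input, output) {
  const rl = readline.createInterface({ input, crlfDelay: Infinity });
  for await (const line of rl) {
    if (!line.trim()) continue;
    let obj;
    try {
      obj = JSON.parse(line);
    } catch (err) {
      output.write(line + '\n'); // not JSON: pass it through
      continue;
    }
    const end = obj.status === 'completed' ? '\n' : '';
    output.write('\r' + formatLine(obj) + '\x1b[K' + end);
  }
}

module.exports = { run, formatLine };
```

To use it at the end of a pipe like the one above, the script would finish with run(process.stdin, process.stdout).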

I think the last option is probably the best. The "user-friendly mode" option could simply pipe the object stream into thelogger from within the code, so you wouldn't need to type ncbi ... | thelogger yourself (though you still could if you wanted to).
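Piping inside the code could be sketched with a Transform stream that accepts status objects on its writable side and emits terminal text. prettyStatus is a hypothetical name; the raw object stream stays untouched for callers who don't opt in.

```javascript
const { Transform } = require('stream');

// Object mode on the writable side (status objects in),
// plain text on the readable side (terminal output out).
function prettyStatus() {
  return new Transform({
    writableObjectMode: true,
    transform(status, _enc, cb) {
      const pct = Math.round(Number(status.progress) || 0);
      const end = status.status === 'completed' ? '\n' : '';
      cb(null, `\r${status.uid} ${status.status} ${pct}%\x1b[K${end}`);
    }
  });
}
```

With the option enabled, the CLI would do something like objectStream.pipe(prettyStatus()).pipe(process.stdout); without it, the objects are written as NDJSON as before.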

The freezing is probably caused by the process not exiting after the stream completes. I think it might actually be a duplex stream, which is why it doesn't end: its stdin side can still receive input.
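If the duplex hypothesis is right, one possible mitigation is sketched below (closeWhenReadableEnds is a hypothetical helper, and the cause of the freeze is not confirmed). Node's stream.finished() normally waits for both sides of a duplex; with { writable: false } it only waits for the readable side, after which the helper ends the writable side so nothing is left waiting for input.

```javascript
const { finished } = require('stream');

// Call onDone once the readable side of `stream` has ended,
// and close the writable side too so the duplex fully finishes.
function closeWhenReadableEnds(stream, onDone) {
  finished(stream, { writable: false }, err => {
    if (!err && !stream.writableEnded) stream.end();
    onDone(err);
  });
}
```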

If you comment out line 409 and change quiet: true to quiet: false on line 396, you already get a pretty decent loading bar. So I do like the option of passing the function on line 409 to "thelogger". That way you have a simplified (user-friendly) version and a full logger that outputs every download event, which other scripts or programs can parse.

The freezing does not occur if I don't use an interactive Node session, i.e., if I just run something like:

bionode-ncbi download assembly solenopsis invicta

From a reusability point of view, I think bionode tools should always use NDJSON for STDIN and STDOUT. This standard allows developers to easily combine code and write pipelines. But from a UX point of view, it's sometimes ugly and confusing.
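For reference, NDJSON simply means one JSON object per line, separated by newlines, which is what lets line-oriented tools parse the stream object by object. The two helpers below are illustrative, not bionode functions.

```javascript
// Serialize objects as NDJSON: one JSON.stringify() result per line.
function toNDJSON(objects) {
  return objects.map(o => JSON.stringify(o) + '\n').join('');
}

// Parse NDJSON text back into objects, skipping blank lines.
function fromNDJSON(text) {
  return text
    .split('\n')
    .filter(line => line.trim() !== '')
    .map(line => JSON.parse(line));
}
```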

I think the general approach to solving this should be @thejmazz's last example: piping the NDJSON output to another tool that prettifies it, without modifying bionode-ncbi's code or behavior. This tool could be a Node.js CLI that reuses one of the many available progress bar libraries. We could then reuse it with other bionode modules and CLI tools, similar to what you can already do with GNU pv.

However, in this particular case (as @tiagofilipe12 pointed out), bionode-ncbi already has a pretty progress bar available internally, because downloads are handled by the nugget dependency. So we could easily implement a --pretty or --progress-bar option to enable that feature on demand.
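A hypothetical sketch of handling such a flag: parseDownloadFlags and its return shape are illustrative, and the only name it borrows from the thread is the quiet download option. The flag turns quiet off (so the nugget progress bar is drawn) and suppresses the NDJSON status objects; without it, behavior is unchanged.

```javascript
const PRETTY_FLAGS = ['--pretty', '--progress-bar'];

// Split CLI arguments into positional args and download settings.
function parseDownloadFlags(argv) {
  const pretty = argv.some(a => PRETTY_FLAGS.includes(a));
  return {
    args: argv.filter(a => !PRETTY_FLAGS.includes(a)),
    // quiet: false lets the downloader draw its own progress bar
    downloadOptions: { quiet: !pretty },
    // keep emitting NDJSON status objects unless the bar is requested
    emitStatusObjects: !pretty
  };
}
```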

I'm happy to have a look at a PR that implements that feature. 😃

BTW, with a bit of Bash scripting you can also show just the progress percentage, updated in place on a single line:

bionode-ncbi download assembly guillardia theta | json -ga progress | while read i; do echo -ne "\r$i"; done