nci/gsky

Improving Gsky crawler


There are two potential improvements to the Gsky crawler:

  1. Currently the crawler calls gdalinfo on each subdataset of a data file sequentially. This is an IO bottleneck for data files with many subdatasets. We could use goroutines to call gdalinfo on these subdatasets concurrently.
  2. Currently, if calling gdalinfo on a subdataset fails, the error message is suppressed, which makes the crawler results hard to troubleshoot. We could write the error messages to stderr so that they can be logged without interfering with the crawl output on stdout.