Haphazard leap second file handling needs re-working
The leap second file is checked, and possibly downloaded, by both the Indexer and Ingester classes on initialization. Not only does this happen far more often than necessary (the leap seconds file changes rarely), it also sets up a race condition in which parallel workers can write the file at the same time.
Instead, the file should be checked once at the beginning of each "run": either when the user invokes retrieve, or when rover initiates a retrieve in daemon mode.
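A minimal Python sketch of what a once-per-run check could look like, assuming the check runs in the parent process before any Indexer or Ingester workers start. The function names, the `max_age_days` staleness rule, and the `fetch` callable (standing in for whatever actually downloads the list) are all hypothetical, not rover's existing API. The download is written to a temporary file in the same directory and then renamed over the target with `os.replace`, so even if two processes did overlap, no reader would ever see a half-written file.

```python
import os
import tempfile
import time
from typing import Callable


def leap_file_is_stale(path: str, max_age_days: float) -> bool:
    """Return True if the file is missing or older than max_age_days (hypothetical rule)."""
    if not os.path.exists(path):
        return True
    age_seconds = time.time() - os.path.getmtime(path)
    return age_seconds > max_age_days * 86400


def write_atomically(path: str, data: bytes) -> None:
    """Write data to a temp file next to path, then atomically rename it into place."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".leap-", suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
        # mkstemp creates the file as 0600; relax it to the usual rw-r--r--.
        os.chmod(tmp_path, 0o644)
        # os.replace is atomic when source and target are on the same filesystem.
        os.replace(tmp_path, path)
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise


def check_leap_file_once_per_run(
    path: str, max_age_days: float, fetch: Callable[[], bytes]
) -> bool:
    """Call once at the start of a user or daemon retrieve; return True if downloaded."""
    if leap_file_is_stale(path, max_age_days):
        write_atomically(path, fetch())
        return True
    return False
```

With this shape, the workers would only ever read the leap second file; the single writer is whatever code begins the run.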
Furthermore, on an initial run, the default option leap-file=leap-seconds.list is handled differently by the two classes. The Indexer checks for and downloads a leap-seconds.list file relative to the user's CWD, but the Ingester (correctly) checks for and downloads it relative to the configuration file. If the user always runs rover from the base directory of a repository, they won't notice this. But if rover is run from another location with the -f [config] option, these end up being two different files.
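The discrepancy comes down to which base directory a relative path is resolved against. A minimal Python sketch of config-relative resolution, the behavior described above as correct for the Ingester; the name `resolve_leap_file` is hypothetical and not rover's actual code.

```python
import os


def resolve_leap_file(leap_file: str, config_path: str) -> str:
    """Resolve leap_file against the directory containing the config file.

    Absolute paths are returned (normalized) unchanged. Relative paths, such as
    the default "leap-seconds.list", are anchored to the config file's directory
    rather than to the process's current working directory.
    """
    if os.path.isabs(leap_file):
        return os.path.normpath(leap_file)
    config_dir = os.path.dirname(os.path.abspath(config_path))
    return os.path.normpath(os.path.join(config_dir, leap_file))
```

Run from the parent of datarepo with `-f datarepo/rover.config`, this resolves to `datarepo/leap-seconds.list`, whereas opening the bare name `leap-seconds.list` resolves against the CWD and creates the second copy.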
To illustrate:
```
rover init datarepo
rover -f datarepo/rover.config retrieve TA_MSTX__LHZ 2010-1-1 2010-1-2
ls -l leap-seconds.list datarepo/leap-seconds.list
-rw-r--r-- 1 XXX YYY 10663 Nov 29 17:09 datarepo/leap-seconds.list
-rw-r--r-- 1 XXX YYY 10663 Nov 29 17:09 leap-seconds.list
```
Now we have two leap second files.
This second problem is probably not worth diagnosing further (though it does explain all the wayward leap-seconds.list files in my testing environments), since it becomes moot once the check and download move into a higher-level "run" initialization.