WmfOnlineDailyDumpFile incorrectly checks for availability
TheEaterr opened this issue · 2 comments
TheEaterr commented
When trying to use this toolkit to manage downloading dumps, I ran into a problem with how dump availability is determined.
The code responsible for this is:
    protected boolean fetchIsDone() {
        boolean result;
        try (InputStream in = this.webResourceFetcher
                .getInputStreamForUrl(getBaseUrl() + "status.txt")) {
            BufferedReader bufferedReader = new BufferedReader(
                    new InputStreamReader(in, StandardCharsets.UTF_8));
            String inputLine = bufferedReader.readLine();
            bufferedReader.close();
            result = "done".equals(inputLine);
        } catch (IOException e) { // file not found or not readable
            result = false;
        }
        return result;
    }
However, the status.txt files currently served by the WMF no longer contain just done but done:all (and perhaps other values; I haven't made an exhaustive check). See: https://dumps.wikimedia.org/other/incr/wikidatawiki/20240414/status.txt
Would it be possible to update the "done".equals(inputLine) check so that it is correct (perhaps with startsWith)?
wetneb commented
@TheEaterr that sounds good - would you like to submit a pull request for this? Using startsWith sounds like a good solution.
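For reference, a minimal sketch of what the startsWith approach discussed above could look like. This is not the toolkit's actual code: the class name StatusCheckSketch and the helper isDoneStatus are hypothetical, and the stream is passed in directly instead of going through webResourceFetcher so that the example runs on its own. It assumes that any first line beginning with "done" (such as "done:all") means the dump is complete.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

// Hypothetical stand-alone class, used only to illustrate the proposed check.
class StatusCheckSketch {

    // Assumption: "done" and "done:<suffix>" (e.g. "done:all") both mean finished.
    // A null line (empty status file) is treated as not done.
    static boolean isDoneStatus(String statusLine) {
        return statusLine != null && statusLine.startsWith("done");
    }

    // Reads the first line of a status.txt stream and applies the prefix check.
    static boolean fetchIsDone(InputStream in) {
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(in, StandardCharsets.UTF_8))) {
            return isDoneStatus(reader.readLine());
        } catch (IOException e) { // stream not readable
            return false;
        }
    }

    public static void main(String[] args) {
        // Simulates the content currently served in status.txt.
        InputStream sample = new ByteArrayInputStream(
                "done:all\n".getBytes(StandardCharsets.UTF_8));
        System.out.println(fetchIsDone(sample)); // prints "true"
    }
}
```

A prefix check is deliberately lenient: it would also accept any other value that happens to begin with "done". If the set of possible status values turns out to be known, matching "done" or "done:" explicitly might be the safer choice.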