Wikidata/Wikidata-Toolkit

WmfOnlineDailyDumpFile incorrectly checks for availability

TheEaterr opened this issue · 2 comments

While using this toolkit to manage dump downloads, I ran into a problem with how it determines whether a dump is available.
The code responsible for this check is:

protected boolean fetchIsDone() {
	boolean result;
	try (InputStream in = this.webResourceFetcher
			.getInputStreamForUrl(getBaseUrl() + "status.txt")) {
		BufferedReader bufferedReader = new BufferedReader(
				new InputStreamReader(in, StandardCharsets.UTF_8));
		String inputLine = bufferedReader.readLine();
		bufferedReader.close();
		result = "done".equals(inputLine);
	} catch (IOException e) { // file not found or not readable
		result = false;
	}
	return result;
}

However, the status.txt files that the WMF currently provides no longer contain just "done" but "done:all" (and perhaps other values; I haven't made an exhaustive check), see: https://dumps.wikimedia.org/other/incr/wikidatawiki/20240414/status.txt

Would it be possible to update the "done".equals(inputLine) check so that it handles this? (Perhaps with startsWith?)
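
A minimal sketch of what I have in mind, written as a standalone class so it can be run on its own. The class name StatusLineCheck and the helper isDone are hypothetical and not part of the toolkit, and "in-progress" is only a placeholder for a non-done value, not something I have seen in a real status.txt:

```java
// Hypothetical standalone helper illustrating the proposed check.
// It is not part of Wikidata-Toolkit; in the toolkit the change would
// replace the "done".equals(inputLine) comparison in fetchIsDone().
final class StatusLineCheck {

    private StatusLineCheck() {
    }

    /**
     * Returns true if the first line of status.txt marks the dump as done.
     * Accepts both the old "done" and the newer "done:all" form.
     * A null line (empty file) is treated as not done.
     */
    static boolean isDone(String statusLine) {
        return statusLine != null && statusLine.startsWith("done");
    }

    public static void main(String[] args) {
        System.out.println(isDone("done"));        // true
        System.out.println(isDone("done:all"));    // true
        System.out.println(isDone("in-progress")); // false (placeholder value)
        System.out.println(isDone(null));          // false
    }
}
```

Note that startsWith accepts any value beginning with "done", so if the WMF ever publishes a status such as a partial completion with that prefix, a stricter comparison might be needed.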

@TheEaterr that sounds good - would you like to submit a pull request for this? Using startsWith sounds like a good solution.

I created one, which also includes an additional fix for the JSON download, although current / full is still broken. See: #872