Wikidata/Wikidata-Toolkit

JSON dump parsing error while running examples

dukesun99 opened this issue · 5 comments

I encountered the error message below when running the examples GreatestNumberProcessor and GenderRatioProcessor.

Problematic line was: {"type":"lexeme","id":"L34069","lemmas":{"de":{"la...
Error when reading JSON for entity: Missing type id when trying to resolve subtype of [simple type, class org.wikidata.wdtk.datamodel.implementation.FormDocumentImpl]: missing type id property 'type' (for POJO property 'forms')
 at [Source: (String)"{"type":"lexeme","id":"L34070","lemmas":{"de":{"language":"de","value":"Zahnl\u00fccke"}},"lexicalCategory":"Q1084","language":"Q188","claims":{"P5185":[{"mainsnak":{"snaktype":"value","property":"P5185","datavalue":{"value":{"entity-type":"item","numeric-id":1775415,"id":"Q1775415"},"type":"wikibase-entityid"},"datatype":"wikibase-item"},"type":"statement","id":"L34070$B9D7523B-8AE6-4C39-85A2-EE7CC71C5228","rank":"normal"}],"P8376":[{"mainsnak":{"snaktype":"value","property":"P8376","datavalue""[truncated 4073 chars]; line: 1, column: 1043] (through reference chain: org.wikidata.wdtk.datamodel.implementation.LexemeDocumentImpl["forms"]->java.util.ArrayList[0])

The dump is the automatically downloaded 20210407.json.gz.

I tried to run the code on Windows (built with Maven in IDEA, JDK 15.0.1) and on Ubuntu (OpenJDK 1.8.0_282, run via IDEA remote).

The dump file seems to be only ~200 MB. Is that the expected size?

Any help is appreciated. Thank you.

Hi @dukesun99, any updates on this? I'm experiencing exactly the same problem.


Hi, no, I switched to Python to deal with the dump.

Hi @markusforster, I am facing this issue too. Is there any solution?

I tried debugging this and it seems that we are at least affected by https://phabricator.wikimedia.org/T305660. We could introduce a workaround to accept the current format, or just wait for the issue to be fixed in Wikibase.

Hello, I have the same issue. Is there any solution?

UPDATE

I found out that the files it downloads are broken.
I downloaded the file 'latest-all.json.gz' from https://dumps.wikimedia.org/wikidatawiki/entities/ and with this data it works.
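
In case it helps anyone else: one way to check whether a downloaded dump is intact before processing it is to decompress it all the way to the end. This is only a rough sketch using the Java standard library; the class name DumpCheck and the default file name are my own choices for illustration, and none of this is part of Wikidata Toolkit. A truncated or corrupt .gz file should make it fail with an IOException instead of printing a line count. It says nothing about whether the JSON inside is in a format the toolkit accepts.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.zip.GZIPInputStream;

// Hypothetical helper, not part of Wikidata Toolkit.
class DumpCheck {

    // Decompresses the whole file and returns the number of lines read.
    // An IOException here means the gzip stream could not be read to the end.
    static long countLines(Path gzFile) throws IOException {
        long lines = 0;
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(
                new GZIPInputStream(Files.newInputStream(gzFile)),
                StandardCharsets.UTF_8))) {
            while (reader.readLine() != null) {
                lines++;
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        // Assumed default: the manually downloaded file in the working directory.
        Path dump = Paths.get(args.length > 0 ? args[0] : "latest-all.json.gz");
        System.out.println(dump + ": " + countLines(dump) + " lines read without error");
    }
}
```

Reading the full dump this way takes a while, since the whole file has to be decompressed, but it only needs to be done once per download.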