eb4j/dsl4j

Dictionary load error

Closed this issue · 6 comments

plotn commented

Version 0.5.0:

https://drive.google.com/file/d/14Crq8ywyBdfC1YnDjsv7gZOu70ZH_4cd/view?usp=sharing
https://drive.google.com/file/d/1TYDfxr_j0b3A_h99kmqbul1iFKxiGhfq/view?usp=sharing

Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: Index 761 out of bounds for length 385

Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: Index 942 out of bounds for length 477

The stack trace is:
Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: Index 942 out of bounds for length 477
    at org.dict.zip.DictZipHeader.getPosition(DictZipHeader.java:398)
    at org.dict.zip.DictZipInputStream.seek(DictZipInputStream.java:271)
    at org.dict.zip.DictZipInputStream.reset(DictZipInputStream.java:154)
    at io.github.eb4j.dsl.impl.EntriesLoaderImpl.skipSpaceTabs(EntriesLoaderImpl.java:270)
    at io.github.eb4j.dsl.impl.EntriesLoaderImpl.load(EntriesLoaderImpl.java:106)
    at io.github.eb4j.dsl.DslDictionaryLoader.load(DslDictionaryLoader.java:98)
    at io.github.eb4j.dsl.DslDictionary.loadDictionary(DslDictionary.java:148)

The code was:
DslDictionary dslDictionary = DslDictionary.loadDictionary(
        // Backslashes in Java string literals must be escaped.
        // new File("c:\\github\\JsoupExperiments_tmp\\test4.dsl"));
        // Paths.get("c:\\github\\JsoupExperiments_tmp\\Apresyan\\En-Ru_Apresyan.dsl"),
        // Paths.get("c:\\github\\JsoupExperiments_tmp\\Apresyan\\En-Ru_Apresyan.dsl.idx")
        // Paths.get("c:\\github\\JsoupExperiments_tmp\\mueller\\Mueller (En-Ru)_new.dsl.dz"),
        // Paths.get("c:\\github\\JsoupExperiments_tmp\\mueller\\Mueller (En-Ru)_new.dsl.idx")
        Paths.get("c:\\github\\JsoupExperiments_tmp\\smirnitsky\\Ru-En-Smirnitsky.dsl.dz"),
        Paths.get("c:\\github\\JsoupExperiments_tmp\\smirnitsky\\Ru-En-Smirnitsky.dsl.idx")
);

When I unpack the .dz file and load the result, I get the following:
Exception in thread "main" java.lang.NullPointerException
    at io.github.eb4j.dsl.index.DslIndex$Builder.setDictionaryName(DslIndex.java:2123)
    at io.github.eb4j.dsl.DslDictionaryLoader.buildIndexFile(DslDictionaryLoader.java:175)
    at io.github.eb4j.dsl.DslDictionaryLoader.load(DslDictionaryLoader.java:102)
    at io.github.eb4j.dsl.DslDictionary.loadDictionary(DslDictionary.java:148)

Could you place the former files in the project's src/test/resources/content directory and run the tests from source with ./gradlew test?
I think I've already tested them in the class src/test/java/io/github/eb4j/dsl/DslProprietaryTest, and it passed. https://github.com/eb4j/dsl4j/blob/main/src/test/java/io/github/eb4j/dsl/DslProprietaryTest.java#L19-L23

Thank you for the report.

  • ArrayIndexOutOfBoundsException is caused by a bug in dictzip v0.12.0 and v0.12.1. The bug advances the position pointer when the readFully() method is called. That pointer is later used by the is.reset() method, which then throws the exception. It will be fixed in the next dictzip release.

  • NullPointerException is caused by the format of your data: "Big-endian UTF-16 Unicode text, with very long lines, with CRLF line terminators". DSL4j supports standard UTF-16 Little-Endian (see the README matrix) and does not recognize BE. DSL4j decides the data is not UTF-16LE, so it tries to parse it as UTF-8 and then fails to build a correct index.
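
To check which case a file falls into, you can look at its byte order mark. The following is a minimal sketch, not DSL4j's own detection code: the class name BomSniffer and its methods are made up for illustration, and it assumes the file is an unpacked .dsl that starts with a BOM.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper, not part of DSL4j: guesses the encoding from the BOM.
public class BomSniffer {

    /** Returns a charset name based on the leading BOM bytes, or "unknown" if no BOM matches. */
    static String detect(byte[] head) {
        if (head.length >= 3
                && (head[0] & 0xFF) == 0xEF
                && (head[1] & 0xFF) == 0xBB
                && (head[2] & 0xFF) == 0xBF) {
            return "UTF-8";
        }
        if (head.length >= 2 && (head[0] & 0xFF) == 0xFF && (head[1] & 0xFF) == 0xFE) {
            return "UTF-16LE"; // the layout DSL4j is documented to support
        }
        if (head.length >= 2 && (head[0] & 0xFF) == 0xFE && (head[1] & 0xFF) == 0xFF) {
            return "UTF-16BE"; // the layout reported for the files in this issue
        }
        return "unknown";
    }

    /** Reads the first three bytes of the file and classifies them. */
    static String detect(Path file) throws IOException {
        try (InputStream in = Files.newInputStream(file)) {
            return detect(in.readNBytes(3));
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(detect(Path.of(args[0])));
    }
}
```

A file without a BOM is reported as "unknown", so this only confirms the simple cases.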

#70 and #72 will improve the loader.

@plotn

  • NullPointerException is caused by the format of your data: "Big-endian UTF-16 Unicode text, with very long lines, with CRLF line terminators". DSL4j supports standard UTF-16 Little-Endian,

IMHO, in short, your files are in the wrong format. See the README.
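
One possible workaround, not confirmed in this thread, is to re-encode the unpacked .dsl file as UTF-16LE before loading it. The sketch below uses only the JDK. The class name ReencodeDsl is hypothetical, writing a leading BOM is an assumption about what the loader expects, and the whole file is read into memory.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper, not part of DSL4j: rewrites a UTF-16BE file as UTF-16LE.
public class ReencodeDsl {

    /** Converts UTF-16BE bytes (with or without BOM) to UTF-16LE bytes with a BOM. */
    static byte[] toUtf16LeWithBom(byte[] utf16Be) {
        // The UTF_16BE charset keeps a leading BOM as the character U+FEFF.
        String text = new String(utf16Be, StandardCharsets.UTF_16BE);
        if (text.startsWith("\uFEFF")) {
            text = text.substring(1);
        }
        // The UTF_16LE charset does not write a BOM, so prepend FF FE by hand.
        byte[] body = text.getBytes(StandardCharsets.UTF_16LE);
        byte[] out = new byte[body.length + 2];
        out[0] = (byte) 0xFF;
        out[1] = (byte) 0xFE;
        System.arraycopy(body, 0, out, 2, body.length);
        return out;
    }

    public static void main(String[] args) throws IOException {
        Path source = Path.of(args[0]); // unpacked UTF-16BE .dsl file
        Path target = Path.of(args[1]); // where to write the UTF-16LE copy
        Files.write(target, toUtf16LeWithBom(Files.readAllBytes(source)));
    }
}
```

An index file built from the old data would no longer match the converted file, so it is probably safest to let the loader build a fresh one.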

plotn commented

I see, thank you. But on the other hand, I chose random files from the internet. They are quite old, and I think people made them and use them, so they are "almost OK". Anyway, I've integrated them into my reading app, thank you: https://4pda.to/forum/index.php?s=&showtopic=995536&view=findpost&p=113572389

resolved.