rdoeffinger/Dictionary

Word pages look unformatted

Opened this issue · 6 comments

The pages for individual words look unformatted, with lots of metadata tags output as raw text.
Is this intentional? I've just installed the app and am testing it.

An example (EN.quickdic) rendered in QuickDic compared to the same Wiktionary page in Firefox Android:

[Two screenshots of the entry "drawing": QuickDic on the left, Wiktionary in Firefox on the right]

App version: 5.5.6

"Intentional" is the wrong word.
The Wiktionary data looks like the left side, and there is no code that is easy to use or integrate to convert it to what is shown on the right side.
Support has been added for some specific, common tags. It would be possible to add support for some more, and others could simply be removed, since they increase dictionary size without much benefit (for example, online links are of questionable use in an offline dictionary).
It would be some work though, and it would only improve things, not completely fix the problem.
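
To illustrate the "support some tags, drop the rest" approach, here is a minimal sketch in Java. It is not QuickDic's actual code: the class name `WikiTemplateSketch` and the choice of supported templates (`l`, `m`, `lb`) are assumptions made for illustration, and it only handles non-nested templates.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of QuickDic.
final class WikiTemplateSketch {
    // Matches a non-nested template such as {{name|arg1|arg2}}.
    private static final Pattern TEMPLATE = Pattern.compile("\\{\\{([^{}]*)\\}\\}");

    /** Replaces supported templates with readable text and drops all others. */
    static String render(String wikitext) {
        Matcher m = TEMPLATE.matcher(wikitext);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String[] parts = m.group(1).split("\\|");
            m.appendReplacement(out, Matcher.quoteReplacement(renderTemplate(parts)));
        }
        m.appendTail(out);
        // Collapse the gaps left behind by removed templates.
        return out.toString().replaceAll(" {2,}", " ").trim();
    }

    private static String renderTemplate(String[] parts) {
        String name = parts[0].trim();
        switch (name) {
            case "l": // link: {{l|en|word}} -> word
            case "m": // mention: {{m|en|word}} -> word
                return parts.length > 2 ? parts[2] : "";
            case "lb": // label: {{lb|en|informal}} -> (informal)
                if (parts.length <= 2) {
                    return "";
                }
                return "(" + String.join(", ", Arrays.copyOfRange(parts, 2, parts.length)) + ")";
            default: // unsupported template: remove it entirely
                return "";
        }
    }
}
```

For example, `WikiTemplateSketch.render("{{lb|en|informal}} A {{l|en|picture}}. {{R:Webster 1913}}")` keeps the label and the linked word and drops the reference template. As noted above, this only improves things: nested templates and the many templates that need Wiktionary's own expansion logic are not handled.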

I suppose Wikimedia should have a parser for this markup. Maybe you can import it?

ilius commented

I have this problem as well in my Python tool: ilius/pyglossary#48

I think using .zim files (from the Kiwix project) is the easiest way to use Wiktionary or Wikipedia offline.
There is libzim

There actually is an easy way to extract the formatted data using https://github.com/tatuylonen/wiktextract

ilius commented

That tool simply downloads the rendered HTML from Wiktionary website one entry at a time.
It does not render it.
It's also in Python. This is a Java project.

You use it to extract the information, which you can then convert to the same format this dictionary is using, making it human readable. I'm using it in my app. There's no readme yet, but you can compile it and see for yourself how much cleaner and more readable it is.
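
As a rough sketch of that conversion step, the Java snippet below formats one already-extracted entry as plain text. It assumes the word, part of speech, and glosses have already been read from the extractor's output by some JSON parser. The class name `EntryFormatterSketch` and the numbered-list layout are hypothetical; the layout is not the actual QuickDic dictionary format.

```java
import java.util.List;

// Hypothetical formatter; the output layout is an assumption for illustration.
final class EntryFormatterSketch {
    /** Renders a headword line followed by one numbered line per gloss. */
    static String format(String word, String pos, List<String> glosses) {
        StringBuilder sb = new StringBuilder();
        sb.append(word).append(" (").append(pos).append(")\n");
        for (int i = 0; i < glosses.size(); i++) {
            sb.append(i + 1).append(". ").append(glosses.get(i)).append('\n');
        }
        return sb.toString();
    }
}
```

Because the glosses arrive as plain strings, no wiki markup needs to be interpreted at this stage, which is the readability benefit described above.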