Word pages look unformatted
Opened this issue · 6 comments
"Intentional" is the wrong word.
The Wiktionary data looks like what is shown on the left side, and there is no easy-to-use, easy-to-integrate code to convert it to what is shown on the right side.
Support has been added for some specific, common markup elements. It would be possible to add support for some more, and for some others maybe just remove them (as they increase dictionary size without much benefit; for example, online links are of somewhat questionable use in an offline dictionary).
It would be some work though, and it would only improve things, not completely fix them.
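To illustrate the kind of per-element handling described above, here is a minimal sketch, not the project's actual code. The class name `WikiMarkupSketch` and the method `toPlainText` are hypothetical. It assumes only a handful of constructs matter (bold/italic quotes, internal links, external links) and does not touch templates such as `{{...}}`, which is where most of the real work would be.

```java
import java.util.regex.Pattern;

// Hypothetical helper, for illustration only: handles a few common wikitext constructs.
public class WikiMarkupSketch {
    // [https://example.org label] or [https://example.org]: external links, dropped entirely
    private static final Pattern EXTERNAL_LINK =
            Pattern.compile("\\[https?://[^\\]\\s]*(?:\\s[^\\]]*)?\\]");
    // [[target|label]] is replaced by its label
    private static final Pattern PIPED_LINK =
            Pattern.compile("\\[\\[[^\\]|]*\\|([^\\]]*)\\]\\]");
    // [[target]] is replaced by its target
    private static final Pattern PLAIN_LINK =
            Pattern.compile("\\[\\[([^\\]|]*)\\]\\]");
    // '''bold''' and ''italic'' markers are stripped, the text is kept
    private static final Pattern QUOTES = Pattern.compile("'{2,3}");

    public static String toPlainText(String wikitext) {
        // Remove external links first so they cannot be confused with internal ones.
        String s = EXTERNAL_LINK.matcher(wikitext).replaceAll("");
        s = PIPED_LINK.matcher(s).replaceAll("$1");
        s = PLAIN_LINK.matcher(s).replaceAll("$1");
        s = QUOTES.matcher(s).replaceAll("");
        // Collapse runs of spaces left behind by removed elements.
        return s.replaceAll(" {2,}", " ").trim();
    }

    public static void main(String[] args) {
        String raw = "A '''[[cat]]''' is a [[mammal|small mammal]]. [https://example.org more]";
        System.out.println(toPlainText(raw)); // prints: A cat is a small mammal.
    }
}
```

Anything the patterns do not recognize is passed through unchanged, so this only improves the output rather than fully cleaning it.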
I suppose Wikimedia should have a parser for this markup. Maybe you can import it?
I have this problem as well in my Python tool: ilius/pyglossary#48
I think using .zim files (from the Kiwix project) is the easiest way to use Wiktionary or Wikipedia offline.
There is libzim
There actually is an easy way to extract the formatted data using https://github.com/tatuylonen/wiktextract
That tool simply downloads the rendered HTML from the Wiktionary website one entry at a time.
It does not render it.
It's also in Python. This is a Java project.
You use it to extract the information, which you can then convert to the same format this dictionary uses, making it human-readable. I'm using it in my app. There's no README yet, but you can compile it and see for yourself how much cleaner and more readable it is.