karlb/wikdict-gen

Save Wiktionary categories

rominf opened this issue · 3 comments

Thank you for the nice dictionaries, I really appreciate your work!

I'm building a Telegram bot for learning languages and I find wikdict dictionaries very helpful.

However, I think it would be better to preserve categories in the databases, so that users can select only the words they need from certain categories.

karlb commented

I'm totally willing to preserve the categories. But for that to work, I need to get them out of the wiki text markup into RDF, which has to be done for every language separately. This step is performed by http://kaiko.getalp.org/about-dbnary/ and I didn't see the categories in there the last time I looked. Which languages' categories are most interesting for you? Could you give a specific example of a word and the expected category from Wiktionary? This avoids misunderstandings, since there are many subtly different items in Wiktionary.
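To illustrate why pulling categories out of wiki text is a per-language job, here is a minimal sketch for English entries only. It is not part of wikdict-gen or dbnary: the helper name `extract_categories` is hypothetical, and it assumes that English Wiktionary marks topical categories either with explicit `[[Category:...]]` links or with `{{C|en|...}}` / `{{topics|en|...}}` templates. Other language editions use different markup, so each would need its own rules.

```python
import re

# Hypothetical helper for illustration; not part of wikdict-gen or dbnary.
# Assumption: categories appear as [[Category:en:Computing]] links or as
# {{C|en|Computing}} / {{topics|en|Computing}} templates (English edition).
CATEGORY_LINK = re.compile(r"\[\[Category:([^\]|]+)(?:\|[^\]]*)?\]\]")
TOPIC_TEMPLATE = re.compile(r"\{\{(?:C|topics)\|([a-z-]+)\|([^}]*)\}\}")


def extract_categories(wikitext):
    """Return category names found in a chunk of wikitext."""
    # Explicit category links; an optional sort key after "|" is dropped.
    categories = [m.group(1).strip() for m in CATEGORY_LINK.finditer(wikitext)]
    # Topic templates: first argument is the language code, the rest are topics.
    for m in TOPIC_TEMPLATE.finditer(wikitext):
        lang = m.group(1)
        for topic in m.group(2).split("|"):
            topic = topic.strip()
            if topic and "=" not in topic:  # skip named parameters
                categories.append(f"{lang}:{topic}")
    return categories


sample = (
    "==English==\n"
    "# A set of instructions.\n"
    "{{C|en|Computing|Software}}\n"
    "[[Category:en:Television]]"
)
print(extract_categories(sample))
```

Even this simple version depends on template names that are specific to one Wiktionary edition, which is why the extraction has to be maintained for each language separately.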

I'm planning to build a universal bot, but for now I'm interested in three languages: English, Finnish, and Danish.
For example, it would be great if the user could request all computing words (https://en.wiktionary.org/wiki/Category:en:Computing), including https://en.wiktionary.org/wiki/program

karlb commented

OK, now I clearly understand what you mean. I had another look at the RDF data from dbnary and was not able to find any category information. Until that changes, I don't see a way to add that data with an acceptable amount of effort.