This project is a modular Toaq dictionary, intended as a complement to the official dictionary and to other existing word lists (country names…) and example lists.
- `toadai.json` contains exclusively community definitions.
- `translations-of-official-definitions.csv` contains community translations of official definitions.
- `toatuq.json` contains all community and official definitions; it is automatically generated with `unify.py` and should not be edited manually.
- `unify.py` creates a new universal Toaq dictionary, `toatuq.json`, by combining the contents of `toadai.json`, the official dictionary, the example sentence spreadsheets and the country words spreadsheet (a minimal sketch of this merge is given below).
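The following is only a rough sketch of the kind of merge `unify.py` performs, not its actual implementation; the entry fields (`toaq`, `definition`) and the external source file names are placeholder assumptions.

```python
# Rough sketch of a module merge (hypothetical paths and simplified
# entry format; see unify.py for the real logic).
import json

def load_entries(path):
    with open(path, encoding="utf-8") as f:
        return json.load(f)

def merge_modules(module_paths, output_path="toatuq.json"):
    merged, seen = [], set()
    for path in module_paths:
        for entry in load_entries(path):
            # Skip exact duplicates of an already collected definition.
            key = (entry.get("toaq"), entry.get("definition"))
            if key not in seen:
                seen.add(key)
                merged.append(entry)
    with open(output_path, "w", encoding="utf-8") as f:
        json.dump(merged, f, ensure_ascii=False, indent=2)

if __name__ == "__main__":
    # "official-dictionary.json" stands in for the external sources
    # (official dictionary, example and country spreadsheets).
    merge_modules(["toadai.json", "official-dictionary.json"])
```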
`archives/` contains archived data and scripts which shouldn't be modified anymore. Among these, `official-definition-competitors.csv`, `toadai-0-orphane-entries.csv` and `feedback_on_imported_entries.json` are the files which remain most useful as of now.

- `official-definition-competitors.csv` is the list of community definitions competing for an already official word. Some of these definitions are redundant or obsolete, but others are still relevant proposals for improving existing official definitions; ideally, a pull request or issue should be opened in the official dictionary repository for each of these still relevant proposals.
- `toadai-0-orphane-entries.csv` contains definitions which weren't assigned to any word or were assigned to an already existing word; they should either be given a different word form, or better arguments justifying a change to the definition of their targeted word forms should be provided (ideally in a GitHub issue or pull request against the official dictionary repository).
- `feedback_on_imported_entries.json` contains all the community comments and vote information previously attached to entries which were imported from external sources: the official dictionary, the example spreadsheets, etc.
`archives/toadai-0.json` is the original basis for `toadai.json`. It contains reformatted Toadua data, excluding all definitions which were automatically imported from external dictionaries (i.e. official definitions, example sentences, country words…), and with downvoted entries removed. It was automatically generated from `archives/200708-last-toadua-dump.json` using `archives/toadua_json_to_toadai-0.py`. The Toadua data also contained comments and vote information on the aforementioned imported entries; these were removed from `archives/toadai-0.json` and stored instead in a dedicated database, `archives/feedback_on_imported_entries.json`.
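The filtering logic lives in `archives/toadua_json_to_toadai-0.py` itself; the sketch below only illustrates the general idea of dropping imported and downvoted entries from a Toadua dump. The field names (`user`, `score`) and the importer account names are assumptions made for illustration, not the dump's actual schema.

```python
# Illustrative sketch: keep only human-authored, non-downvoted entries
# from a Toadua dump. Field names and importer account names are
# assumptions, not the real schema.
import json

IMPORTER_ACCOUNTS = {"official", "examples", "countries"}  # assumed names

def filter_dump(dump_path, output_path):
    with open(dump_path, encoding="utf-8") as f:
        entries = json.load(f)
    kept = [
        e for e in entries
        if e.get("user") not in IMPORTER_ACCOUNTS  # drop imported entries
        and e.get("score", 0) >= 0                 # drop downvoted entries
    ]
    with open(output_path, "w", encoding="utf-8") as f:
        json.dump(kept, f, ensure_ascii=False, indent=2)

filter_dump("archives/200708-last-toadua-dump.json", "archives/toadai-0.json")
```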
`archives/toadai-mono-0.json` was generated from `archives/toadai-0.json` with the `archives/make-toadai-mono-0.py` script, whose purpose is to filter out duplicates and to merge synonyms and definition translations into single entries. All the duplicate entries which were filtered out are stored in archive databases, such as `archives/toadai-0-deleted-entries.json`, `archives/toadai-0-orphane-entries.json` and `archives/official-definition-competitors.csv`.
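The actual merging rules are those of `archives/make-toadai-mono-0.py`; the sketch below only conveys the general idea of grouping entries that define the same word and folding their definitions into one entry per headword, using assumed field names (`toaq`, `language`, `definition`).

```python
# Illustrative sketch only: group entries by headword and collect one
# definition per language into a single merged entry. Field names are
# assumptions, not the actual toadai schema.
import json
from collections import defaultdict

def merge_by_headword(entries):
    grouped = defaultdict(dict)
    for e in entries:
        merged = grouped[e["toaq"]]
        merged.setdefault("toaq", e["toaq"])
        # Keep the first definition seen for each language.
        merged.setdefault("definitions", {}).setdefault(
            e["language"], e["definition"])
    return list(grouped.values())

with open("archives/toadai-0.json", encoding="utf-8") as f:
    merged_entries = merge_by_headword(json.load(f))
with open("archives/toadai-mono-0.json", "w", encoding="utf-8") as f:
    json.dump(merged_entries, f, ensure_ascii=False, indent=2)
```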
I plan to further modularize the dictionary by moving particle definitions and example entries to dedicated databases (`soatoadai.json` and `muadai.json` respectively), and then to hook most or all remaining definitions in `toadai.json` to Predilex data. This should not invalidate the content of `toatuq.json`, as it would still be automatically generated by merging all the vocabulary modules, including the aforementioned envisioned new ones.
See the file `LICENSE.md`.