

toakao

This is a modular Toaq dictionary project, intended to complement the official dictionary, other existing word lists (country names…) and example sentence lists.

  • toadai.json contains exclusively community definitions.
  • translations-of-official-definitions.csv are community translations of official definitions.
  • toatuq.json contains all community and official definitions. It is generated automatically by unify.py and should not be edited manually.

unify.py creates a new universal Toaq dictionary toatuq.json by combining the contents of toadai.json, the official dictionary, the example sentence spreadsheets and the country words spreadsheet.
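The merging step performed by unify.py can be pictured roughly as follows. This is a minimal sketch, not the actual script: it assumes every source has already been loaded as a list of entries shaped like `{"head": <word>, "body": <definition>}` (field names are an assumption), and the `unify` and `write_dictionary` helpers are hypothetical. Each entry is copied and tagged with the module it came from, so the combined file keeps track of provenance without touching the source files.

```python
import json

# ASSUMPTION: entries look like {"head": <Toaq word>, "body": <definition>};
# the real field names used by unify.py may differ.


def unify(modules: dict[str, list[dict]]) -> list[dict]:
    """Merge several vocabulary modules into a single list of entries (hypothetical helper)."""
    combined = []
    for source, entries in modules.items():
        for entry in entries:
            tagged = dict(entry)        # copy, so the source module is never mutated
            tagged["source"] = source   # record which module the entry came from
            combined.append(tagged)
    # Sort by headword, then by source, so regenerated output is stable and diff-friendly.
    combined.sort(key=lambda e: (e["head"], e["source"]))
    return combined


def write_dictionary(path: str, entries: list[dict]) -> None:
    """Write the merged dictionary as UTF-8 JSON, keeping Toaq diacritics unescaped."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(entries, f, ensure_ascii=False, indent=2)


if __name__ == "__main__":
    # Placeholder data for illustration only.
    modules = {
        "official": [{"head": "b", "body": "official definition of b"}],
        "toadai": [
            {"head": "a", "body": "community definition of a"},
            {"head": "b", "body": "community definition of b"},
        ],
    }
    for e in unify(modules):
        print(e["head"], e["source"])
```

Because the output is always rebuilt from the modules, any manual edit to the generated file would be lost on the next run, which is why toatuq.json should not be edited by hand.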

archives/ contains archived data and scripts which should no longer be modified. Among these, official-definition-competitors.csv, toadai-0-orphane-entries.csv and feedback_on_imported_entries.json are the ones that remain most useful at present.

official-definition-competitors.csv lists the community definitions that compete with the official definition of an existing word. Some of these definitions are redundant or obsolete, but others are still relevant proposals for improving existing official definitions; ideally, a pull request or issue should be opened in the official dictionary repository for each proposal that is still relevant.

toadai-0-orphane-entries.csv contains definitions that either weren't assigned to any word or were assigned to an already existing word. They should either be given a different word form, or better arguments should be provided to justify changing the definition of the word forms they target (ideally in a GitHub issue or pull request against the official dictionary repository).

feedback_on_imported_entries.json contains all the community comments and vote information previously attached to entries imported from external sources (official dictionary, example spreadsheets, etc.).

archives/toadai-0.json is the original basis for toadai.json. It contains reformatted Toadua data, excluding all definitions that were automatically imported from external dictionaries (i.e. official definitions, example sentences, country words…) and with downvoted entries removed. It was generated automatically from archives/200708-last-toadua-dump.json using archives/toadua_json_to_toadai-0.py. The Toadua data also contained comments and vote information on the aforementioned imported entries; these were removed from archives/toadai-0.json and stored instead in a dedicated database, archives/feedback_on_imported_entries.json.
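The filtering described above, which keeps community entries, drops downvoted ones, and sets aside the feedback left on imported entries, can be sketched as below. This is an illustration under explicit assumptions, not the logic of toadua_json_to_toadai-0.py: the entry fields (`id`, `user`, `score`, `notes`, `votes`), the rule that "downvoted" means a negative score, and the hypothetical `IMPORTER_ACCOUNTS` set used to recognize imported entries are all assumptions.

```python
# ASSUMPTION: a Toadua dump entry looks roughly like
#   {"id": str, "head": str, "body": str, "user": str,
#    "score": int, "notes": list, "votes": dict}
# The real dump format may differ.

# HYPOTHETICAL: account names under which external data was bulk-imported.
IMPORTER_ACCOUNTS = {"official", "examples", "countries"}


def split_toadua_dump(entries: list[dict]) -> tuple[list[dict], dict[str, dict]]:
    """Return (community_entries, feedback_on_imported_entries)."""
    community: list[dict] = []
    feedback: dict[str, dict] = {}
    for entry in entries:
        if entry["user"] in IMPORTER_ACCOUNTS:
            # Imported entry: exclude it, but keep any comments/votes it received.
            notes = entry.get("notes", [])
            votes = entry.get("votes", {})
            if notes or votes:
                feedback[entry["id"]] = {"notes": notes, "votes": votes}
        elif entry.get("score", 0) >= 0:
            # Community entry that was not downvoted (ASSUMPTION: score < 0 means downvoted).
            community.append(entry)
    return community, feedback
```

Keeping the feedback keyed by entry id means it can later be reattached to the corresponding imported entries without reintroducing those entries into the community data.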

archives/toadai-mono-0.json was generated from archives/toadai-0.json with the archives/make-toadai-mono-0.py script, which filters out duplicates and merges synonyms and definition translations into single entries. All the duplicate entries that were filtered out are stored in archive databases such as archives/toadai-0-deleted-entries.json, archives/toadai-0-orphane-entries.json and archives/official-definition-competitors.csv.
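Two of the steps described above, duplicate filtering and synonym merging, might look like the following minimal sketch. It is not make-toadai-mono-0.py itself and does not cover the merging of translations. It assumes entries shaped like `{"head": str, "body": str, "scope": str}` (with `scope` as a language code), and the `normalize`, `remove_duplicates` and `merge_synonyms` helpers are hypothetical. Definitions are compared after collapsing whitespace and case, so trivially different copies count as duplicates.

```python
# ASSUMPTION: entries look like {"head": str, "body": str, "scope": str},
# where "scope" is a language code. The real script's schema may differ.


def normalize(text: str) -> str:
    """Whitespace- and case-insensitive key for comparing definitions."""
    return " ".join(text.split()).casefold()


def remove_duplicates(entries: list[dict]) -> tuple[list[dict], list[dict]]:
    """Keep the first occurrence of each (head, body, scope); return (kept, deleted)."""
    seen: set[tuple[str, str, str]] = set()
    kept, deleted = [], []
    for e in entries:
        key = (e["head"], normalize(e["body"]), e["scope"])
        if key in seen:
            deleted.append(e)  # candidates for an archive of deleted entries
        else:
            seen.add(key)
            kept.append(e)
    return kept, deleted


def merge_synonyms(entries: list[dict]) -> list[dict]:
    """Merge entries sharing an identical definition in the same language into one
    entry that lists every word form (dicts keep insertion order, so output is stable)."""
    groups: dict[tuple[str, str], dict] = {}
    for e in entries:
        key = (normalize(e["body"]), e["scope"])
        if key not in groups:
            groups[key] = {"heads": [], "body": e["body"], "scope": e["scope"]}
        if e["head"] not in groups[key]["heads"]:
            groups[key]["heads"].append(e["head"])
    return list(groups.values())
```

Returning the deleted entries rather than discarding them mirrors the approach described above of keeping every filtered-out entry in an archive database.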

Roadmap

I plan to further modularize the dictionary by moving particle definitions and example entries to dedicated databases (soatoadai.json and muadai.json respectively) and then hook most or all remaining definitions in toadai.json to Predilex data.

This should not invalidate toatuq.json, since it would still be generated automatically by merging all the vocabulary modules, including the new ones envisioned above.

License

See the file LICENSE.md.