TetraChan
A multiplatform and multilanguage dictionary lookup program. Popup dictionaries for the web browser, discord bot, etc.
You can see this in action on this Discord server, https://discord.gg/8ZCWsGF, which is the server for my various projects.
This readme will mostly be to keep track of things.
Current Status
Branches
-
Discord bot (Node app, ES6+ okay)
- Jisho (API)
- CC-CEDICT (Local)
- Oxford English Dictionary (API)
- Goo (web crawling)
- partially complete and partially transitioning over to wrapper
- using htmlparser2 and made a small wrapper for traversing it
- Goo throttles requests, unsure of what the cooldown is
- Stroke order (MDBG)
-
Plugin (WebExtensions, ES5 only!)
- Had Popup, Options, and Templating sort of half implemented and working, but then started on making a build environment and the Discord bot
- Haven't yet refactored code to work with webpack build environment
- Problems with local dictionaries
- Storage limits for browser extensions. It is also questionable whether we can maintain persistence between sessions.
- May want to load into memory, etc.
- HTML5's FileReader API, SQLite, IndexedDB
- Interfacing with pre-installed Rikaichan dictionaries (which are just JMDict). May have issue with WebExtensions vs XUL incompatibility.
- Also can't use OED's API, may want to parse HTML instead. Or use Longman.
-
Plugin-alternative Website (Node and Browser)
- Going to use Express for the general case, since the bandwidth load is on the user. I believe Express will handle all the redirection, etc.
-
Plugin-alternative Website Hosted (Good for demos)
- ./test/httpsserver.js is a stripped-down version of the Node fetcher, written during my attempts at understanding how this works
-
Mobile App
- Not started
- May want to consider Progressive Web App approach
Important To Do Items and Research
- Bot hosting
- Handle all the requests that this meta project will produce with same node application
- Support for CC-CANTO
- Support for classical Weblio
- Support for classical Ctext
- Support for Stardict format
- Sanseido search from Rikaisama?
- Support for EPWING
- Huge headache, but would really like to have support for Daijirin, etc.
- Almost all solutions based on
- Zero-epwing by foosoft lists out all the problems with this format and necessary research into the format
Lexicon Structure
See ./src/core/lexicon.js for the structure inside the code.
This is the project's local API/interface for how words are stored. It is used to sanitize the output of the many dictionaries into a standard format. The semicolon ';' is used as the separator for items that still belong to a single entry; see the example.
At the moment there are three sublevels to the lexicon
- Lexicon (Dictionary)
- A query for a dictionary would return a lexicon
- Lexeme (may change to 'Headword' as Oxford uses it)
- Same words but written differently
- character variants
- e.g. 見る/観る would be listed as a single lexeme
- Classes (Part of Speech)
- Senses (Definition)
An example: source
- [Lexicon Level] Only single object created per request
- Query Jisho.org for あと
- [Lexeme level] stores reading, ipa, alternative readings, etc. at this level
- 後
- [Class level]
- Noun; No-adjective
- [Sense level] Can also include a list of examples per definition
- 1.behind; rear
- 2.after; later
- 3.after one's death
- 4.remainder; the rest
- 5.descendant; successor; heir
- Adverbial noun
- 1.descendant; successor; heir
- 2.also; in addition
- Noun; No-adjective
- 1.past; previous
- 跡, examples of alternative forms are 迹、痕、址. Currently calling these allographs.
- ...
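The nesting in the example above can be sketched as plain objects. This is only an illustration: the helper names (`makeSense`, `makeClass`, `makeLexeme`, `makeLexicon`) and field names (`headword`, `reading`, `allographs`, `partOfSpeech`, `definition`, `examples`) are assumptions made up for this sketch, and the real structure in ./src/core/lexicon.js may differ.

```javascript
// Hypothetical shapes for illustration only; the actual definitions
// live in ./src/core/lexicon.js and may use different names and fields.

// Sense level: several glosses of one definition are joined with '; '.
function makeSense(glosses, examples = []) {
  return { definition: glosses.join('; '), examples };
}

// Class level: several part-of-speech labels are joined with '; '.
function makeClass(partsOfSpeech, senses) {
  return { partOfSpeech: partsOfSpeech.join('; '), senses };
}

// Lexeme level: headword plus reading, allographs, etc.
// Values in `fields` override the defaults.
function makeLexeme(headword, fields, classes) {
  return Object.assign({ headword, reading: null, allographs: [], classes }, fields);
}

// Lexicon level: one object per request.
function makeLexicon(source, query, lexemes) {
  return { source, query, lexemes };
}

const lexicon = makeLexicon('Jisho.org', 'あと', [
  makeLexeme('後', { reading: 'あと' }, [
    makeClass(['Noun', 'No-adjective'], [
      makeSense(['behind', 'rear']),
      makeSense(['after', 'later']),
    ]),
  ]),
  // Classes omitted here for brevity.
  makeLexeme('跡', { reading: 'あと', allographs: ['迹', '痕', '址'] }, []),
]);

console.log(lexicon.lexemes[0].classes[0].senses[0].definition); // prints "behind; rear"
```

Keeping each level as a plain object would make it easy to serialize a lexicon for the bot, the plugin, and the website alike, though that is a design guess rather than a description of the current code.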
Conceptual notes
This differs in two major ways from how Oxford structures words.
- Under 'Headwords' (the Lexeme level as I have it) there is an additional level called 'entries' in its API. I haven't yet found a word that has several entries, so I just assume they all fall under the same headword (i.e. more part-of-speech entries per headword).
- Under senses, there are sub-senses. Not really sure why these are significant, so for now I add each sub-sense as a sense of its own. May want to add them as a single sense separated by semicolons instead.
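The two options for sub-senses can be compared side by side. This is a minimal sketch, assuming each sense arrives as an object with a `definitions` array and an optional `subsenses` array of the same shape; the function names and the sample data are hypothetical and are not taken from the project code or from a real API response.

```javascript
// Assumed input shape (an assumption for this sketch):
//   { definitions: string[], subsenses?: [{ definitions: string[] }] }

function definitionsOf(sense) {
  return sense.definitions || [];
}

// Option 1: every sub-sense is promoted to a sense of its own.
function promoteSubsenses(senses) {
  const out = [];
  for (const sense of senses) {
    out.push(definitionsOf(sense).join('; '));
    for (const sub of sense.subsenses || []) {
      out.push(definitionsOf(sub).join('; '));
    }
  }
  return out;
}

// Option 2: sub-senses are folded into their parent sense,
// separated by semicolons.
function mergeSubsenses(senses) {
  return senses.map((sense) => {
    const parts = definitionsOf(sense).slice();
    for (const sub of sense.subsenses || []) {
      parts.push(...definitionsOf(sub));
    }
    return parts.join('; ');
  });
}

// Made-up sample data, not a real dictionary entry.
const sample = [
  {
    definitions: ['a domesticated animal'],
    subsenses: [{ definitions: ['a wild relative of that animal'] }],
  },
  { definitions: ['an unrelated meaning'] },
];

console.log(promoteSubsenses(sample).length); // prints 3
console.log(mergeSubsenses(sample).length);   // prints 2
```

Promoting keeps each definition on its own numbered line, while merging keeps the sense count closer to what the source dictionary shows.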
May also want to implement a switch to alternate between accounts to get around the free usage limit (or not, because that's not very nice).
Links that will be useful later on
https://www.reddit.com/r/discordapp/comments/4zy6o4/what_udptcp_ports_do_i_need_to_open_for_discord/