Primary language: Python. Licence: GNU General Public License v3.0 (GPL-3.0).

SilverDict – Web-Based Alternative to GoldenDict

This project is intended to be a modern, from-the-ground-up, maintainable alternative to GoldenDict(-ng), developed with Flask and React.

A live demo is available here (the button to delete dictionaries has been removed). It runs in a free Okteto container that goes to sleep after 24 hours of inactivity, so please bear with its slowness and refresh the page a few times if you see a 404 error. It may also be (terribly) out of sync with the latest code changes.

Screenshots

[Screenshots: Light 1, Light 2, Dark, Mobile]

The dark theme is not built in, but rendered with the Dark Reader Firefox extension.

Some Peculiarities

  • The wildcard characters are ^ and + (instead of % and _ of SQL or the more traditional * and ?) for technical reasons. Hint: imagine % and _ are shifted one key to the right on an American keyboard.
  • This project creates a back-up of DSL dictionaries, overhauls them [1] and silently overwrites the original files. So after adding a DSL dictionary to SilverDict, it may no longer work with GoldenDict.
  • During the indexing process of DSL dictionaries, the memory usage could reach as high as 1.5 GiB (tested with the largest DSL ever seen, the Encyclopædia Britannica), and even after that the memory used remains at around 500 MiB. Restart the server process and the memory usage will drop to a few MiB.
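
To make the wildcard convention concrete, here is a minimal sketch of how a query using ^ and + could be turned into a SQL LIKE pattern. The mapping (^ for %, + for _) is inferred from the keyboard hint above. The table name, the sample words and the use of SQLite are assumptions made for illustration only, not a description of SilverDict's internals.

```python
import sqlite3

# Assumed mapping, inferred from the hint: ^ stands in for %, + for _.
WILDCARDS = str.maketrans({"^": "%", "+": "_"})


def to_like_pattern(query: str) -> str:
    """Translate SilverDict-style wildcards into SQL LIKE wildcards."""
    return query.translate(WILDCARDS)


# Hypothetical in-memory index with a handful of headwords.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE entries (key TEXT)")
conn.executemany(
    "INSERT INTO entries VALUES (?)",
    [("walk",), ("walks",), ("wall",), ("talk",)],
)


def search(query: str) -> list[str]:
    """Return all headwords matching the wildcard query, sorted."""
    rows = conn.execute(
        "SELECT key FROM entries WHERE key LIKE ? ORDER BY key",
        (to_like_pattern(query),),
    )
    return [row[0] for row in rows]


print(search("wal^"))  # any number of trailing characters
print(search("+alk"))  # exactly one leading character
```

In this sketch ^ matches any run of characters and + matches exactly one, mirroring % and _ in SQL.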

Features

  • Python-powered [2]
  • Cleaner code
  • Deployable both locally and on a self-hosted server
  • Fast enough
  • Minimalist web interface
  • Separable client and server components

Roadmap

  • Linux: RPM/Deb packaging (will do when the project is more mature)
  • Windows: package everything into a single click-to-run executable (will do when the project is more mature)

Server-side

  • Add support for Babylon BGL glossary format
  • Add support for StarDict format
  • Add support for ABBYY Lingvo DSL format [3]
  • Reduce the memory footprint of the MDict Reader
  • Inline styles to prevent them from being applied to the whole page (the commented-out implementation in mdict_reader.py breaks richly formatted dictionaries) [4]
  • Reorganise APIs (to facilitate dictionary groups)
  • Ignore diacritics when searching (testing still wanted from speakers of Turkish, the Semitic languages and Asian languages other than CJK)
  • Ignore case when searching
  • GoldenDict-like morphology-awareness (walks -> walk) and spelling check (fuzzy-search, that is, malarky -> malady, Malaya, malarkey, Malay, Mala, Maalox, Malcolm)
  • Transliteration for the Cyrillic, Greek, Arabic, Hebrew and Devanagari scripts
  • Add the ability to set sources for automatic indexing, i.e. dictionaries put into the specified directories will be automatically added
  • Recursive source scanning
  • Multithreaded article extraction
  • Improve the performance of suggestions matching (partially done, 'contains' search is still slow)
  • Make the suggestion size customisable
  • Allow configuring the suggestion matching mode, listening address, running mode, etc. via a configuration file, without modifying code
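
The "ignore diacritics" and "ignore case" items above can be illustrated with a small key-simplification helper. This is a sketch of one common approach using only the standard library (Unicode decomposition followed by case folding). The function name simplify is hypothetical, and the project's actual implementation may differ.

```python
import unicodedata


def simplify(text: str) -> str:
    """Return a search key with combining marks removed and case folded."""
    # NFKD splits accented letters into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFKD", text)
    # Drop the combining marks, keeping only the base characters.
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # casefold() is a more aggressive lower() intended for caseless matching.
    return stripped.casefold()


print(simplify("Café"))
print(simplify("ÅNGSTRÖM"))
print(simplify("naïve") == simplify("NAIVE"))
```

Indexing both the headwords and the user's query through the same function makes the comparison insensitive to accents and case. Scripts whose letters do not decompose this way would need separate handling, which is presumably why testing is still wanted for some languages.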

Client-side

  • Offer readily built static files for users unfamiliar with the front-end development process (Artefacts built with GitHub Actions are only visible to me and the URL is not permanent)
  • Allow zooming in/out of the definition area
  • Make the strings translatable
  • Beautify the dialogues (help wanted!)
  • GoldenDict-like dictionary group support
  • A mobile-friendly interface (retouch needed)
  • A real mobile app

Issue backlog

Usage

Dependencies

This project uses some Python 3.10 features, such as the match syntax, and depends on a minimal set of packages:

PyYAML # for better efficiency, please install libyaml
Flask
Flask-Cors
waitress
lxml

Local Deployment

The simplest method to use this app is to run it locally. I would recommend running the custom HTTP server in the http_server sub-directory, which forwards requests under /api to the back-end, and serves static files in ./build/.

cd client
yarn install
yarn build
mv build ../http_server/

And then:

pip3.10 install -r http_server/requirements.txt # or install with your system package manager
python3.10 http_server/http_server.py # working-directory-agnostic
pip3.10 install -r server/requirements.txt
python3.10 server/server.py # working-directory-agnostic

Then access it at localhost:8081.
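
To show what such a front server does, here is a minimal standard-library sketch that serves static files from ./build/ and forwards anything under /api to the back-end. It is not the code in http_server/. The back-end address, the unchanged forwarding of the request path and all names below are assumptions for illustration.

```python
import urllib.error
import urllib.request
from functools import partial
from http.server import SimpleHTTPRequestHandler, ThreadingHTTPServer

BACKEND = "http://127.0.0.1:2628"  # assumed back-end address
STATIC_DIR = "build"               # static files produced by `yarn build`


def is_api_request(path: str) -> bool:
    """True if the request path (ignoring the query string) is under /api."""
    bare = path.split("?", 1)[0]
    return bare == "/api" or bare.startswith("/api/")


class Handler(SimpleHTTPRequestHandler):
    def _forward(self) -> None:
        """Relay the current request to the back-end and copy the reply."""
        length = int(self.headers.get("Content-Length", 0))
        body = self.rfile.read(length) if length else None
        request = urllib.request.Request(
            BACKEND + self.path, data=body, method=self.command
        )
        content_type = self.headers.get("Content-Type")
        if content_type:
            request.add_header("Content-Type", content_type)
        try:
            with urllib.request.urlopen(request) as response:
                status, headers = response.status, response.headers
                payload = response.read()
        except urllib.error.HTTPError as error:  # back-end answered 4xx/5xx
            status, headers, payload = error.code, error.headers, error.read()
        except urllib.error.URLError:            # back-end unreachable
            self.send_error(502, "Back-end unavailable")
            return
        self.send_response(status)
        self.send_header(
            "Content-Type",
            headers.get("Content-Type", "application/octet-stream"),
        )
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)

    def do_GET(self) -> None:
        if is_api_request(self.path):
            self._forward()
        else:
            super().do_GET()  # serve a static file

    def do_POST(self) -> None:
        if is_api_request(self.path):
            self._forward()
        else:
            self.send_error(405)

    # Other methods are only meaningful for API requests.
    do_PUT = do_DELETE = do_POST


def main() -> None:
    handler = partial(Handler, directory=STATIC_DIR)
    ThreadingHTTPServer(("127.0.0.1", 8081), handler).serve_forever()
```

Calling main() starts the sketch on port 8081, the port mentioned above. Handling PUT and DELETE as well as GET and POST reflects the fact that the API uses methods beyond those two.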

Alternatively, you could use dedicated HTTP servers such as nginx to serve the static files and proxy API requests. Check out the sample config for more information.

Server Deployment

I recommend nginx if you plan to deploy SilverDict to a server. Before building the static files, be sure to modify API_PREFIX in config.js, and then place the built files in the directory from which nginx serves static files. Remember to reverse-proxy all API requests and to permit methods other than GET and POST.

Assuming your distribution uses systemd, you can refer to the provided sample systemd config and run the script as a service.

NB: currently the server is memory-inefficient: running it with eight mid- to large-sized MDict dictionaries consumes ~200 MiB of memory, which is much more than GoldenDict uses. [5] If you want an MDict server with a low memory footprint, take a look at xiaoyifang/goldendict-ng#229 and subscribe to its RSS feed. A possible work-around is to ditch MDict and convert the dictionaries to other formats with pyglossary (this might not work). There are no such issues with StarDict or DSL.

Docker Deployment

Check out my guide.

[Horribly outdated. Will work on this soon.]

Acknowledgements

The favicon is the icon for 'Dictionary' from the Papirus icon theme, licensed under GPLv3.

This project uses or has adapted code from the following projects:

Name             Developer              Licence
mdict-analysis   Xiaoqiang Wang         GPLv3
python-stardict  Su Xing                GPLv3
dictionary-db    Jean-François Dockes   GPL 2.1
idzip            Ivo Danihelka
pyglossary       Saeed Rasooli          GPLv3

I would also like to express my gratitude to Jiang Qian for his suggestions, encouragement and great help.

Similar projects


Footnotes

  1. What it does: (1) decompress the dictionary file if compressed; (2) remove the BOM, non-printing characters and strange symbols (only {·} currently) from the text; (3) normalise the initial whitespace characters of definition lines; (4) overwrite the .dsl file with UTF-8 encoding and re-compress with dictzip. After this process the file is smaller and easier to work with.
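
     Steps (2) and (3) can be sketched as follows. This is an illustration, not the project's cleaning code: the function name is hypothetical, "non-printing characters" is taken to mean Unicode control and format characters other than tab, and "normalise" is assumed to mean replacing any leading run of spaces and tabs with a single tab.

```python
import unicodedata


def clean_dsl_text(text: str) -> str:
    """Strip the BOM, {·} markers and control characters from DSL text."""
    text = text.lstrip("\ufeff")      # remove the byte-order mark
    text = text.replace("{·}", "")    # the only 'strange symbol' handled
    cleaned = []
    for line in text.splitlines():
        # Drop control (Cc) and format (Cf) characters, but keep tabs.
        line = "".join(
            ch
            for ch in line
            if ch == "\t" or unicodedata.category(ch) not in ("Cc", "Cf")
        )
        # Definition lines start with whitespace; assume one tab is wanted.
        if line[:1] in (" ", "\t"):
            line = "\t" + line.lstrip(" \t")
        cleaned.append(line)
    return "\n".join(cleaned) + "\n"


sample = "\ufeffcat\n   [m1]a fe{·}line\x00\n"
print(repr(clean_dsl_text(sample)))
```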

  2. A note about type hinting in the code: I know for proper type hinting I should use the module typing, but the current way is a little easier to write and can be understood by VS Code.

  3. I tested with an extremely ill-formed DSL dictionary, and before such devilry my cleaning code is powerless. I will look into how GoldenDict handles this.

  4. The use of a custom styling manager such as Dark Reader is recommended until I fix this, as styles for different dictionaries meddle with each other.

  5. I grabbed a profiler and found the root cause: the MDict library stores many things in memory, so it is impossible for me to fix this without rewriting the library. Besides, I cannot instantiate MDX lazily, or the waiting time would easily exceed half a second.