Source code for "Semantle en español"

Primary language: JavaScript. License: GNU General Public License v3.0 (GPL-3.0).

Semantle-es

This is a Spanish version of Semantle.

Running locally

One-time setup

  1. Get the Spanish word2vec dataset from Spanish Billion Words Corpus and Embeddings: download the word2vec binary format to the data directory and unzip it.
  2. Download the "Lista total de frecuencias" data file (CREA_total.ZIP) from Corpus de Referencia del Español Actual (CREA) - Listado de frecuencias to the data directory. Do not unzip it.
  3. Create a python virtual environment: python3 -m venv .
  4. Activate the environment: source bin/activate
  5. Install all dependencies: python3 -m pip install -r requirements.txt
  6. Load model into sqlite db: python3 dump-vecs.py. Takes ~5 min on a 2.4 GHz Intel Core i5 MacBook Pro.
  7. Dump hints into pickle file: python3 dump-hints.py. Takes ~30 min on a 2.4 GHz Intel Core i5.
  8. Load hints into sqlite db: python3 store-hints.py. Fast.
  9. I don't think we need or use the respelling feature of Semantle-en, so there is no need to run british.py.
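
For orientation, steps 6 to 8 amount to storing one vector per word in sqlite and ranking the other words by similarity to a secret word. The sketch below is not the code in dump-vecs.py or dump-hints.py; it is a minimal stdlib-only illustration. The table name word2vec, the float32 blob layout, the helper names, and the toy 3-dimensional vectors are all assumptions made for the example.

```python
import math
import sqlite3
import struct


def pack_vector(vec):
    """Serialize a list of floats into a little-endian float32 blob."""
    return struct.pack(f"<{len(vec)}f", *vec)


def unpack_vector(blob):
    """Inverse of pack_vector: each component occupies 4 bytes."""
    return list(struct.unpack(f"<{len(blob) // 4}f", blob))


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def store_vectors(con, vectors):
    """Write {word: vector} into a hypothetical word2vec(word, vec) table."""
    con.execute(
        "CREATE TABLE IF NOT EXISTS word2vec (word TEXT PRIMARY KEY, vec BLOB)"
    )
    con.executemany(
        "INSERT OR REPLACE INTO word2vec VALUES (?, ?)",
        ((word, pack_vector(vec)) for word, vec in vectors.items()),
    )
    con.commit()


def nearest(con, secret, limit=3):
    """Return up to `limit` (similarity, word) pairs closest to `secret`."""
    row = con.execute(
        "SELECT vec FROM word2vec WHERE word = ?", (secret,)
    ).fetchone()
    if row is None:
        raise KeyError(secret)
    target = unpack_vector(row[0])
    scored = [
        (cosine_similarity(target, unpack_vector(blob)), word)
        for word, blob in con.execute(
            "SELECT word, vec FROM word2vec WHERE word != ?", (secret,)
        )
    ]
    return sorted(scored, reverse=True)[:limit]


# Toy vectors invented for illustration; the real embeddings are far larger.
TOY_VECTORS = {
    "gato": [0.9, 0.1, 0.0],
    "perro": [0.8, 0.2, 0.1],
    "coche": [0.0, 0.1, 0.9],
}

con = sqlite3.connect(":memory:")
store_vectors(con, TOY_VECTORS)
ranking = nearest(con, "gato")
for similarity, word in ranking:
    print(f"{word}: {similarity:.3f}")
```

An in-memory database keeps the sketch self-contained. Precomputing such rankings once per secret word is presumably why the hint step is the slow one, but that is an inference from the timings above, not something the scripts document here.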

Running it

  1. Run web server: python3 semantle.py

Running in production

One-time setup

TBD

Running it

  1. Run web server: ./start_server_prod.sh

Attribution

Original Semantle code by David Turner. Changes:

  • Improved dump-hints.py performance
  • Added a progress indicator to the dump and store scripts
  • Localization

Word2vec data set by Cristian Cardellino. Citation:

Cristian Cardellino: Spanish Billion Words Corpus and Embeddings (March 2016), https://crscardellino.github.io/SBWCE/

Frequent words data set from Corpus de referencia del español actual. Citation:

REAL ACADEMIA ESPAÑOLA: Banco de datos (CREA) [en línea]. Corpus de referencia del español actual. http://www.rae.es [2022-02-25]