Semantle-es

This is a Spanish version of Semantle.

Running locally

Get the Spanish Word2vec dataset from Spanish Billion Word Corpus and Embeddings. Download the file in word2vec binary format to the data directory and unzip it.
Download the "Lista total de frecuencias" data file (CREA_total.ZIP) from Corpus de Referencia del Español Actual (CREA) - Listado de frecuencias to the data directory. Do not unzip it.
Create a Python virtual environment: python3 -m venv .
Activate the environment: source bin/activate
Install all dependencies: python3 -m pip install -r requirements.txt
Load the model into the SQLite database: python3 dump-vecs.py. Takes ~5 min on a 2.4 GHz Intel Core i5 MacBook Pro.
Dump the hints into a pickle file: python3 dump-hints.py. Takes ~30 min on a 2.4 GHz Intel Core i5.
Load the hints into the SQLite database: python3 store-hints.py. Fast.
The respelling feature of Semantle-en does not seem to be needed or used here, so there is no need to run british.py.
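
The frequency list is kept zipped in the steps above. A zipped text file can be read directly with Python's standard library, which is probably why unzipping is unnecessary. The sketch below is only an illustration and is not taken from dump-vecs.py: the function name read_frequencies is hypothetical, and it assumes the archive holds a single Latin-1 text file with tab-separated columns (rank, word, absolute frequency with thousands separators, normalized frequency).

```python
import io
import zipfile


def read_frequencies(zip_path, member=None, encoding="latin-1"):
    """Yield (word, absolute_frequency) pairs from a zipped frequency list.

    Hypothetical helper: the column layout and the encoding are assumptions
    about the CREA file, not something confirmed by this repository.
    """
    with zipfile.ZipFile(zip_path) as archive:
        # Assumption: if no member name is given, the first file is the list.
        name = member or archive.namelist()[0]
        with archive.open(name) as raw:
            # Decode the compressed stream on the fly, without extracting it.
            for line in io.TextIOWrapper(raw, encoding=encoding):
                fields = [field.strip() for field in line.split("\t")]
                if len(fields) < 3:
                    continue  # blank or malformed line
                word = fields[1]
                count = fields[2].replace(",", "")
                # The header row has no numeric count, so it is skipped here.
                if count.isdigit():
                    yield word, int(count)
```

Under those assumptions, iterating over read_frequencies("data/CREA_total.ZIP") would stream the words and their counts without writing the unzipped file to disk.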

TBD

Original Semantle code by David Turner. Changes:

Word2vec data set by Cristian Cardellino. Citation:

Cristian Cardellino: Spanish Billion Words Corpus and Embeddings (March 2016), https://crscardellino.github.io/SBWCE/

REAL ACADEMIA ESPAÑOLA: Banco de datos (CREA) [en línea]. Corpus de referencia del español actual. http://www.rae.es [2022-02-25]