/sengenbango

Primary language: C. License: GNU Affero General Public License v3.0 (AGPL-3.0).

千言万語

A search tool for parallel Japanese-English (JP-EN) text that combines multiple corpora. See here for data source credits.

Instructions

  1. Parse the data into CSV files. See here for more instructions.

  2. Configure the compose file if needed.

  3. Run docker compose up db to set up the database first, if it has not been set up already. See here for information about the database.

    1. Once the database is up, run docker compose exec -it db psql -U postgres to open the postgres console.
    2. Run call copy_data(); to copy the data. This can be repeated every time the data is updated; existing data is cleared first.
    3. To copy only some of the sources, supply an array of source names, e.g. call copy_data(array['basics']);.
    4. If new sources were added, run docker compose exec db psql -U postgres -f docker-entrypoint-initdb.d/01-init.sql to recreate the copy_data procedure.
  4. Run docker compose up to start everything else.
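
The data-copy step above can also be scripted instead of typed into the postgres console. The following is a minimal sketch, not part of the project: build_copy_call is a hypothetical helper name, and it assumes the procedure is named copy_data and accepts an optional array of source names, as the steps above suggest. Source names are assumed to contain no quotes or spaces.

```shell
#!/bin/sh
# Hypothetical helper (not part of the repository): prints the SQL CALL
# statement for copy_data, optionally restricted to the given sources.
build_copy_call() {
  if [ "$#" -eq 0 ]; then
    # No sources given: copy everything.
    printf 'call copy_data();\n'
    return 0
  fi
  list=""
  for src in "$@"; do
    # Append each source as a quoted SQL string, comma-separated.
    list="${list:+$list, }'$src'"
  done
  printf 'call copy_data(array[%s]);\n' "$list"
}

# Example use (assumes the db service from the compose file is running):
#   docker compose exec -T db psql -U postgres -c "$(build_copy_call basics)"
```

The helper only builds the statement, so it can be checked without a running database. Passing the result to psql with -c runs it non-interactively, and -T turns off the pseudo-TTY that docker compose exec allocates by default.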