word2morph

Extraction of morphemes from a given lemma




Prerequisites

  • Python 3.6
  • Clone the repository and install the dependencies:
git clone https://github.com/MartinXPN/word2morph.git
cd word2morph
pip install .

Train a model

# Basic training
PYTHONHASHSEED=0 python -m word2morph.train basic_train \
        init_data --train_path datasets/rus.train --valid_path datasets/rus.valid \
        construct_model --model_type CNN --embeddings_size 8 --kernel_sizes '(5,5,5)' --nb_filters '(192,192,192)' --dilations '(1,1,1)' --recurrent_units '(64,128,256)' --use_crf=True --dense_output_units 64 --dropout 0.2 \
        train --batch_size 64 --epochs 75 --patience 10 --log_dir logs
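The tuple-valued flags (--kernel_sizes, --nb_filters, --dilations) each contribute one value per convolutional layer. As a rough sketch of how such flags pair up per layer (the parse_tuple helper below is illustrative, not part of word2morph's API):

```python
from ast import literal_eval

def parse_tuple(flag: str) -> tuple:
    """Parse a CLI flag like '(5,5,5)' into a tuple of ints."""
    return tuple(literal_eval(flag))

kernel_sizes = parse_tuple('(5,5,5)')
nb_filters = parse_tuple('(192,192,192)')
dilations = parse_tuple('(1,1,1)')

# Position i of each tuple describes convolutional layer i
layers = [
    {'kernel_size': k, 'filters': f, 'dilation_rate': d}
    for k, f, d in zip(kernel_sizes, nb_filters, dilations)
]
for i, layer in enumerate(layers):
    print(f'conv layer {i}: {layer}')
```

With the flags above this yields three identical convolutional layers; varying the tuples varies the layers independently.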


# Beam search on the learning rate
PYTHONHASHSEED=0 python -m word2morph.train lr_beam_search \
        init_data --train_path datasets/rus.train --valid_path datasets/rus.valid \
        construct_model --model_type CNN --embeddings_size 8 --kernel_sizes '(5,5,5)' --nb_filters '(192,192,192)' --dilations '(1,1,1)' --recurrent_units '(64,128,256)' --use_crf=True --dense_output_units 64 --dropout 0.2 \
        train --batch_size 64 --epochs 75 --lr_multipliers '(0.5,1,2)' --nb_models 3 --log_dir logs
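Conceptually, a learning-rate beam search scales each surviving learning rate by the --lr_multipliers and keeps the --nb_models best candidates per round. A toy sketch under that assumption, with a made-up scoring function standing in for actual training (word2morph's internals may differ):

```python
# Toy learning-rate beam search: every surviving learning rate spawns
# candidates scaled by the multipliers; only the top nb_models survive.
lr_multipliers = (0.5, 1, 2)
nb_models = 3

def validation_score(lr: float) -> float:
    # Stand-in for training a model and measuring validation accuracy;
    # an invented curve that peaks at lr = 0.001.
    return -abs(lr - 0.001)

beam = [0.001]  # initial learning rate
for _ in range(2):  # two search rounds
    candidates = {lr * m for lr in beam for m in lr_multipliers}
    beam = sorted(candidates, key=validation_score, reverse=True)[:nb_models]

print(beam)  # best-scoring learning rates first
```

The real command trains a full model per candidate, so nb_models bounds the number of models kept alive at once.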


# Hyperparameter search (Bayesian tuning and bandits)
PYTHONHASHSEED=0 python -m word2morph.train hyperparameter_search \
        init_data --train_path datasets/rus.train --valid_path datasets/rus.valid \
        search_hyperparameters --nb_trials 50 --epochs 100 --patience 10 --log_dir logs
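As a loose illustration of the bandit half of such a search (not word2morph's actual strategy), here is a toy successive-halving loop: start many configurations on a tiny budget, repeatedly keep the better half and double the budget for the survivors:

```python
# Toy successive halving: 8 candidate configurations, budget doubles each
# round while the worse half is dropped, until one configuration remains.
configs = [{'dropout': d} for d in (0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8)]

def validation_score(config: dict, epochs: int) -> float:
    # Stand-in for training `epochs` epochs; lower dropout wins here.
    return epochs * (1.0 - config['dropout'])

epochs = 1
while len(configs) > 1:
    ranked = sorted(configs, key=lambda c: validation_score(c, epochs),
                    reverse=True)
    configs = ranked[:len(ranked) // 2]  # keep the better half
    epochs *= 2                          # double the budget for survivors

print(configs)  # the single surviving configuration
```

The appeal of bandit-style allocation is that weak configurations are eliminated cheaply, so most of the --epochs budget goes to promising ones.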

Predict on test data

PYTHONHASHSEED=0 python -m word2morph.predict \
        --model_path logs/<timestamp>/checkpoints/best-model.joblib \
        --batch_size 1 --input_path path_to_input.txt --output_path path_to_output.txt
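Assuming the input file holds one word per line, with one segmentation per line written to the output (an assumption; check the repo's datasets/ directory for the exact format), preparing the input could look like:

```python
import tempfile
from pathlib import Path

# Write one lemma per line as the prediction input
# (the words here are just examples).
tmpdir = Path(tempfile.mkdtemp())
input_path = tmpdir / 'path_to_input.txt'
input_path.write_text('books\nunhappiness\n', encoding='utf-8')

# After running word2morph.predict, the segmentations could be read back as:
# segmentations = Path('path_to_output.txt').read_text(encoding='utf-8').splitlines()
words = input_path.read_text(encoding='utf-8').splitlines()
print(words)
```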