Utilities for developing Polish language support for spaCy.
The guide is in this document.
Use a separate folder for every module pipeline.
We store all artifacts in the data and models folders.
To add a new step to the pipeline, run
dvc run [-d <dependencies>] [-o <result_artifacts>] (-f <step_name>.dvc) <command>
in the main folder.
When something in the pipeline has changed and you want to run all steps again, use
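As a rough illustration of how the template above expands, the Python sketch below assembles a `dvc run` command line from a step name, its dependencies, and its outputs. The helper name `build_dvc_run` and all file names are hypothetical; only the `-d`, `-o`, and `-f` flags come from the template.

```python
import shlex


def build_dvc_run(step_name, command, deps=(), outs=()):
    """Assemble the argument list for `dvc run` following the template above."""
    args = ["dvc", "run"]
    for dep in deps:
        args += ["-d", dep]          # one -d per dependency
    for out in outs:
        args += ["-o", out]          # one -o per result artifact
    args += ["-f", f"{step_name}.dvc"]
    args += shlex.split(command)     # the command that produces the outputs
    return args


# Hypothetical step: the script and data file names are placeholders.
args = build_dvc_run(
    "tokenize",
    "python tokenize_corpus.py data/raw.txt data/tokens.txt",
    deps=["tokenize_corpus.py", "data/raw.txt"],
    outs=["data/tokens.txt"],
)
print(" ".join(shlex.quote(a) for a in args))
```

To actually create the step, the list could be passed to `subprocess.run(args, check=True)` from the main folder.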
dvc repro <step_name>
where <step_name> is the step up to which you want to rerun the pipeline.
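To make "up to which step" concrete, here is a toy model in Python of the order in which a target step and everything upstream of it would be visited. It is only a sketch of the idea, not DVC's implementation, and the step names in `pipeline` are hypothetical.

```python
def steps_up_to(target, depends_on):
    """Return the target and all steps upstream of it, dependencies first."""
    ordered = []
    seen = set()

    def visit(step):
        if step in seen:
            return
        seen.add(step)
        for parent in depends_on.get(step, []):
            visit(parent)            # upstream steps come before the step itself
        ordered.append(step)

    visit(target)
    return ordered


# Hypothetical pipeline: each step maps to the steps it depends on.
pipeline = {
    "vectors": [],
    "pos": ["vectors"],
    "ner": ["vectors"],
    "package": ["pos", "ner"],
}
print(steps_up_to("pos", pipeline))  # ['vectors', 'pos']
```

In this model, steps that are not upstream of the target (`ner` and `package` here) are left alone.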
First, set up a remote.
Synchronization always works in connection with the current branch and commit.
dvc push
Sends all tracked artifacts to the remote.
dvc pull
Gets all tracked artifacts from the remote.
dvc pipeline show -c --ascii
To package the model, run python deployment/combine_and_package.py with the proper arguments. For example:
python deployment/combine_and_package.py \
--pos-path models/pos_NKJP_word2vec/model-final/ \
--tree-path models/trees-pos_LFG_word2vec/model-final/ \
--ner-path models/ner_nkjp_word2vec/model-final/ \
--output-path models/release \
--blank-vectors-path models/blank_NKJP_word2vec/
Then fill in the prompts presented by the script with useful information.
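The packaging call above can also be assembled programmatically. The Python sketch below builds the argument list from a mapping of flags to paths and reports inputs that are missing on disk before anything is launched. The helper names are hypothetical, and the check assumes every input path is a directory, which the script itself may not require.

```python
import sys
from pathlib import Path


def build_package_command(paths, script="deployment/combine_and_package.py"):
    """Build the argv for the packaging script from a flag -> path mapping."""
    args = [sys.executable, script]
    for flag, path in paths.items():
        args += [f"--{flag}", str(path)]
    return args


def missing_inputs(paths, output_flag="output-path"):
    """List input directories that do not exist (assumption: inputs are directories)."""
    return [
        str(path)
        for flag, path in paths.items()
        if flag != output_flag and not Path(path).is_dir()
    ]


# Paths taken from the example command above.
paths = {
    "pos-path": "models/pos_NKJP_word2vec/model-final/",
    "tree-path": "models/trees-pos_LFG_word2vec/model-final/",
    "ner-path": "models/ner_nkjp_word2vec/model-final/",
    "output-path": "models/release",
    "blank-vectors-path": "models/blank_NKJP_word2vec/",
}

command = build_package_command(paths)
print(" ".join(command))
print("missing inputs:", missing_inputs(paths))
```

If nothing is missing, `subprocess.run(command, check=True)` would start the script. Its interactive prompts still need to be answered by hand.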