Embedded CKIP Rasa NLU Components
This open-source library implements Rasa custom components.
It offers a tokenizer and a featurizer, both powered by ckiptagger, as components for a Rasa NLU pipeline.
pip install rukip
rukip is a Python library hosted on PyPI. Requirements:
- python >= 3.6
- rasa >= 1.4.0
- ckiptagger[tf] >= 0.0.19
The model files are available on several mirror sites.
Download the archive and extract it to the desired path with the following steps:
- Download to ./data.zip
- Extract to ./data/
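The extraction step can be sketched with the Python standard library alone. This is a minimal sketch, not part of rukip: the helper name `extract_model_archive` is hypothetical, and it assumes the archive was already downloaded to ./data.zip and contains a top-level data/ folder. If your archive has no such folder, pass dest_dir="./data" instead.

```python
import zipfile
from pathlib import Path


def extract_model_archive(zip_path="./data.zip", dest_dir="./"):
    """Extract the model archive and return its top-level entry names.

    Hypothetical helper for illustration; rukip does not ship this function.
    """
    zip_path = Path(zip_path)
    dest_dir = Path(dest_dir)
    if not zip_path.is_file():
        raise FileNotFoundError(f"model archive not found: {zip_path}")

    # Create the destination if it does not exist yet.
    dest_dir.mkdir(parents=True, exist_ok=True)

    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest_dir)
        names = archive.namelist()

    # Report the top-level entries so the caller can verify the layout
    # (assumed here to be a single "data" folder).
    return sorted({name.split("/")[0] for name in names if name})
```

Calling `extract_model_archive()` with the defaults should leave the model files under ./data/, which is the path used as model_path in the pipeline configuration.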
Add the CKIPTokenizer component to the Rasa NLU pipeline and set model_path to ./data.
The following is an example Rasa NLU config file:
language: "zh"
pipeline:
- name: "rukip.tokenizer.CKIPTokenizer"
  model_path: "./data"
- name: "rukip.featurizer.CKIPFeaturizer"
  model_path: "./data"
  token_features: ["word", "pos"]
- name: "CRFEntityExtractor"
  features: [["ner_features"], ["ner_features"], ["ner_features"]]
- name: "CountVectorsFeaturizer"
- name: "EmbeddingIntentClassifier"
CKIPTokenizer has one required field (model_path) and offers two optional fields for assigning user-defined dictionaries.
- recommend_dict_path is the path to a file containing a list of user-defined recommended words. Default is None.
- cooerce_dict_path is the path to a file containing a list of must-words. Default is None.
The following is an example of a user-defined dictionary. Each line holds one word and its weight, separated by a space.
土地公 1
土地婆 1
公有 2
來亂的 1
緯來體育台 1
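To make the file format concrete, here is a minimal sketch of reading such a dictionary file into a word-to-weight mapping. The function name `load_word_weights` is hypothetical, and this is not necessarily how rukip parses the file internally. It assumes UTF-8 encoding and a whitespace separator between word and weight.

```python
def load_word_weights(path):
    """Read 'word weight' pairs, one per line, into a dict.

    Hypothetical helper for illustration; not rukip's own parser.
    """
    word_to_weight = {}
    with open(path, encoding="utf-8") as handle:
        for line_number, line in enumerate(handle, start=1):
            line = line.strip()
            if not line:
                continue  # skip blank lines
            # Split on the last run of whitespace so the weight is always
            # the final field on the line.
            parts = line.rsplit(None, 1)
            if len(parts) != 2:
                raise ValueError(
                    f"line {line_number}: expected 'word weight', got {line!r}"
                )
            word, weight = parts
            word_to_weight[word] = float(weight)
    return word_to_weight
```

A mapping of this shape is what ckiptagger's `construct_dictionary` helper accepts, so a loader along these lines is one plausible way to feed the file to the tagger.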
CKIPFeaturizer has one required field (model_path) and offers one optional field.
- token_features is the list of features extracted from ckiptagger to generate ner_features. Default is ["word", "pos"].
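As a rough illustration of what selecting token features could look like, the sketch below builds one feature dict per token from parallel lists of words and part-of-speech tags. It is an assumption-laden sketch, not rukip's implementation: the function `build_token_features` and the shape of its output are hypothetical, and only the feature names "word" and "pos" come from the configuration above.

```python
def build_token_features(words, pos_tags, token_features=("word", "pos")):
    """Return one dict per token holding only the requested features.

    Hypothetical illustration; the real ner_features layout may differ.
    """
    if len(words) != len(pos_tags):
        raise ValueError("words and pos_tags must have the same length")

    # Map each supported feature name to its per-token values.
    available = {"word": words, "pos": pos_tags}
    unknown = [name for name in token_features if name not in available]
    if unknown:
        raise ValueError(f"unsupported token features: {unknown}")

    return [
        {name: available[name][index] for name in token_features}
        for index in range(len(words))
    ]
```

For instance, passing token_features=["pos"] would keep only the part-of-speech tag for each token, which mirrors how narrowing the list in the config would narrow the features passed downstream.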
$> git clone git@github.com:circlelychen/rukip.git
$> pip install -r requirements-to-freeze.dev.txt
$> make test
Licensed under the GNU General Public License v3.0.