datamade/probablepeople

Training probablepeople with a custom label


Hello everyone,
I'm trying to retrain probablepeople with custom keywords for my domain. I work with healthcare data and want to eliminate false positives on names by using a custom label, "Clinical Keyword".

I have updated the labels in "person_labeled.xml".

When I run the "make all" command, I get the following error:

$ make all
parserator train name_data/labeled/company_labeled.xml,name_data/labeled/person_labeled.xml probablepeople --modelfile=generic
Traceback (most recent call last):
  File "c:\users\arunk\anaconda3\lib\runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "c:\users\arunk\anaconda3\lib\runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "C:\Users\arunk\Anaconda3\Scripts\parserator.exe\__main__.py", line 7, in <module>
  File "c:\users\arunk\anaconda3\lib\site-packages\parserator\main.py", line 70, in dispatch
    args.func(args)
  File "c:\users\arunk\anaconda3\lib\site-packages\parserator\main.py", line 94, in train
    training.train(module, training_data, model_path)
  File "c:\users\arunk\anaconda3\lib\site-packages\parserator\training.py", line 50, in train
    trainModel(training_data, module, model_path)
  File "c:\users\arunk\anaconda3\lib\site-packages\parserator\training.py", line 29, in trainModel
    tokens, labels = list(zip(*components))
ValueError: not enough values to unpack (expected 2, got 0)

training model on 3008 training examples from ['name_data/labeled/company_labeled.xml', 'name_data/labeled/person_labeled.xml'] file(s)
make: *** [Makefile:6: probablepeople/generic_learned_settings.crfsuite] Error 1
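
The last traceback frame points at a likely cause: `tokens, labels = list(zip(*components))` raises exactly this ValueError when `components` is empty, which would happen if at least one training example contains no labeled tokens. Below is a minimal sketch for locating such entries. It assumes each labeled file has a single root element with one child element per training example and one grandchild element per labeled token. The tag names in the sample (`NameCollection`, `Name`, `GivenName`, `Surname`) and the helper `find_empty_examples` are illustrative assumptions, not part of parserator's API.

```python
import xml.etree.ElementTree as ET


def find_empty_examples(xml_source):
    """Return (index, text) for every example that has no labeled child tokens.

    Assumption: the root element holds one child per training example,
    and each labeled token is a child element of that example.
    """
    root = ET.fromstring(xml_source)
    empty = []
    for index, example in enumerate(root):
        # len(element) counts child elements; zero means nothing is labeled.
        if len(example) == 0:
            empty.append((index, (example.text or "").strip()))
    return empty


# Hypothetical sample: the second and third entries have no labeled tokens.
SAMPLE = """<NameCollection>
  <Name><GivenName>John</GivenName> <Surname>Smith</Surname></Name>
  <Name>aspirin</Name>
  <Name></Name>
</NameCollection>"""

if __name__ == "__main__":
    for index, text in find_empty_examples(SAMPLE):
        print(f"example {index} has no labeled tokens: {text!r}")
```

To check a real file, its contents can be passed as bytes, for example `find_empty_examples(open("name_data/labeled/person_labeled.xml", "rb").read())`. Any entry it reports would need its tokens wrapped in label tags, or would need to be removed, before retraining.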

Could you please let me know how to retrain the model and use the updated version?

Thank you