New labeling regimes for ACTER datasets.

Question

Opened this issue 9 months ago · 1 comment

Hi @AylaRT,
Thank you for contributing the ACTER corpora, which are very valuable for term extraction.

While working on the datasets, we found that current token classifiers trained with the BIO annotation regime do not perform well on nested terms. We would therefore like to propose a new annotation regime in which single-word nested terms are also annotated.
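
As a rough illustration of the limitation, the sketch below encodes a set of overlapping term spans into flat BIO labels and decodes them back. The example sentence, the term spans, and the helpers `bio_encode` and `bio_decode` are hypothetical; they are not taken from ACTER or from the linked repository, and they do not implement the proposed regime. The sketch only shows that a single BIO label per token cannot represent a term nested inside a longer one.

```python
def bio_encode(tokens, spans):
    """Assign one BIO label per token; spans are (start, end) with end exclusive."""
    labels = ["O"] * len(tokens)
    # Process the longest spans first so an outer term claims its tokens.
    for start, end in sorted(spans, key=lambda s: s[0] - s[1]):
        # A token can carry only one label, so overlapping spans are skipped.
        if any(label != "O" for label in labels[start:end]):
            continue
        labels[start] = "B"
        for i in range(start + 1, end):
            labels[i] = "I"
    return labels


def bio_decode(labels):
    """Recover (start, end) spans from a flat BIO label sequence."""
    spans, start = [], None
    for i, label in enumerate(labels):
        if label in ("B", "O") and start is not None:
            spans.append((start, i))  # close the span that was open
            start = None
        if label == "B":
            start = i
    if start is not None:
        spans.append((start, len(labels)))
    return spans


# Hypothetical sentence with nested terms:
# "acute heart failure" (0, 3), "heart failure" (1, 3), "heart" (1, 2).
tokens = ["acute", "heart", "failure", "is", "common"]
gold_spans = [(0, 3), (1, 3), (1, 2)]

labels = bio_encode(tokens, gold_spans)
recovered = bio_decode(labels)
lost = sorted(set(gold_spans) - set(recovered))

print(labels)     # flat labels, one per token
print(recovered)  # spans that survive the round trip
print(lost)       # nested spans that flat BIO cannot express
```

In this toy case only the outermost span survives the round trip, so a classifier trained on such labels never sees the nested terms as positive examples.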

Please take a look at the new annotation, which can be seen via this link:
https://github.com/honghanhh/nobi_annotation_regime

It would be great if our proposal could be integrated into the next version of the corpora.
Please let us know if you need any further information.

Thanks a lot.
Kind regards,
Hanh

Answer 1 · 2024-01-22T08:36:01.000Z

Hi @honghanhh,

Thank you for the kind message and the potential improvement to the dataset! I will definitely add the information to the next version. I cannot guarantee that it will be very soon due to time constraints, but I will keep you posted.

Kind regards,
Ayla