About CRF layer
EternalEep opened this issue · 3 comments
Hi Allan,
I also have some questions about the CRF.
1. I trained my BERT-CRF model, but it does no better than the BertForTokenClassification class from transformers on my dataset. Do you know why the CRF layer does not help in my case?
2. I have not read the CRF code carefully because it's a little hard to understand. Could you give us some references or comments that would help us understand it quickly and clearly?
Thank you for your help!
- You might put some example data here for me to take a look.
- It takes some time for me to find you some good references, but I think "neural architectures for named entity recognition" is a pretty good paper.
- I will try to find you some good materials today as well
First, I have to say that your repository gives me high-quality NER code for studying how an NER model is structured. Thanks!
For your reply:
- You might put some example data here for me to take a look.
Yes. I first changed your BIOES tagging scheme to BIO, then trained on my biomedical-domain dataset. I get almost the same results with BertForTokenClassification from transformers and with your BERT+CRF code. Here are some examples in BIO format:
Urokinase B-Gene_or_gene_product
receptor I-Gene_or_gene_product
: O
a O
molecular O
organizer O
in O
cellular B-Cell
communication O
. O
uPAR B-Gene_or_gene_product
is O
also O
coexpressed O
with O
caveolin B-Gene_or_gene_product
and O
members O
of O
the O
integrin B-Gene_or_gene_product
adhesion I-Gene_or_gene_product
receptor I-Gene_or_gene_product
superfamily O
. O
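For readers following along, the BIOES-to-BIO change described above can be sketched roughly as follows. This is only an illustration, not the repository's code; the function name `bioes_to_bio` and the sample tags are hypothetical.

```python
def bioes_to_bio(tags):
    """Convert a list of BIOES tags to BIO tags.

    S-X (single-token entity) becomes B-X and E-X (entity end) becomes I-X.
    B-X, I-X and O are left unchanged.
    """
    converted = []
    for tag in tags:
        if tag.startswith("S-"):
            converted.append("B-" + tag[2:])
        elif tag.startswith("E-"):
            converted.append("I-" + tag[2:])
        else:
            converted.append(tag)
    return converted


# Hypothetical BIOES input, made up for illustration.
bioes = ["B-Gene_or_gene_product", "E-Gene_or_gene_product", "O", "S-Cell"]
print(bioes_to_bio(bioes))
```

The mapping loses no span information, since every entity boundary in BIOES can still be recovered from the B-/I-/O sequence.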
- It takes some time for me to find you some good references, but I think "neural architectures for named entity recognition" is a pretty good paper.
- I will try to find you some good materials today as well
Yes, it's a good paper, and I will take another look at it. Actually, I have already read some papers on BiLSTM-CRF. They usually present the CRF the way Section 2.2 of the "neural architectures for named entity recognition" paper does, and I can understand what the CRF does.
But when it comes to the actual implementation, it's a little hard for me to follow the code, such as the forward, backward, and Viterbi decoding implementations. If you have some good references, please help me understand the CRF code. Thank you very much!
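As a reading aid for the decoding step mentioned above, here is a minimal, unbatched Viterbi sketch in plain Python. It is not the repository's implementation: it handles a single sentence, omits start/stop transitions, and the names `viterbi_decode`, `emissions`, and `transitions` are assumptions made for this example.

```python
def viterbi_decode(emissions, transitions):
    """Find the highest-scoring tag sequence for one sentence.

    emissions[t][j]   : score of tag j at position t (T x K nested list)
    transitions[i][j] : score of moving from tag i to tag j (K x K)
    Returns (best_score, best_path), where best_path is a list of tag indices.
    """
    num_tags = len(transitions)
    # score[j] = best score of any partial path that ends in tag j
    score = list(emissions[0])
    backpointers = []
    for emit in emissions[1:]:
        new_score, pointers = [], []
        for j in range(num_tags):
            # Best previous tag for landing on tag j at this position.
            best_prev = max(range(num_tags),
                            key=lambda i: score[i] + transitions[i][j])
            pointers.append(best_prev)
            new_score.append(score[best_prev] + transitions[best_prev][j] + emit[j])
        score = new_score
        backpointers.append(pointers)
    # Follow the back-pointers from the best final tag to recover the path.
    last = max(range(num_tags), key=lambda j: score[j])
    path = [last]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return score[last], path


# Toy example with 2 tags and 3 positions; switching tags is penalized.
emissions = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
transitions = [[0.0, -5.0], [-5.0, 0.0]]
print(viterbi_decode(emissions, transitions))
```

In the toy example the transition penalty outweighs the middle emission, so the decoder stays on tag 0 throughout. This is the kind of sequence-level preference a CRF layer adds on top of per-token scores.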
Yeah, things like "batching" get "ugly" in the neural network era, because we have to express everything as matrix computations.
This is probably the best material I have read on the forward-backward algorithm: https://my.eng.utah.edu/~cs6961/papers/klinger-crf-intro.pdf
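To make the forward pass concrete before adding batching, here is a small single-sentence sketch of the forward recursion in log space, checked against brute-force enumeration. It is a simplified illustration rather than the code in this repository: there are no start/stop transitions, and the names `log_partition` and `sequence_score` are assumptions for the example.

```python
import math
from itertools import product


def log_sum_exp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def sequence_score(emissions, transitions, tags):
    """Unnormalized score of one tag sequence."""
    score = emissions[0][tags[0]]
    for t in range(1, len(tags)):
        score += transitions[tags[t - 1]][tags[t]] + emissions[t][tags[t]]
    return score


def log_partition(emissions, transitions):
    """Forward algorithm: log of the summed exp-scores of all tag sequences."""
    num_tags = len(transitions)
    # alpha[j] = log-sum of scores of all partial paths ending in tag j
    alpha = list(emissions[0])
    for emit in emissions[1:]:
        alpha = [
            log_sum_exp([alpha[i] + transitions[i][j] for i in range(num_tags)]) + emit[j]
            for j in range(num_tags)
        ]
    return log_sum_exp(alpha)


# Toy scores: 3 positions, 2 tags.
emissions = [[0.5, -1.0], [1.5, 0.0], [0.0, 1.0]]
transitions = [[0.1, -0.2], [0.0, 0.5]]

# Brute force over all 2**3 sequences should agree with the recursion.
brute = log_sum_exp([sequence_score(emissions, transitions, tags)
                     for tags in product(range(2), repeat=3)])
print(abs(log_partition(emissions, transitions) - brute) < 1e-9)

# CRF log-likelihood of a gold sequence = its score minus the log partition.
gold = [0, 0, 1]
print(sequence_score(emissions, transitions, gold) - log_partition(emissions, transitions))
```

A batched version typically runs the same recursion over tensors with a leading batch dimension and masks out padded positions, which is where most of the "ugly" indexing comes from.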