onurgu/ner-tagger-dynet

About the meaning of outputs

theanhle opened this issue · 5 comments

First of all, thank you for sharing this code.
When I ran the training script, 'train_tensorflow.py', I got negative numbers that I don't understand. What do they mean?
Starting epoch 5...
Reshuffling
n_batches: 29
bucket_id: 7
-8.567176 -7.873434 -7.972278 n_batches: 29
Reshuffling
bucket_id: 5
-7.769209 -7.041025 -6.468853 Reshuffling
n_batches: 29

Could you explain them to me? Also, where are the accuracy, precision, recall, and F1 on the training and dev sets?
Thank you in advance!

P.S. I also ran the source code provided by Lample et al., and here is its output:
processed 8843 tokens with 388 phrases; found: 332 phrases; correct: 243.
accuracy: 96.57%; precision: 73.19%; recall: 62.63%; FB1: 67.50
ORG: precision: 73.09%; recall: 64.54%; FB1: 68.55 249
PER: precision: 73.49%; recall: 57.55%; FB1: 64.55 83
ID  NE     Total     O  I-ORG  B-ORG  B-PER  I-PER  Percent
0   O       8015  7942     28     27      9      9   99.089
1   I-ORG    357    66    277      7      0      7   77.591
2   B-ORG    282    63     13    194     12      0   68.794
3   B-PER    106    15      2     21     62      6   58.491
4   I-PER     83     5     13      0      0     65   78.313
8540/8843 (96.57356%)
Score on dev: 67.46000
Score on test: 67.50000
New best score on dev.
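
For readers comparing the two logs: the headline figures in the output above follow directly from the phrase counts on its first line (388 gold phrases, 332 found, 243 correct) and from the token counts on the `8540/8843` line. The snippet below is only an illustrative sketch of that arithmetic. The function name `prf1` is made up for this example and is not part of either code base.

```python
def prf1(n_gold, n_found, n_correct):
    """Phrase-level precision, recall and F1 from raw counts.

    Hypothetical helper for illustration; it mirrors the standard
    definitions, not any particular evaluation script.
    """
    precision = n_correct / n_found if n_found else 0.0
    recall = n_correct / n_gold if n_gold else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Counts copied from the log line:
# "processed 8843 tokens with 388 phrases; found: 332 phrases; correct: 243."
precision, recall, f1 = prf1(n_gold=388, n_found=332, n_correct=243)

# Token-level accuracy, from the "8540/8843" line.
accuracy = 8540 / 8843

print(f"accuracy: {100 * accuracy:.2f}%; precision: {100 * precision:.2f}%; "
      f"recall: {100 * recall:.2f}%; FB1: {100 * f1:.2f}")
```

Note that accuracy is computed per token while precision, recall, and FB1 are computed per phrase. This is why accuracy can be above 96% while FB1 stays near 67.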

Thanks for your interest!

Those numbers are the log-likelihood of the CRF model. Training minimizes the negative of this quantity to optimize the parameters.

A log-likelihood is never greater than zero, so we expect it to increase towards zero as training progresses.
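
To make this concrete, here is a minimal pure-Python sketch of the log-likelihood of a linear-chain CRF for a single sentence. It is not the code in this repository. The function names, the toy scores, and the omission of start/end transitions are all assumptions made for illustration, and the values printed during training may be aggregated differently (for example, averaged over a batch).

```python
import math


def logsumexp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def path_score(emissions, transitions, tags):
    """Unnormalized score of one tag sequence (no start/end transitions)."""
    score = emissions[0][tags[0]]
    for t in range(1, len(tags)):
        score += transitions[tags[t - 1]][tags[t]] + emissions[t][tags[t]]
    return score


def log_partition(emissions, transitions):
    """log of the summed exp-scores of all tag sequences (forward algorithm)."""
    n_tags = len(emissions[0])
    alpha = list(emissions[0])
    for t in range(1, len(emissions)):
        alpha = [
            logsumexp([alpha[i] + transitions[i][j] for i in range(n_tags)])
            + emissions[t][j]
            for j in range(n_tags)
        ]
    return logsumexp(alpha)


def crf_log_likelihood(emissions, transitions, tags):
    """log p(tags | sentence) = score(gold path) - log Z, always <= 0."""
    return path_score(emissions, transitions, tags) - log_partition(
        emissions, transitions
    )


# Toy example: 3 tokens, 3 tags. All scores are made-up numbers.
emissions = [[2.0, 0.5, 0.1], [0.3, 1.5, 0.2], [0.1, 0.4, 1.8]]
transitions = [[0.5, 0.2, -0.3], [-0.1, 0.6, 0.4], [0.0, -0.5, 0.7]]
gold_tags = [0, 1, 2]

ll = crf_log_likelihood(emissions, transitions, gold_tags)
loss = -ll  # the quantity a trainer would minimize
print(f"log-likelihood: {ll:.4f}  loss: {loss:.4f}")
```

Because the gold path's score is one term inside the sum that defines the partition function, the difference can never be positive. As the model assigns more probability mass to the gold sequence, the value moves from large negative numbers towards zero.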

For evaluation, run eval_tensorflow.py separately, for example:

python eval_tensorflow.py --pre_emb we-300.txt \
--train dataset/train \
--dev dataset/dev \
--test dataset/test \
--word_dim 300 --word_lstm_dim 200 --word_bidirect 1 \
--cap_dim 100 \
--crf 1 \
--lr_method=sgd-lr_0.01 \
--maximum-epochs 100 \
--char_dim 200 --char_lstm_dim 100 --char_bidirect 1

Thank you, now I understand. However, the whole code base is quite complicated for me to follow, and I would like to write my own implementation. If you don't mind, could you briefly outline the implementation steps? For example, how to use a bi-LSTM to tackle the NER problem, and then how to add a CRF layer to improve accuracy.
Anything you can share is appreciated!

That is beyond the scope of an issue thread. I would certainly share a coding tutorial if I had written one myself; in the meantime, Google publishes excellent TensorFlow tutorials:

https://www.tensorflow.org/tutorials/

I believe they are more than enough to get started with programming in TensorFlow.

Thank you, I will do that.

You're welcome :)

Closing this issue. Please feel free to ask if you have more questions.