Multi-Task Learning on OntoNotes

Dataset

OntoNotes Release 5.0

Joint Loss

$$ \mathcal{L}(\mathbf{W}, \sigma_1, \sigma_2, \sigma_3) = \sum_{i=1}^{3} \left( \frac{1}{\sigma_i^2} \mathcal{L}_i(\mathbf{W}) + \log \sigma_i \right) $$
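
To make the formula concrete, here is a minimal sketch in plain Python that evaluates the joint loss for given per-task losses and noise parameters. The function name `joint_loss` and the example loss values are illustrative assumptions, not taken from the experiments; in training, the `sigmas` would be learned together with the shared weights rather than passed in as constants.

```python
import math
from typing import Sequence


def joint_loss(task_losses: Sequence[float], sigmas: Sequence[float]) -> float:
    """Evaluate sum_i (L_i / sigma_i^2 + log sigma_i) over all tasks."""
    if len(task_losses) != len(sigmas):
        raise ValueError("need exactly one sigma per task loss")
    if any(s <= 0 for s in sigmas):
        raise ValueError("sigmas must be positive")
    return sum(
        loss / sigma ** 2 + math.log(sigma)
        for loss, sigma in zip(task_losses, sigmas)
    )


# Hypothetical per-task losses for POS, NER and chunking (illustrative only).
losses = [0.4, 1.2, 0.8]

# With every sigma equal to 1 the log terms vanish, leaving the plain sum.
print(joint_loss(losses, [1.0, 1.0, 1.0]))

# Raising sigma_2 shrinks the NER term by 1/sigma^2 and adds log(sigma_2).
print(joint_loss(losses, [1.0, 2.0, 1.0]))
```

The `log sigma_i` term keeps the optimizer from lowering the loss simply by growing every sigma without bound.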

Results

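| Tasks | CRF | Loss Split | POS (Acc) | NER (F1) | CHUNKING (F1) |
|---|---|---|---|---|---|
| ALL | × | auto | 96.58% | 81.73% | 90.90% |
| | × | 1-1-1 | 96.45% | 81.33% | 91.11% |
| | | auto | 96.54% | 83.72% | 92.07% |
| | | 1-1-1 | 96.58% | 83.91% | 92.24% |
| POS & NER | | auto | - | - | - |
| | | 0-1-0 | - | 82.87% | - |
| | | 1-9-0 | 96.00% | 82.90% | - |
| | | 2-8-0 | 96.20% | 83.51% | - |
| | | 3-7-0 | 96.34% | 83.09% | - |
| | | 4-6-0 | 96.40% | 83.32% | - |
| | | 5-5-0 | 96.45% | 83.29% | - |
| | | 6-4-0 | 96.37% | 82.93% | - |
| | | 7-3-0 | 96.27% | 82.37% | - |
| | | 8-2-0 | 96.25% | 81.76% | - |
| | | 9-1-0 | 96.40% | 80.53% | - |
| | | 1-0-0 | 96.30% | - | - |
| POS & CHUNK | | auto | - | - | - |
| | | 1-0-0 | 96.30% | - | - |
| | | 9-0-1 | 96.26% | - | 90.78% |
| | | 8-0-2 | 96.21% | - | 91.55% |
| | | 7-0-3 | 96.35% | - | 91.87% |
| | | 6-0-4 | 96.38% | - | 92.08% |
| | | 5-0-5 | 96.30% | - | 92.10% |
| | | 4-0-6 | 96.28% | - | 92.28% |
| | | 3-0-7 | 96.28% | - | 92.35% |
| | | 2-0-8 | 96.15% | - | 92.22% |
| | | 1-0-9 | 95.96% | - | 92.22% |
| | | 0-0-1 | - | - | 91.94% |
| NER & CHUNK | | auto | - | - | - |
| | | 0-1-0 | - | 82.87% | - |
| | | 0-9-1 | - | 82.84% | 90.21% |
| | | 0-8-2 | - | 82.91% | 91.28% |
| | | 0-7-3 | - | 83.43% | 91.44% |
| | | 0-6-4 | - | 82.98% | 91.78% |
| | | 0-5-5 | - | 82.66% | 91.82% |
| | | 0-4-6 | - | 82.39% | 91.97% |
| | | 0-3-7 | - | 82.37% | 92.02% |
| | | 0-2-8 | - | 81.35% | 92.07% |
| | | 0-1-9 | - | 79.95% | 91.99% |
| | | 0-0-1 | - | - | 91.94% |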
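
The Loss Split labels in the results above read as relative weights in the order POS-NER-CHUNKING: 0-1-0 reports only an NER score, and 0-0-1 only a chunking score. The sketch below shows one way such a label could be turned into a fixed-weight loss. It assumes the three numbers are normalized to sum to 1 before weighting, which the results do not confirm, and the names `parse_loss_split` and `weighted_loss` are hypothetical.

```python
from typing import Sequence, Tuple


def parse_loss_split(split: str) -> Tuple[float, float, float]:
    """Turn a label such as '2-8-0' into normalized (POS, NER, CHUNKING) weights."""
    parts = [float(p) for p in split.split("-")]
    if len(parts) != 3:
        raise ValueError("expected three dash-separated numbers, e.g. '2-8-0'")
    total = sum(parts)
    if total <= 0:
        raise ValueError("at least one weight must be positive")
    # Assumption: weights are rescaled so that they sum to 1.
    return (parts[0] / total, parts[1] / total, parts[2] / total)


def weighted_loss(split: str, task_losses: Sequence[float]) -> float:
    """Combine per-task losses (POS, NER, CHUNKING) with fixed weights."""
    weights = parse_loss_split(split)
    if len(task_losses) != len(weights):
        raise ValueError("expected one loss per task")
    return sum(w * loss for w, loss in zip(weights, task_losses))


# Hypothetical per-task losses (illustrative only).
losses = [0.4, 1.2, 0.8]

print(parse_loss_split("2-8-0"))
print(weighted_loss("0-1-0", losses))  # only the NER loss contributes
```

Under this reading, a zero weight removes a task from the objective entirely, which matches the rows where the corresponding score is reported as "-".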

Hyper-parameters

| embed_dim | # layer | hidden size | dropout | learning rate |
|---|---|---|---|---|
| 300 | 2 | 512 | 0.5 | 1e-3 |
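
For reuse in a training script, the settings above could be collected in a small configuration object. This is only a sketch: the class name `HyperParams` and every field name except `embed_dim` are assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HyperParams:
    """Hyper-parameter values copied from the table above."""

    embed_dim: int = 300          # embedding dimension
    num_layers: int = 2           # "# layer" in the table (field name assumed)
    hidden_size: int = 512        # hidden size of each layer
    dropout: float = 0.5          # dropout probability
    learning_rate: float = 1e-3   # optimizer learning rate


config = HyperParams()
print(config)
```

Freezing the dataclass prevents the values from being changed by accident once a run has started.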

Reference

Weischedel, Ralph, et al. OntoNotes Release 5.0 LDC2013T19. Web Download. Philadelphia: Linguistic Data Consortium, 2013.