Temporal Convolutional Network with attention layer
The concept of the model is mostly like the Simple Neural Attentive Meta-Learner (SNAIL). But in this model, an attention layer sits on top of every convolution layer, and the attention size differs from SNAIL's.
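The idea above can be sketched in plain NumPy: a stack of dilated causal 1-D convolutions, with a scaled dot-product self-attention layer applied after each convolution block. This is an illustrative sketch only, not the repo's actual implementation; the kernel size, dilations, channel width, and the projection-free attention are all assumptions made for brevity.

```python
import numpy as np

def causal_conv1d(x, w, dilation=1):
    """Causal dilated 1-D convolution.
    x: (T, C_in), w: (K, C_in, C_out) -> output: (T, C_out).
    Left-pads with zeros so position t never sees the future."""
    T, _ = x.shape
    K, _, C_out = w.shape
    pad = (K - 1) * dilation
    xp = np.vstack([np.zeros((pad, x.shape[1])), x])
    out = np.zeros((T, C_out))
    for t in range(T):
        for k in range(K):
            out[t] += xp[t + pad - k * dilation] @ w[K - 1 - k]
    return out

def self_attention(x, causal=True):
    """Scaled dot-product self-attention over time steps.
    For simplicity the inputs serve as queries, keys, and values
    (no learned projections -- an assumption of this sketch)."""
    d = x.shape[1]
    scores = x @ x.T / np.sqrt(d)
    if causal:
        # mask out future positions before the softmax
        scores = np.where(np.tril(np.ones_like(scores)) > 0, scores, -1e9)
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ x

rng = np.random.default_rng(0)
x = rng.standard_normal((16, 8))          # 16 time steps, 8 channels
h = x
for dilation in (1, 2, 4):                # stack of dilated causal convs
    w = rng.standard_normal((3, h.shape[1], 8)) * 0.1
    h = np.maximum(causal_conv1d(h, w, dilation), 0.0)   # conv + ReLU
    h = h + self_attention(h)             # attention on top of EACH conv layer
print(h.shape)
```

The key difference from SNAIL is visible in the loop: attention is interleaved after every convolution block rather than at a few fixed points in the stack.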
Dataset: AG News, without pre-processing
- with attention: 0.82
- without attention: 0.81
Most simple models on AG News reach about 0.81 accuracy (tested with A Structured Self-Attentive Sentence Embedding and TagSpace, both of which use word-based embeddings).
So 0.82 accuracy with a character-based model seems worthwhile.