Routing-by-agreement with a Transformer-based model for NMT
aimanmutasem opened this issue · 2 comments
Hello all :)
I’m trying to use routing-by-agreement with a Transformer-based model for an NMT task. The proposed idea is to use each attention head's output as an input capsule for a capsule network, fusing the semantic and spatial information from the different heads to help improve the quality of the translated sentences, as sketched below.
My implementation code is here, and the related PyTorch issue is here.
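In outline, what I am attempting looks roughly like the following (a minimal PyTorch sketch of my own; the class name `HeadRoutingAggregator` and all hyperparameters are illustrative, not taken from any existing library). Each head's output vector at every position is treated as an input capsule, and routing-by-agreement produces a small set of output capsules that are concatenated back together:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def squash(s, dim=-1, eps=1e-8):
    # Squashing non-linearity from Sabour et al. (2017): short vectors
    # shrink toward zero, long vectors approach unit length.
    norm_sq = (s * s).sum(dim=dim, keepdim=True)
    norm = torch.sqrt(norm_sq + eps)
    return (norm_sq / (1.0 + norm_sq)) * (s / norm)

class HeadRoutingAggregator(nn.Module):
    """Fuse per-head attention outputs with routing-by-agreement.

    Hypothetical sketch: each head's output vector at each position is an
    input capsule; n_out output capsules are computed by iterative routing
    and concatenated along the feature dimension.
    """
    def __init__(self, n_heads, d_head, n_out, d_out, n_iters=3):
        super().__init__()
        self.n_iters = n_iters
        self.n_out = n_out
        # One linear "vote" transform per (input capsule, output capsule) pair.
        self.vote = nn.Parameter(torch.randn(n_heads, n_out, d_head, d_out) * 0.02)

    def forward(self, heads):
        # heads: (batch, n_heads, seq_len, d_head)
        B, H, T, Dh = heads.shape
        u = heads.permute(0, 2, 1, 3)  # (B, T, H, Dh)
        # Votes: u_hat[b, t, i, j] = u[b, t, i] @ vote[i, j]
        u_hat = torch.einsum('btid,ijde->btije', u, self.vote)  # (B, T, H, n_out, d_out)
        b_logits = torch.zeros(B, T, H, self.n_out, device=heads.device)
        for _ in range(self.n_iters):
            c = F.softmax(b_logits, dim=-1)            # each head distributes over output capsules
            s = (c.unsqueeze(-1) * u_hat).sum(dim=2)   # weighted sum over heads: (B, T, n_out, d_out)
            v = squash(s)                              # output capsules
            # Agreement: dot product between each vote and the current output capsule.
            b_logits = b_logits + torch.einsum('btije,btje->btij', u_hat, v)
        return v.flatten(-2)                           # (B, T, n_out * d_out)
```

With `n_out * d_out` chosen equal to `d_model`, this module would slot in where the usual concatenation of heads (before the output projection) normally goes.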
So far I have gotten very poor results. I would kindly appreciate any suggestions on how to proceed.
I look forward to your feedback.
I just came across this issue via a random walk on Google.
The idea you proposed seems highly similar to this NAACL paper:
Li, J., Yang, B., Dou, Z.Y., Wang, X., Lyu, M.R. and Tu, Z., 2019, June. Information Aggregation for Multi-Head Attention with Routing-by-Agreement. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers) (pp. 3566-3575).
If you're interested, I also have an NMT paper that uses capsules and a novel variant of dynamic routing to better model translation context.
Zheng, Z., Huang, S., Tu, Z., Dai, X.Y. and Chen, J., 2019, November. Dynamic Past and Future for Neural Machine Translation. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP) (pp. 930-940).
Yes sir, I have read both papers.
I'm having difficulty with the implementation, and I'm looking for a worked example.
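Concretely, the part I am unsure about is the routing loop itself. Here is the bare dynamic-routing procedure as I understand it from Sabour et al. (2017), stripped of the Transformer wiring (a self-contained sketch; the function names and toy shapes are my own):

```python
import torch
import torch.nn.functional as F

def squash(s, dim=-1, eps=1e-8):
    # Squashing non-linearity: preserves direction, bounds length in [0, 1).
    norm_sq = (s * s).sum(dim=dim, keepdim=True)
    return (norm_sq / (1.0 + norm_sq)) * s / torch.sqrt(norm_sq + eps)

def dynamic_routing(u_hat, n_iters=3):
    """Routing-by-agreement over pre-computed votes.

    u_hat: (batch, n_in, n_out, d_out) -- vote of input capsule i for
           output capsule j. Returns output capsules (batch, n_out, d_out).
    """
    b = torch.zeros(u_hat.shape[:-1], device=u_hat.device)  # (batch, n_in, n_out)
    for it in range(n_iters):
        c = F.softmax(b, dim=-1)                  # each input distributes over outputs
        s = (c.unsqueeze(-1) * u_hat).sum(dim=1)  # weighted sum over input capsules
        v = squash(s)                             # (batch, n_out, d_out)
        if it < n_iters - 1:
            # Raise logits where a vote agrees with the current output capsule.
            b = b + torch.einsum('bize,bze->biz', u_hat, v)
    return v

# Toy usage: 8 "head" capsules routed to 4 output capsules of size 64.
u_hat = torch.randn(2, 8, 4, 64)  # (batch, n_in=8 heads, n_out=4, d_out=64)
v = dynamic_routing(u_hat)
print(v.shape)                    # torch.Size([2, 4, 64])
```

Is this loop the right shape, or am I misreading the routing update?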