/Up-DownFormer

Up-DownFormer: This transformer architecture is built mainly on a new GNN designed in this work. I tested the GNN on standard GNN benchmarks, where it gave superior results, and tested the whole transformer architecture on an NLP task, where it gave results comparable to standard full self-attention models at much lower computational cost.

Primary language: Python. License: MIT.
