Authors: Razvan-Gabriel Dumitru, Darius Peteleaza & Mihai Surdeanu
If you wish to contact any of us, please use the clickable email links above.
The paper was published at the ICML 2024 Next Generation of Sequence Modeling Architectures Workshop (26 July 2024).
The full paper is available at https://arxiv.org/abs/2402.02625; please cite our work if you use it.
We introduce the concept of multiple temporal perspectives, a novel approach applicable to Recurrent Neural Network (RNN) architectures for enhancing their understanding of sequential data. This method involves maintaining diverse temporal views of previously encountered text, significantly enriching the language models' capacity to interpret context. To show the efficacy of this approach, we incorporate it into the Receptance Weighted Key Value (RWKV) architecture, addressing its inherent challenge of retaining all historical information within a single hidden state. Notably, this improvement is achieved with a minimal increase in the number of parameters, as little as 0.04% of the original parameter count.
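To make the core idea concrete: instead of compressing all history into one hidden state, the model can keep several states that decay at different rates and combine them. The sketch below is only a simplified illustration of that idea, not the exact RWKV-based formulation from the paper; the class name, decay parameterization, and mixing layer are all our own assumptions.

```python
import torch
import torch.nn as nn

class MultiPerspectiveCell(nn.Module):
    """Toy cell keeping `n_views` hidden states, each decaying at its own
    learned rate, mixed into a single output. A simplified sketch of the
    general idea, not the paper's exact RWKV formulation."""

    def __init__(self, d_model: int, n_views: int = 4):
        super().__init__()
        self.n_views = n_views
        # One learnable decay per temporal perspective (hypothetical choice).
        self.decay = nn.Parameter(torch.linspace(-1.0, 1.0, n_views))
        # Learned combination of the perspectives into one representation.
        self.mix = nn.Linear(n_views * d_model, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model)
        batch, seq_len, d_model = x.shape
        states = x.new_zeros(batch, self.n_views, d_model)
        decays = torch.sigmoid(self.decay).view(1, self.n_views, 1)
        outputs = []
        for t in range(seq_len):
            # Each perspective keeps its own exponentially decayed history.
            states = decays * states + (1.0 - decays) * x[:, t, None, :]
            outputs.append(self.mix(states.reshape(batch, -1)))
        return torch.stack(outputs, dim=1)  # (batch, seq_len, d_model)

# Usage: y = MultiPerspectiveCell(64)(torch.randn(2, 10, 64))
```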
The original RWKV model is available at https://github.com/BlinkDL/RWKV-LM.
The training dataset is publicly available at https://huggingface.co/datasets/wikipedia.
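For reference, the dataset can be loaded with the Hugging Face `datasets` library roughly as follows; the `20220301.en` snapshot name is an illustrative choice, not necessarily the configuration used in our experiments:

```python
from datasets import load_dataset

# Load an English Wikipedia snapshot; the configuration name is an example,
# pick the snapshot/language appropriate for your own experiments.
wiki = load_dataset("wikipedia", "20220301.en", split="train")
print(wiki[0]["title"], wiki[0]["text"][:200])
```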
We evaluated our model's performance using the EleutherAI Language Model Evaluation Harness (https://github.com/EleutherAI/lm-evaluation-harness).
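With the `lm-eval` package installed, an evaluation run looks roughly like the sketch below; the checkpoint path and task list are placeholders rather than our exact configuration, and the API may differ slightly across harness versions:

```python
import lm_eval

# Evaluate a (hypothetical) Hugging Face checkpoint on illustrative tasks;
# model path and task names are placeholders, not the paper's exact setup.
results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=path/to/your/checkpoint",
    tasks=["lambada_openai", "hellaswag"],
)
print(results["results"])
```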
If you use our work in your research, please cite our paper:
Dumitru, R. G., Peteleaza, D., & Surdeanu, M. (2024). Enhancing Transformer RNNs with Multiple Temporal Perspectives. arXiv preprint arXiv:2402.02625.
@misc{dumitru2024enhancingtransformerrnnsmultiple,
      title={Enhancing Transformer RNNs with Multiple Temporal Perspectives},
      author={Razvan-Gabriel Dumitru and Darius Peteleaza and Mihai Surdeanu},
      year={2024},
      eprint={2402.02625},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2402.02625},
}