RetNet (Retentive Network) is an architecture proposed as a foundation for large language models. Its paper reports three main advantages: parallelizable training, low-cost inference, and strong overall performance. This repository contains an implementation of RetNet, which aims to be a strong successor to the Transformer model. It also implements the xPos relative positional encoding, which RetNet uses.
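The training-parallelism and low-cost-inference properties both come from the retention mechanism described in the RetNet paper, which can be written in a parallel form (all tokens at once) and a recurrent form (one token at a time, with a fixed-size state) that produce the same output. The following is a minimal sketch of that equivalence, assuming a single head and a scalar decay `gamma`, and leaving out the xPos rotation, scaling, normalization, and gating. The function names are illustrative and are not this repository's API.

```python
# Illustrative sketch only: single-head retention with a scalar decay.
# Omits xPos rotation, scaling, group normalization, and gating.
# These helper names are hypothetical, not part of this repository.

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*a)]


def retention_parallel(q, k, v, gamma):
    """Parallel form: (Q K^T * D) V with D[n][m] = gamma**(n - m) if n >= m, else 0."""
    scores = matmul(q, transpose(k))
    n_tokens = len(q)
    decayed = [
        [scores[n][m] * gamma ** (n - m) if n >= m else 0.0 for m in range(n_tokens)]
        for n in range(n_tokens)
    ]
    return matmul(decayed, v)


def retention_recurrent(q, k, v, gamma):
    """Recurrent form: S_n = gamma * S_{n-1} + K_n^T V_n, then output_n = Q_n S_n."""
    d_k, d_v = len(k[0]), len(v[0])
    state = [[0.0] * d_v for _ in range(d_k)]  # fixed size, independent of sequence length
    outputs = []
    for q_n, k_n, v_n in zip(q, k, v):
        # Decay the previous state and add the outer product of the new key and value.
        state = [
            [gamma * state[i][j] + k_n[i] * v_n[j] for j in range(d_v)]
            for i in range(d_k)
        ]
        outputs.append([sum(q_n[i] * state[i][j] for i in range(d_k)) for j in range(d_v)])
    return outputs


# Toy inputs: 3 tokens, key/query dimension 2, value dimension 2.
q = [[1.0, 0.0], [0.5, 1.0], [0.0, 2.0]]
k = [[1.0, 1.0], [0.0, 1.0], [2.0, 0.5]]
v = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
gamma = 0.5

parallel_out = retention_parallel(q, k, v, gamma)
recurrent_out = retention_recurrent(q, k, v, gamma)

# Both forms should agree up to floating-point error.
max_diff = max(
    abs(a - b)
    for row_p, row_r in zip(parallel_out, recurrent_out)
    for a, b in zip(row_p, row_r)
)
print("max difference between forms:", max_diff)
```

The parallel form processes the whole sequence in one pass, which suits training, while the recurrent form only keeps a `d_k x d_v` state per step, which is why per-token inference cost does not grow with sequence length.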
Make sure you have the necessary dependencies installed. Refer to the requirements.txt file for specific versions.
To run unit tests, please run:
python -m src.unittests
Before pushing any changes, please run the tests.
We value community involvement and welcome your support for this project:
- Issues: Report any bugs or suggest improvements by opening an issue on GitHub.
- Feature Requests: Share your ideas for additional features through GitHub discussions.
- Pull Requests: Contribute directly to the codebase by submitting a pull request aligned with the project's goals.
- Spread the Word: Help us reach a broader audience by sharing this project on social media and with colleagues and friends.
Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.
Please make sure to update tests as appropriate.
The original source code for RetNet is not yet released. This repository contains an independent implementation based on the paper.
Thank you for exploring RetNet! We hope this architecture will significantly advance the field of language models. If you have any questions or need assistance, please feel free to reach out through GitHub discussions or by opening an issue. Happy coding!