RetNet: A Successor to Transformer for Large Language Models

RetNet (Retentive Network) is an architecture proposed as a foundation for large language models. Its retention mechanism can be computed in a parallel form for training and in a recurrent form for low-cost inference, and the paper reports performance competitive with the Transformer. This repository contains an implementation of RetNet, which aims to be a strong successor to the Transformer. It also implements the xPos relative positional encoding, which RetNet uses.
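
To make the training-parallelism and low-cost-inference claims concrete, here is a minimal, dependency-free sketch of single-head retention in its two equivalent forms, following the formulation in the RetNet paper. It is an illustration only: the function names are hypothetical and are not this repository's API, and the xPos rotation, normalization, and multi-scale decay are left out for brevity.

```python
# Sketch of single-head retention in plain Python (no xPos, no normalization).
# q, k, v are lists of per-token vectors; gamma is the scalar decay in (0, 1).
# Names are illustrative assumptions, not part of this repository.


def retention_parallel(q, k, v, gamma):
    """Training-style form: out[n] = sum over m <= n of gamma**(n-m) * (q[n] . k[m]) * v[m]."""
    out = []
    for n in range(len(q)):
        row = [0.0] * len(v[0])
        for m in range(n + 1):  # causal mask: only positions m <= n contribute
            score = sum(a * b for a, b in zip(q[n], k[m])) * gamma ** (n - m)
            row = [r + score * x for r, x in zip(row, v[m])]
        out.append(row)
    return out


def retention_recurrent(q, k, v, gamma):
    """Inference-style form: carry a d_k x d_v state S with S = gamma * S + k^T v."""
    d_k, d_v = len(k[0]), len(v[0])
    state = [[0.0] * d_v for _ in range(d_k)]
    out = []
    for q_n, k_n, v_n in zip(q, k, v):
        # Decay the previous state and add the outer product of the current key and value.
        state = [
            [gamma * state[i][j] + k_n[i] * v_n[j] for j in range(d_v)]
            for i in range(d_k)
        ]
        # Read the output for this token from the state using the current query.
        out.append([sum(q_n[i] * state[i][j] for i in range(d_k)) for j in range(d_v)])
    return out


if __name__ == "__main__":
    q = [[1.0, 2.0], [0.5, -1.0], [2.0, 0.0]]
    k = [[0.5, 1.0], [1.0, 1.0], [-1.0, 2.0]]
    v = [[1.0, 0.0, 2.0], [0.0, 1.0, 1.0], [3.0, 1.0, 0.0]]
    par = retention_parallel(q, k, v, gamma=0.9)
    rec = retention_recurrent(q, k, v, gamma=0.9)
    # The two forms agree up to floating-point rounding.
    print(all(abs(a - b) < 1e-9 for p, r in zip(par, rec) for a, b in zip(p, r)))
```

The parallel form touches every (n, m) pair and so maps well onto batched matrix products during training, while the recurrent form keeps only a fixed-size state per step, which is what makes per-token inference cost independent of sequence length.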

Installation

Install the dependencies listed in requirements.txt, which pins the specific versions, for example with pip install -r requirements.txt.

Tests

To run the unit tests, use:

python -m src.unittests

Before pushing any changes, please run the tests.

Community Support

We value community involvement and welcome your support for this project:

Issues: Report any bugs or suggest improvements by opening an issue on GitHub.
Feature Requests: Share your ideas for additional features through GitHub discussions.
Pull Requests: Contribute directly to the codebase by submitting a pull request aligned with the project's goals.
Spread the Word: Help us reach a broader audience by sharing this project on social media and with colleagues and friends.

Contributing

Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.

Please make sure to update tests as appropriate.

Note

The original source code for RetNet is not yet released. This repository contains an independent implementation based on the paper.

License

MIT

Thank you for exploring RetNet! We hope this architecture will significantly advance the field of language models. If you have any questions or need assistance, please feel free to reach out through GitHub discussions or by opening an issue. Happy coding!