This is a super simple C++/CUDA implementation of RWKV with no PyTorch/libtorch dependencies.
Included are simple examples of how to use it from both C++ and Python.
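For orientation, here is a minimal, self-contained C++ sketch of the numerically stable "WKV" recurrence that an RWKV-4 inference engine evaluates per channel for every token. It is purely illustrative: the function and variable names are invented for this example and it is not taken from this repo's kernels.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdio>
#include <utility>

// Per-channel recurrent state of the RWKV-4 WKV operator.
struct WkvState {
    float aa = 0.0f;    // exponentially weighted sum of values
    float bb = 0.0f;    // exponentially weighted sum of weights
    float pp = -1e38f;  // running maximum exponent, kept for numerical stability
};

// One token step for one channel. k and v are this channel's key and value,
// u is time_first, and w is the (negative) per-channel time decay.
float wkv_step(WkvState &s, float k, float v, float u, float w) {
    float ww = u + k;
    float p  = std::max(s.pp, ww);
    float e1 = std::exp(s.pp - p);
    float e2 = std::exp(ww - p);
    float out = (e1 * s.aa + e2 * v) / (e1 * s.bb + e2);

    // Decay the accumulated state and fold in the current token.
    ww = s.pp + w;
    p  = std::max(ww, k);
    e1 = std::exp(ww - p);
    e2 = std::exp(k - p);
    s.aa = e1 * s.aa + e2 * v;
    s.bb = e1 * s.bb + e2;
    s.pp = p;
    return out;
}

int main() {
    WkvState s;
    const float u = 0.5f, w = -0.9f;  // made-up parameters for the demo
    const std::pair<float, float> kv[] = {{0.1f, 1.0f}, {0.3f, -0.5f}, {-0.2f, 0.7f}};
    for (auto [k, v] : kv) std::printf("wkv = %f\n", wkv_step(s, k, v, u, w));
}
```

In the full model this recurrence runs for every channel independently, which is the part a GPU backend can parallelize.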
- Direct disk -> GPU loading (practically no RAM needed; see the sketch after this list)
- Uint8 by default
- Incredibly fast
- No dependencies
- Simple to use
- Simple to build
- Optional Python binding using pytorch tensors as wrappers
- Native tokenizer!
- Windows Support!
- HIP(AMD) GPU support!
- Vulkan(All) Support!
- Distributable programs! (check actions for the prebuilt example apps)
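The "Direct disk -> GPU loading" feature above refers to streaming the weight file into device memory through a small staging buffer, so the whole model never has to sit in host RAM. Below is a rough sketch of that general technique using the CUDA runtime; the file layout, chunk size, and helper name are assumptions made for illustration, not this repo's actual loader.

```cpp
#include <cuda_runtime.h>
#include <cstddef>
#include <cstdio>
#include <cstdlib>

// Stream `bytes` of raw weights from `path` straight into device memory.
// Only a small pinned staging buffer is ever resident in host RAM.
float *load_blob_to_gpu(const char *path, std::size_t bytes) {
    float *device = nullptr;
    cudaMalloc(&device, bytes);

    const std::size_t chunk = 16u << 20;  // 16 MiB staging buffer
    void *staging = nullptr;
    cudaMallocHost(&staging, chunk);      // pinned host memory for fast H2D copies

    std::FILE *f = std::fopen(path, "rb");
    if (!f) { std::perror(path); std::exit(1); }

    std::size_t offset = 0;
    while (offset < bytes) {
        std::size_t n = std::fread(staging, 1, chunk, f);
        if (n == 0) break;
        cudaMemcpy(reinterpret_cast<char *>(device) + offset, staging, n,
                   cudaMemcpyHostToDevice);
        offset += n;
    }
    std::fclose(f);
    cudaFreeHost(staging);
    return device;
}
```

The same pattern maps directly onto the HIP runtime for AMD GPUs.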
- Godot module
- Optimize the .pth converter (currently uses a lot of RAM)
- Better uint8 support (currently only the Q8_0 algorithm is used; see the sketch after this list)
- Fully fleshed out demos
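For context on the Q8_0 item above: in ggml-style quantization, Q8_0 is a blockwise 8-bit format in which every fixed-size block of weights stores one float scale plus one signed 8-bit integer per weight. The sketch below illustrates that general scheme only; this repo's exact block size and storage layout may differ.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr int kBlock = 32;  // values per quantization block (ggml's Q8_0 uses 32)

struct BlockQ8_0 {
    float  d;          // per-block scale
    int8_t q[kBlock];  // quantized values, x ~= d * q
};

// Quantize a float vector whose length is a multiple of kBlock.
std::vector<BlockQ8_0> quantize_q8_0(const std::vector<float> &x) {
    std::vector<BlockQ8_0> out(x.size() / kBlock);
    for (std::size_t b = 0; b < out.size(); ++b) {
        float amax = 0.0f;
        for (int i = 0; i < kBlock; ++i)
            amax = std::max(amax, std::fabs(x[b * kBlock + i]));
        const float d = amax / 127.0f;              // largest magnitude maps to +/-127
        const float inv = (d != 0.0f) ? 1.0f / d : 0.0f;
        out[b].d = d;
        for (int i = 0; i < kBlock; ++i)
            out[b].q[i] = static_cast<int8_t>(std::roundf(x[b * kBlock + i] * inv));
    }
    return out;
}

// Dequantize a single element back to float.
float dequantize_q8_0(const BlockQ8_0 &blk, int i) { return blk.d * blk.q[i]; }
```

Storing weights this way cuts their size to roughly a quarter of fp32 at a small accuracy cost, which is presumably what the "Uint8 by default" feature refers to.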
- Go to the Actions tab
- Find a green checkmark for your platform
- Download the executable
- Download or convert a model (downloads here)
- Place the model.bin file in the same place as the executable
- Run the executable
# in example/storygen
build.sh # Linux/NVIDIA
build.bat # Windows/NVIDIA
amd.sh # Linux/AMD
vulkan.sh # Linux/Vulkan (all)
You can find the resulting executable at build/release/rwkv[.exe].
Make sure you have already installed the CUDA Toolkit, HIP development tools, or Vulkan development tools, matching the backend you are building.
You can download the model weights here: https://huggingface.co/BlinkDL/rwkv-4-raven/tree/main
For conversion to a .bin model you can choose between two options.
Option 1: Make sure you have Python with the torch, tkinter, tqdm and Ninja packages installed.
> cd converter
> python3 convert_model.py
Option 2: Make sure you have Python with the torch, tqdm and Ninja packages installed.
> cd converter
> python3 convert_model.py your_downloaded_model.pth
- On Windows, run the above commands in the "x64 Native Tools Command Prompt for VS 2022" terminal.
The C++ tokenizer came from this project: https://github.com/gf712/gpt2-cpp/