This project enables running RWKV models on Windows with C++ (CPU/GPU). You can run your own RWKV model service without any Python dependency (just click an exe file). It provides the following features:
- C tokenizer support
- LibTorch and ONNX Runtime inference
- HTTP server API via cpp-httplib
- model conversion scripts to convert RWKV checkpoints to TorchScript/ONNX files
- client and server release files to get started from scratch
Requirements:
- Visual Studio 2022
- CMake (version >= 3.0)
- Cargo
```bash
git clone --recursive https://github.com/ZeldaHuang/rwkv-cpp-server.git
cd rwkv-cpp-server
```
Download LibTorch and unzip it into the source folder:

```bash
curl -O https://download.pytorch.org/libtorch/cpu/libtorch-win-shared-with-deps-2.0.0%2Bcpu.zip
```

Download ONNX Runtime (e.g., from its GitHub releases page) and unzip it into the source folder as well.
Run `build.bat`.
The release directory is `build/release`; it contains `rwkv-server.exe` and all of its dependencies.
Download an RWKV model from Hugging Face, then convert the `.pth` checkpoint to TorchScript/ONNX:
```bash
python convert/to_onnx.py
python convert/to_torchscript.py
```
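Under the hood, this kind of conversion follows PyTorch's standard export paths. Below is a minimal sketch using a toy stand-in module; the real scripts in `convert/` build the actual RWKV model, and the model class, shapes, and tensor names here are illustrative assumptions:

```python
import torch
import torch.nn as nn

# Toy stand-in module; the real convert/ scripts construct the actual
# RWKV model from the .pth checkpoint (sizes and names here are made up).
class TinyModel(nn.Module):
    def __init__(self, vocab=1000, dim=64):
        super().__init__()
        self.emb = nn.Embedding(vocab, dim)
        self.head = nn.Linear(dim, vocab)

    def forward(self, tokens):
        return self.head(self.emb(tokens))

model = TinyModel().eval()
x = torch.zeros(1, 8, dtype=torch.long)  # example token-id input

# TorchScript export: trace the model and save a .pt file.
torch.jit.trace(model, x).save("model.pt")

# ONNX export: save a .onnx file for ONNX Runtime.
torch.onnx.export(model, x, "model.onnx",
                  input_names=["tokens"], output_names=["logits"])
```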
Place the TorchScript/ONNX model in `release/assets/models`. By default, the first `.pt` or `.onnx` file in this directory is loaded.
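Before the server picks the file up, you can sanity-check the converted model by loading it back in Python. This is a quick check, not part of the project's scripts, and the file paths and the `tokens` input name are assumptions carried over from the sketch above:

```python
import numpy as np
import onnxruntime as ort
import torch

# Load the TorchScript file and run a dummy forward pass.
ts = torch.jit.load("release/assets/models/model.pt")
print(ts(torch.zeros(1, 8, dtype=torch.long)).shape)

# Load the ONNX file and run it with ONNX Runtime on CPU.
sess = ort.InferenceSession("release/assets/models/model.onnx",
                            providers=["CPUExecutionProvider"])
out = sess.run(None, {"tokens": np.zeros((1, 8), dtype=np.int64)})
print(out[0].shape)
```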
Run `rwkv-server.exe` from the release directory:

```bash
rwkv-server.exe ${model_path} ${ip} ${port}
```

You can test the service with `test.py` or open the client app to chat.
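A minimal hand-rolled client might look like the sketch below; the endpoint path and JSON fields are assumptions, so check `test.py` for the actual request format the server (built on cpp-httplib) expects:

```python
import requests

# Hypothetical endpoint and payload; consult test.py for the real API.
resp = requests.post(
    "http://127.0.0.1:8080/chat",   # use the ${ip} and ${port} you launched with
    json={"prompt": "Hello, RWKV!"},
    timeout=60,
)
resp.raise_for_status()
print(resp.text)
```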