If you are looking for fast CPU inference, use Fabrice Bellard's ts_server instead.
AXKuhta/rwkv-onnx-dml
Run ONNX RWKV-v4 models with GPU acceleration via DirectML (Windows), or on CPU alone (Windows and Linux). Currently limited to the 430M model because single-file .onnx models are capped at 2 GB (the protobuf serialization limit).
C++
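The DirectML-or-CPU choice described above maps onto ONNX Runtime's execution-provider mechanism. A minimal sketch of the selection logic, assuming the standard provider names (`DmlExecutionProvider` for DirectML on Windows, `CPUExecutionProvider` everywhere); the `rwkv-430m.onnx` filename in the comment is a hypothetical placeholder:

```python
# Sketch: pick ONNX Runtime execution providers, preferring DirectML
# when it is available and always falling back to CPU.
def pick_providers(available):
    """Return a provider priority list: DirectML first if present, then CPU."""
    preferred = []
    if "DmlExecutionProvider" in available:
        preferred.append("DmlExecutionProvider")
    preferred.append("CPUExecutionProvider")
    return preferred

# In a real script this list would be passed to an inference session:
#   import onnxruntime as ort
#   providers = pick_providers(ort.get_available_providers())
#   session = ort.InferenceSession("rwkv-430m.onnx", providers=providers)

print(pick_providers(["DmlExecutionProvider", "CPUExecutionProvider"]))
print(pick_providers(["CPUExecutionProvider"]))
```

On a Windows machine with the DirectML package installed this yields the GPU-first ordering; on Linux only the CPU provider remains, matching the repo's Windows/Linux split.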