BlinkDL/RWKV-LM
RWKV is an RNN with transformer-level LLM performance. It can be directly trained like a GPT (parallelizable). So it's combining the best of RNN and transformer - great performance, fast inference, saves VRAM, fast training, "infinite" ctx_len, and free sentence embedding.
Python · Apache-2.0
Issues
Why is the wkv operation designed this way?
#269 opened by NanakiC - 1
FP32/FP16 precision training
#271 opened by KompressorSC - 1
A question about MQAR evaluation
#272 opened by necrophagists - 2
Does RWKV support caching a shared prefix during prefill?
#270 opened by Lier007 - 2
Where's the cuda backward function for v7?
#263 opened by bmilde - 1
How to call the fine-tuned model like using an API?
#267 opened by jieli9626 - 0
With RWKV-v4, if I wish to build an encoder-decoder model (for example, for translation), which hidden states need to be passed between the encoder and the decoder? Can you provide some guidelines or point to any existing work?
#268 opened by shamilajeewantha - 3
How can dialogue data be set up so that the other speaker's turns are not trained on?
#264 opened by petergaoshan - 5
RUN_CUDA_RWKV6 would be better implemented in PyTorch; otherwise it is hard to port
#252 opened by bobo-wmdigit - 2
RWKV .pth to .onnx
#260 opened by momocoQAQ - 2
Error when using init_params from rwkv_v6_demo
#262 opened by KompressorSC - 3
How well does RWKV perform on RAG tasks?
#261 opened by ZTurboX - 2
A formula in the paper is written incorrectly
#259 opened by KompressorSC - 1
XXX is currently not supported in Torchscript: I don't know how to solve this problem; there is something wrong with CUDA on my device
#258 opened by LeC-Z - 3
Please add ROCm support
#247 opened by Wintoplay - 3
Does RWKV only show lower GPU memory usage during inference?
#250 opened by thucz - 2
Error when running rwkv_v6_demo.py
#256 opened by supercyt - 3
Which 100+ languages are supported, and has translation quality been evaluated to show which languages are actually usable?
#255 opened by i18nsite - 2
Please tell me how to solve this error reported while using rwkv: "CUDA error: CUBLAS_STATUS_EXECUTION_FAILED when calling `cublasSgemm( handle, opa, opb, m, n, k, &alpha, a, lda, b, ldb, &beta, c, ldc)`"
#253 opened by songjie1121 - 2
lightning_fabric.utilities.exceptions.MisconfigurationException: Unknown configuration for model optimizers.
#251 opened by blueridanus - 4
How to use state tuning with rwkv6-7B?
#246 opened by xinyinan9527 - 1
The device of model.w["emb.weight"] is CPU
#249 opened by MarshtompCS - 1
Replacing the RNN in a model with RWKV
#242 opened by hulucky1102 - 1
Zero-division error when args.n_layer = 1, caused by ratio_0_to_1. Can I set ratio_0_to_1 = 0 when n_layer = 1?
#243 opened by zdxdsw - 1
How to understand the u vector in the original paper?
#244 opened by 141forever - 1
RuntimeError: invalid unordered_map<K, T> key
#248 opened by Lixuanhe - 2
Probable mistake in Eq. 16 in the preprint
#238 opened by zeyun-zhong - 0
bug in new wkv6state_cuda
#241 opened by SmerkyG - 2
Flash Attention
#239 opened by fakerybakery - 0
The /v1/embeddings interface of rwkv is inconsistent with the /v1/embeddings interface of OpenAI. How can they be made compatible?
#240 opened by qq378488249 - 1
How does the generation speed of RWKV-5/6 compare to that of Mamba with the same number of parameters?
#236 opened by h-zhao1997 - 1
Can RWKV beat Flash Attention?
#235 opened by yxchng - 1
NCCL watchdog thread terminated with exception: CUDA error: an illegal memory access was encountered
#237 opened by ZetangForward - 1
Tokenizer for fine-tuning RWKV-v5 world model
#230 opened by mathewchris96 - 1
How to understand the `no` variable in the CUDA code?
#234 opened by yxchng - 1
How to train for long context
#233 opened by EasonXiao-888 - 1
KeyError: "attribute 'weight' already exists"
#229 opened by ByUnal - 1
Finetune RWKV5-7B: Missing key(s) in state_dict:
#228 opened by liuao743 - 1
Can RWKV-v4 handle summarization tasks?
#227 opened by zzczzc20 - 1
Could you provide the fine-tuning parameters for all RWKV v5 models on Hugging Face?
#226 opened by lantudou - 1
Finetuning RWKV-5-World-1B5-v2 model
#225 opened by ArchanaNarayanan843 - 1
Truncation in Tokenizer?
#224 opened by sedrick-keh-tri - 2
RWKV for Text to Speech use case
#222 opened by rishikksh20 - 1
RWKV-5 World on Colab
#223 opened by EnricoBeltramo - 1