BlinkDL/RWKV-LM
RWKV is an RNN with transformer-level LLM performance. It can be directly trained like a GPT (parallelizable). So it's combining the best of RNN and transformer - great performance, fast inference, saves VRAM, fast training, "infinite" ctx_len, and free sentence embedding.
Python · Apache-2.0
Issues
- 0
How to understand the u vector in the original paper?
#244 opened by 141forever - 0
Zero-division error when args.n_layer = 1, caused by ratio_0_to_1. Can I set ratio_0_to_1 = 0 when n_layer = 1?
#243 opened by zdxdsw - 0
Replacing the RNN in a model with RWKV
#242 opened by hulucky1102 - 2
Probable mistake in Eq. 16 in the preprint
#238 opened by zeyun-zhong - 0
bug in new wkv6state_cuda
#241 opened by SmerkyG - 2
Flash Attention
#239 opened by fakerybakery - 0
The /v1/embeddings interface of RWKV is inconsistent with OpenAI's /v1/embeddings interface. How can they be made compatible?
#240 opened by qq378488249 - 1
How does the generation speed of RWKV-5/6 compare to that of Mamba with the same number of parameters?
#236 opened by h-zhao1997 - 1
Can RWKV beat Flash Attention?
#235 opened by yxchng - 1
NCCL watchdog thread terminated with exception: CUDA error: an illegal memory access was encountered
#237 opened by ZetangForward - 1
Got ImportError when using load() to load wkv_cuda
#204 opened by nanjunye - 1
Tokenizer for fine tuning RWKV-v5 world model
#230 opened by mathewchris96 - 1
How to understand the `no` variable in the CUDA code?
#234 opened by yxchng - 1
How to train for long context
#233 opened by EasonXiao-888 - 1
KeyError: "attribute 'weight' already exists"
#229 opened by ByUnal - 1
Finetuning RWKV5-7B: Missing key(s) in state_dict
#228 opened by liuao743 - 1
Can RWKV-v4 handle summarization tasks?
#227 opened by zzczzc20 - 1
Could you provide the fine-tuning parameters for all RWKV v5 models on Hugging Face?
#226 opened by lantudou - 1
Finetuning RWKV-5-World-1B5-v2 model
#225 opened by ArchanaNarayanan843 - 1
Truncation in Tokenizer?
#224 opened by sedrick-keh-tri - 2
RWKV for Text to Speech use case
#222 opened by rishikksh20 - 1
RWKV-5 World on colab
#223 opened by EnricoBeltramo - 2
demo-training-prepare libcudart woes
#217 opened by micsthepick - 8
AssertionError while finetuning RWKVv5
#216 opened by Ethan-Chen-plus - 1
Fine-tuning IndexError: list index out of range
#215 opened by aolerv - 3
How to pretrain v5 for another language?
#210 opened by HaloKim - 1
MoE support
#212 opened by James4Ever0 - 1
Error when training RWKV-4
#211 opened by fuxuelinwudi - 3
Question: an error is reported when training RWKV-4-Pile-3B-20221008-8023
#209 opened by XxSuper - 4
Could you provide the requirements?
#202 opened by fuxuelinwudi - 1
v5 train error
#208 opened by HaloKim - 2
v5 train error
#206 opened by HaloKim - 1
How can RWKV or RetNet be used for OCR tasks?
#205 opened by chaodreaming - 1
Unable to use Hugging Face
#203 opened by x19990416 - 1
Error: No such file or directory: 'cuda/wkv_op.cpp'
#201 opened by humanpp - 1
Would it be possible to provide a CHN+JPN tuned 7B version?
#199 opened by pipixia244 - 1
How to train rwkv-5-0.1b? A weight-loading error is shown
#197 opened by enbiwudi - 3
Error during LoRA training
#196 opened by surviveMiao - 3
Error when running
#195 opened by surviveMiao - 1
Training on Cuda version 11.2, 11.3
#194 opened by cuongnguyengit - 1
Gratitude and Inquiries
#192 opened by 997172286 - 1
Why is the generated content often repetitive?
#190 opened by bigcat26