BlinkDL/RWKV-LM
RWKV is an RNN with transformer-level LLM performance. It can be trained directly like a GPT (parallelizable). So it combines the best of RNNs and transformers: great performance, fast inference, low VRAM use, fast training, "infinite" ctx_len, and free sentence embedding.
Python · Apache-2.0
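For context, the constant-memory inference and "infinite" ctx_len claims come from RWKV's linear-attention "WKV" recurrence. Below is a minimal NumPy sketch of the serial RWKV-4 form; the `wkv_serial` name and the positive-`w` decay parameterization are illustrative assumptions, and the real kernels add numerical-stability rescaling and run as fused CUDA code.

```python
import numpy as np

def wkv_serial(w, u, k, v):
    """Serial WKV recurrence (simplified RWKV-4 style) for one sequence.

    w : (C,) per-channel decay rates (state is scaled by exp(-w) each step)
    u : (C,) per-channel bonus applied only to the current token
    k, v : (T, C) key and value sequences
    returns : (T, C) mixed outputs
    """
    T, C = k.shape
    out = np.empty((T, C))
    num = np.zeros(C)  # running exp-weighted sum of values
    den = np.zeros(C)  # running exp-weighted normalizer
    for t in range(T):
        cur = np.exp(u + k[t])  # current token gets the extra "u" bonus
        out[t] = (num + cur * v[t]) / (den + cur)
        # fold the current token into the state, then decay for the next step
        num = np.exp(-w) * (num + np.exp(k[t]) * v[t])
        den = np.exp(-w) * (den + np.exp(k[t]))
    return out
```

Because the state is just two (C,)-sized vectors regardless of sequence position, per-token inference cost is constant and context length is effectively unbounded, while training can still be parallelized over T like a GPT.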
Issues
Continued training from the world-chinese 1.5B ckpt: absolute loss around 2.7?
#184 opened - 1
Stuck in multi-GPU LoRA fine-tuning
#170 opened - 3
Loading extension module wkv_512... fails
#169 opened - 1
How to use DDP across multiple machines and GPUs?
#168 opened - 1
nvcc fatal : Unknown option '-Xptxas -O3'
#166 opened - 0
Model fine-tuning
#163 opened - 1
Better than Tsinghua's
#161 opened - 0
Request: IPv6 support for the API (posted by mistake, please ignore, sry)
#160 opened - 3
transformers code fails to load CUDA
#159 opened - 1
Running RWKV-V4 train.py in VS Code on Windows raises RuntimeError: Ninja is required to load C++ extensions
#158 opened - 1
Bfloat16 in v4neo
#155 opened - 1
Pretrain using the SlimPajama dataset
#152 opened - 0
precomputed pile binidx dataset
#151 opened - 1
question about time_decay initialization
#150 opened - 2
Error: assert fragment_start < fragment_end
#149 opened - 4
Error during training at the build.ninja... step
#148 opened - 2
Running train.py raises an error:
#147 opened - 3
Everything else is great; one question: how to pool multiple GPUs so they work as a single GPU
#146 opened - 1
How to instruction-tune RWKV using LoRA + Alpaca-style code?
#143 opened - 3
Catastrophic forgetting with LoRA fine-tuning
#141 opened - 1
Exporting RWKV into ONNX
#140 opened - 2
Fewer Checkpoint Files for train.py
#138 opened - 1
Initializing single layer
#137 opened - 1
Model training issue
#136 opened - 1
Multi-Modal in the future?
#135 opened - 1
question about the RWKV version
#133 opened - 0
Add citation format to the RWKV preprint
#130 opened - 1
txt dataset format
#128 opened - 3
finetune for other languages?
#127 opened - 1
Visual RWKV
#125 opened - 1
The 169M model underperforms when fine-tuned on downstream tasks
#124 opened - 1
Add to guidance https://github.com/microsoft/guidance/tree/main/guidance/llms/transformers
#123 opened - 2
What is the model's license?
#122 opened - 3
Main differences between versions?
#120 opened - 1
UTF-16 stream does not start with BOM
#119 opened