Issues
Parallel inference is expected to be faster than recurrent inference, but it turns out not to be in the play file
#39 opened by wac81 - 0
Question about inference
#38 opened by wac81 - 2
Can't train 3B model on a single 48GB card
#33 opened by wac81 - 0
Integration with transformers library
#37 opened by kiucho - 2
HuggingFace checkpoint
#36 opened by xtwigs - 2
Can you support streaming when generating?
#32 opened by wac81 - 1
How to load the model with device_map="auto"
#35 opened by wac81 - 1
Initialize word embedding layer
#31 opened by hyunwoongko - 1
Added description for torch.compile
#29 opened by ce-lery - 5
Changelog of official implementation
#10 opened by donglixp - 5
Info/Documentation on chunkwise training
#30 opened by pkpro - 1
Would it be possible to integrate an attention sink https://arxiv.org/pdf/2309.17453.pdf into RetNet?
#27 opened by pkpro - 1
Tokenizer Choice?
#26 opened by risedangel - 10
Encountered NaN while trying to train
#6 opened by liujuncn - 2
Add Hidden Size for DeepSpeed integration
#23 opened by infosechoudini - 5
A full multi-card parallel training scheme that pools GPU memory across cards is really missing; if it could be done, that would count as a success!
#5 opened by gg22mm - 2
Question about verifying the Inference Latency
#8 opened by LiZeng001 - 4
Comments on the model
#14 opened by okpatil4u - 1
Can't Resume Training from Checkpoint
#17 opened by infosechoudini - 2
How to load my own model
#12 opened by zhihui-shao - 3
Can you provide a LICENSE file?
#13 opened by Shubhankar-Aidetic - 1
Training using HF Transformers
#3 opened by nebulatgs - 2
Errors when running your examples
#4 opened by houghtonweihu - 1
Somewhere that needs to be modified
#1 opened by liujuncn