dvlab-research/LongLoRA
Code and documents of LongLoRA and LongAlpaca (ICLR 2024 Oral)
Python · Apache-2.0
Issues
not able to reproduce the passkey retrieval accuracy
#195 opened by zhuconv - 0
LongBench evaluation
#194 opened by Clement25 - 0
Is supervised fine-tuning supported for models like GPT-2?
#193 opened by CharRic - 0
How was the LongAlpaca data constructed?
#192 opened by S1s-Z - 2
Something wrong with the torch version
#185 opened by dian1414 - 0
Doesn't the norm layer have no parameter matrix?
#190 opened by changanxunyi - 1
I am unable to reproduce the results from the paper for llama-7B-32k-longlora ppl.
#188 opened by masteryqq - 1
The model cannot produce normal output at all
#187 opened by Tangent-90C - 0
Why is the embedding resized to 32001?
#186 opened by momandai - 0
What training set is used to obtain the "Model with context extension via improved LoRA fine-tuning" (LoRA+)?
#184 opened by ZackZikaiXiao - 0
When I set `per_device_train_batch_size=2`, the S2-Attn would not shift as expected
#182 opened by linhaojia13 - 0
HF models missing rope scaling in the config
#181 opened by hsiehjackson - 0
Machine doesn't have Flash Attention installed
#180 opened by huilong-chen - 0
global_step file
#179 opened by xxcoco763 - 0
Regarding the results in Table 8 and Table 14
#177 opened by Statisticss - 0
The proof-pile/test-sample-ids are not the exact ids for proof-pile-testsample.bin
#175 opened by pangjh3 - 0
Memory usage "too small" for 7B Llama-2
#174 opened by Linohong - 0
merge_lora_weights_and_save_hf_model.py Error while deserializing header: HeaderTooLarge
#172 opened by Spongeorge - 0
Distributed inference issue
#171 opened by yixliu1 - 7
Is it possible to increase the context length of phi-2 using LongLora? If yes, what changes need to be done to support it?
#169 opened by dbanka - 0
The loss value is too unstable when supervised fine-tuning the 7b-100k-ft model
#168 opened by seanxuu - 0
StreamingLLM problem
#167 opened by seanxuu - 0
Bug report: RuntimeError: probability tensor contains either inf, nan or element < 0
#165 opened by seanxuu - 0
Can LongLoRA be mixed with YaRN?
#164 opened by DevNullx64 - 2
Adapting to new models
#162 opened by epinnock - 0
How to include training of the embed and norm layers in LoRA training?
#161 opened by Zheng-Jay - 6
LoRA + DeepSpeed ZeRO-3 cannot save LoRA weights
#160 opened by AresXD - 0
Without any error being reported, LongAlpaca-7B only responds to the first paragraph of the text
#158 opened by waleyW - 0
Configs in inference.py necessary for context length expansion in model serving?
#157 opened by spring1915 - 0
What extrapolation method is used during training?
#156 opened by IT-five - 0
Is fine-tuning supported for Chinese models such as Qwen and Baichuan?
#155 opened by kevinuserdd - 3
torch.cuda.OutOfMemoryError: CUDA out of memory
#146 opened by zhanglv0209 - 0
inference OOM
#154 opened by PharMolix - 1
Inference: group divisibility issue
#149 opened by Michelleable - 0
Can LongLoRA be used for incremental pre-training?
#152 opened by Zheng-Jay - 4
the current text generation call will exceed the model's predefined maximum length (4096)
#151 opened by waleyW - 1
Progress on Chinese-language support
#145 opened by ccp123456789 - 8
32k inference result is garbled
#147 opened by zhanglv0209 - 0
After expanding the vocabulary, can the newly added tokens be trained during pre-training without changing any other code or parameters?
#143 opened by THUchenzhou