dvlab-research/LongLoRA
Code and documents of LongLoRA and LongAlpaca (ICLR 2024 Oral)
Python · Apache-2.0
Issues
not able to reproduce the passkey retrieval accuracy
#195 opened by zhuconv - 0
LongBench evaluation
#194 opened by Clement25 - 0
Is supervised fine-tuning supported for models like GPT-2?
#193 opened by CharRic - 0
How was the LongAlpaca data constructed?
#192 opened by S1s-Z - 2
Something wrong with the torch version
#185 opened by dian1414 - 0
Doesn't the norm layer have no parameter matrix?
#190 opened by changanxunyi - 1
I am unable to reproduce the results from the paper for llama-7B-32k-longlora ppl.
#188 opened by masteryqq - 1
The model cannot produce normal output at all
#187 opened by Tangent-90C - 0
Why is the embedding resized to 32001?
#186 opened by momandai - 0
What training set is used to obtain the "Model with context extension via improved LoRA fine-tuning" (LoRA+)?
#184 opened by ZackZikaiXiao - 0
When I set `per_device_train_batch_size=2`, the S2-Attn would not shift as expected
#182 opened by linhaojia13 - 0
HF models missing rope scaling in the config
#181 opened by hsiehjackson - 0
Machine doesn't have Flash Attention installed
#180 opened by huilong-chen - 0
global_step file
#179 opened by xxcoco763 - 0
Regarding the results in Table 8 and Table 14
#177 opened by Statisticss - 0
The proof-pile/test-sample-ids are not the exact ids for proof-pile-testsample.bin
#175 opened by pangjh3 - 0
Memory usage "too small" for 7B Llama-2
#174 opened by Linohong - 0
merge_lora_weights_and_save_hf_model.py Error while deserializing header: HeaderTooLarge
#172 opened by Spongeorge - 0
Distributed inference issue
#171 opened by yixliu1 - 7
Is it possible to increase the context length of phi-2 using LongLora? If yes, what changes need to be done to support it?
#169 opened by dbanka - 0
The loss value is too unstable when supervised fine-tuning the 7b-100k-ft model
#168 opened by seanxuu - 0
StreamingLLM problem
#167 opened by seanxuu - 0
Bug report: RuntimeError: probability tensor contains either inf, nan or element < 0
#165 opened by seanxuu - 0
Can LongLoRA be mixed with YaRN?
#164 opened by DevNullx64 - 2
Adapting to new models
#162 opened by epinnock - 0
How to include training of the embed and norm layers in LoRA training?
#161 opened by Zheng-Jay - 6
LoRA + DeepSpeed ZeRO-3 cannot save LoRA weights
#160 opened by AresXD - 0
Without any error being reported, LongAlpaca-7B only responds to the first paragraph of the text
#158 opened by waleyW - 0
Configs in inference.py necessary for context length expansion in model serving?
#157 opened by spring1915 - 0
What extrapolation method is used during training?
#156 opened by IT-five - 0
Is fine-tuning supported for Chinese models such as Qwen and Baichuan?
#155 opened by kevinuserdd - 3
torch.cuda.OutOfMemoryError: CUDA out of memory
#146 opened by zhanglv0209 - 0
inference OOM
#154 opened by PharMolix - 1
Inference: group divisibility issue
#149 opened by Michelleable - 0
Can LongLoRA be used for incremental pre-training?
#152 opened by Zheng-Jay - 4
the current text generation call will exceed the model's predefined maximum length (4096)
#151 opened by waleyW - 1
Progress on Chinese-language support
#145 opened by ccp123456789 - 8
32k inference result is garbled
#147 opened by zhanglv0209 - 0
After expanding the vocabulary, can the newly added tokens be trained during pre-training without changing any other code or parameters?
#143 opened by THUchenzhou