Ruisi Cai1, Yuandong Tian2, Zhangyang Wang1, Beidi Chen3
1University of Texas at Austin, 2Meta AI (FAIR), 3Carnegie Mellon University
```shell
python train.py \
    --dataset_name togethercomputer/RedPajama-Data-1T-Sample \
    --model_name_or_path meta-llama/Llama-2-7b-hf \
    --block_size 512 \
    --clean_period 8 \
    --method conv \
    --kernel_size 21 \
    --n_convlayer 1 \
    --mem_size 512 \
    --max_train_steps 1000 \
    --per_device_train_batch_size 1 \
    --gradient_accumulation_steps 128 \
    --eval_iter 20 \
    --eval_interval 50 \
    --stream_tokenizer \
    --normalizer_init 0.5 \
    --memory_lr_scale 1000 \
    --norm_lr_scale 5 \
    --rope_change \
    --checkpointing_steps 100 \
    --output_dir output/no_extend/rp_{block_size}_{clean_period}_mem{mem_size}/{method}/ \
    --auto_resume
```
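To give a rough intuition for what `--method conv`, `--kernel_size 21`, and `--mem_size 512` configure, here is a minimal, hypothetical sketch of convolution-based KV-cache compression: when the cache exceeds the memory budget, a 1D convolution smooths along the sequence axis and the result is subsampled back to `mem_size` slots. This is an illustrative stand-in (fixed averaging kernel, numpy instead of the trained model), not the repository's actual implementation, in which the convolution weights are learned.

```python
import numpy as np

def conv_compress(kv, mem_size, kernel_size=21):
    """Hypothetical sketch: squeeze an over-long KV cache down to
    `mem_size` slots via a 1D convolution over the sequence axis.
    Illustrative only -- LoCoCo learns its convolution weights."""
    seq_len, dim = kv.shape
    if seq_len <= mem_size:
        return kv  # already within the memory budget
    # A fixed averaging kernel stands in for the learned conv weights.
    kernel = np.ones(kernel_size) / kernel_size
    pad = kernel_size // 2
    padded = np.pad(kv, ((pad, pad), (0, 0)), mode="edge")
    # "valid" convolution on the padded sequence keeps length seq_len.
    smoothed = np.stack(
        [np.convolve(padded[:, d], kernel, mode="valid") for d in range(dim)],
        axis=1,
    )
    # Subsample the smoothed sequence back to mem_size positions.
    idx = np.linspace(0, seq_len - 1, mem_size).round().astype(int)
    return smoothed[idx]

cache = np.random.randn(512 + 8, 64)  # a few tokens past the mem budget
compressed = conv_compress(cache, mem_size=512)
print(compressed.shape)  # (512, 64)
```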
Model checkpoints are coming soon!
If you find this useful, please cite the following paper:
```bibtex
@article{cai2024lococo,
  title={LoCoCo: Dropping In Convolutions for Long Context Compression},
  author={Cai, Ruisi and Tian, Yuandong and Wang, Zhangyang and Chen, Beidi},
  journal={arXiv preprint arXiv:2406.05317},
  year={2024}
}
```