README

Usage:

    ./train.py --output_dir=out --num_train_epochs=1 --gradient_checkpointing=True --per_device_train_batch_size=1

Running this should reproduce the issue: when the second adapter is trained, its loss does not decrease.
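Whether the loss actually goes down can be checked from the logged values. The flags above match Hugging Face `TrainingArguments` names, so the sketch below assumes (this is not confirmed by the script) that `train.py` uses the Hugging Face `Trainer`, which records a `log_history` list in `trainer_state.json` inside its checkpoint directories. `loss_decreased`, `load_log_history` and the `min_drop` threshold are hypothetical helpers for illustration only.

```python
import json
from pathlib import Path


def loss_decreased(log_history, min_drop=0.0):
    """Return True if the last logged training loss is lower than the first
    by more than `min_drop`.

    Assumes `log_history` is a list of dicts shaped like the Hugging Face
    Trainer's `log_history`. Entries without a "loss" key (for example eval
    entries or the final "train_loss" summary) are skipped.
    """
    losses = [entry["loss"] for entry in log_history if "loss" in entry]
    if len(losses) < 2:
        raise ValueError("need at least two logged loss values")
    return losses[0] - losses[-1] > min_drop


def load_log_history(state_path):
    """Read `log_history` from a trainer_state.json file.

    The path is an assumption, e.g. out/checkpoint-<step>/trainer_state.json.
    """
    with Path(state_path).open() as f:
        return json.load(f)["log_history"]
```

For the second adapter, `loss_decreased(load_log_history(path))` would be expected to return `False` if the issue reproduces. A nonzero `min_drop` helps ignore small step-to-step noise.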