About Runtime
Harry-mic opened this issue · 7 comments
Hi! Thanks for your awesome work! When I run the command python main.py --model llama-13b --dataset mnli --attack bertattack --shot 0 --generate_len 20
on one NVIDIA 3090 (24GB), it has already been running for 20 hours and I don't know when it will finish. The logs look like:
2023-07-26 08:31:41,068 - INFO - gt: 2
2023-07-26 08:31:41,068 - INFO - Pred: -1
2023-07-26 08:31:41,068 - INFO - sentence: Assess the connection between the following sentences and count it as 'entailment', 'neutral', or 'contradiction':Premise: He hadn't seen even pictures of such things since the few silent movies run in some of the little art theaters. Hypothesis: He had recently seen pictures depicting those things. Answer:
but the results directory contains only a file named mnli, which is also empty. Could you please give a rough estimate of the runtime on your machines? Thanks a lot!
Hi,
Thank you for your attention!
Attacks such as BertAttack and TextFooler can be extremely time-consuming. I'd recommend starting with the StressTest attack, which may take less than 30 minutes, to verify correctness. For reference, running Vicuna-13B may take over 5 days, possibly even a week, on 2 V100s (16GB).
Thanks for your reply! I used the StressTest attack as you recommended. However, it has already been running for a day rather than half an hour, and it seems to keep generating. My machine is an NVIDIA 3090 (24GB), I use llama-13b (downloaded from Hugging Face), and I run the command python main.py --model llama-13b --dataset mnli --attack stresstest --shot 0 --generate_len 20
without modifying the source code. What could I change to get a complete result more quickly? Thanks again.
Hi,
The running time issue may be due to two reasons:
- GPU memory: llama-13b requires at least 26GB of GPU memory in fp16, but a 3090 has only 24GB, so part of the model is offloaded to the CPU, which slows inference down considerably. Can you load llama-7b instead and give it a try?
- generate_len: For MNLI, a generate_len of around 5 works well, since the answer is only a short label. You can try generate_len=3 first (see the sketch after this list).
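
A minimal sketch of both points, assuming the weights are loaded with Hugging Face transformers (the repo id huggyllama/llama-7b is only an example, not necessarily what this codebase uses): load the 7B model in fp16 so it fits on a single 24GB card, generate just a few tokens, and check that nothing was offloaded:

```python
# Minimal sketch, not the benchmark's exact loading code.
# "huggyllama/llama-7b" is an example repo id -- substitute your own copy.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "huggyllama/llama-7b"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # ~13GB for 7B params, well under 24GB
    device_map="auto",          # keeps all weights on the GPU if they fit
)

prompt = "Premise: ... Hypothesis: ... Answer:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=3)  # MNLI needs only a short label
print(tokenizer.decode(out[0][inputs["input_ids"].shape[1]:],
                       skip_special_tokens=True))

# If any entry here is "cpu" or "disk", weights spilled out of GPU memory
# and inference will be much slower:
print(set(model.hf_device_map.values()))
```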
Hope this helps!
Thanks for your help! It works and I successfully got the result.
By the way, is the OPT model also supported in your source code? Thanks!
@Harry-mic I think you can use any model with this codebase; that is the point of open source, right? But in our experience, the OPT family is often weak at language understanding. You can give it a try.
I ask because I noticed that opt-66b appears in the optional model set but is commented out. That's why I raised this question.
This benchmark supports all kinds of open-source and proprietary LLMs; you only need to call their APIs in the code. We commented out OPT-66b because it is too large for this study, and used BLOOM instead, which has a Hugging Face API we can call. Note again that adversarial attacks are expensive, and model size matters.
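
For illustration, a minimal sketch of that pattern using the hosted Hugging Face Inference API. The endpoint, the HF_API_TOKEN environment variable, and the query_bloom helper here are assumptions for the example; the exact wrapper this codebase expects will differ, and whether a given model is served on the free tier can change over time:

```python
# Sketch only: call a hosted model instead of loading weights locally.
import os
import requests

API_URL = "https://api-inference.huggingface.co/models/bigscience/bloom"
HEADERS = {"Authorization": f"Bearer {os.environ['HF_API_TOKEN']}"}

def query_bloom(prompt: str, max_new_tokens: int = 5) -> str:
    resp = requests.post(
        API_URL,
        headers=HEADERS,
        json={"inputs": prompt,
              "parameters": {"max_new_tokens": max_new_tokens}},
    )
    resp.raise_for_status()
    # The API returns a list of {"generated_text": ...}; by default the
    # generated text includes the prompt, so strip it off.
    return resp.json()[0]["generated_text"][len(prompt):]
```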