sherdencooper/GPTFuzz

use error

zky001 opened this issue · 1 comment

The model's max seq len (4096) is larger than the maximum number of tokens that can be stored in KV cache (1792). Try increasing `gpu_memory_utilization` or decreasing `max_model_len` when initializing the engine.

Hi, thanks for running our code. It looks like you are encountering an issue with vLLM. You could refer to vllm-project/vllm#2418 and try the solution mentioned there. Since vLLM's behavior may depend on your CUDA and PyTorch versions, I cannot determine the exact fix for your case; see the sketches below for the two workarounds. If you still encounter issues with vLLM, you may fall back to Hugging Face inference instead.
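As a minimal sketch of the first workaround, the two knobs named in the error message (`gpu_memory_utilization` and `max_model_len`) can be passed directly to vLLM's `LLM` constructor. The model name below is a placeholder, and the exact values are assumptions you would tune for your GPU:

```python
from vllm import LLM

# Either reserve more GPU memory for the KV cache by raising
# gpu_memory_utilization, or cap max_model_len at or below the KV-cache
# capacity reported in the error (1792 tokens in this case).
llm = LLM(
    model="meta-llama/Llama-2-7b-chat-hf",  # placeholder; use your target model
    gpu_memory_utilization=0.95,            # assumed value; default is lower
    max_model_len=1792,                     # <= reported KV-cache capacity
)
```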
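If vLLM keeps failing, a rough sketch of the Hugging Face fallback looks like the following. Again, the model name is a placeholder, and `device_map="auto"` assumes the `accelerate` package is installed:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "meta-llama/Llama-2-7b-chat-hf"  # placeholder; use your target model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto")

# Generate a short completion without vLLM's KV-cache preallocation.
inputs = tokenizer("Hello", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```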