Problem when running inference directly

Question

DQYZHWK opened this issue 6 months ago · 10 comments

bash eval.sh
The launch script is as follows:

#!/bin/bash

DIR="VTG-LLM"
MODEL_DIR="/home1/lw/fyy/VTG-LLM/vtgllm.pth"


# TASK='dvc'
# ANNO_DIR='data/VTG-IT/dense_video_caption/Youcook2'
# VIDEO_DIR='data/youcook2/YouCook2_asr_denseCap/youcook2_6fps_224'
# DATASET='youcook'
# SPLIT='val'
# PROMPT_FILE="prompts/${TASK}.txt"
# GT_FILE="${ANNO_DIR}/${SPLIT}.caption_coco_format.json"


TASK='tvg'
ANNO_DIR="/home3/linwang/fyy/TimeChat/data/ours_annotation"
VIDEO_DIR="/home4/jiaxin/linwang/fyy/video/subset/"
DATASET='ours'
SPLIT='test'
PROMPT_FILE="prompts/mr.txt"
GT_FILE="${ANNO_DIR}/${SPLIT}.caption_coco_format.json"

# TASK='vhd'
# ANNO_DIR='data/VTG-IT/video_highlight_detection/QVHighlights'
# VIDEO_DIR='data/qvhighlights/videos/val'
# DATASET='qvhighlights'
# SPLIT='val'
# PROMPT_FILE="prompts/vhd.txt"
# GT_FILE="${ANNO_DIR}/highlight_${SPLIT}_release.jsonl"

NUM_FRAME=96
OUTPUT_DIR='output'
CFG_PATH=""


CUDA_VISIBLE_DEVICES=2 python evaluate.py --anno_path ${ANNO_DIR} --video_path ${VIDEO_DIR} --gpu_id 0 \
--task ${TASK} --dataset ${DATASET} --output_dir ${OUTPUT_DIR} --split ${SPLIT} --num_frames ${NUM_FRAME} --batch_size 1 \
--prompt_file ${PROMPT_FILE} --vtgllm_model_path ${MODEL_DIR} --cfg_path eval_configs/videollama-slot-96.yaml

cd metrics/${TASK}
python eval_${TASK}.py --pred_file "output/ours_predicate.json" --gt_file ${GT_FILE} | tee "output/ours_predicate.txt"
cd ../..

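The following error occurs (screenshot image.png in the original issue: https://github.com/gyxxyg/VTG-LLM/assets/76473235/6c77a404-9640-413a-9033-7c93f9fb7f06):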

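I did not do any fine-tuning; I used the checkpoint directly for inference. Can the provided vtgllm.pth only be used for fine-tuning, or were the wrong weights uploaded to Hugging Face? I would appreciate your help.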

Answer 1 · 2024-05-27T14:05:34.000Z

Hello, this looks like a torch version problem. It works for me with torch==2.1.2+cu121.


Answer 2 · 2024-05-28T13:44:09.000Z

My CUDA driver is 11.4 and the device is an A100. I still cannot load the vtgllm weight file.

Answer 3 · 2024-05-28T14:24:18.000Z

Could you tell me which torch version you are using?

Answer 4 · 2024-05-28T15:38:16.000Z

After switching to torch==2.2.0+cu121, a simple model load now succeeds. However, the driver is probably still 11.4 and needs to be updated.

Traceback (most recent call last):
  File "/home3/linwang/fyy/VTG-LLM/evaluate.py", line 434, in <module>
    main(args)
  File "/home3/linwang/fyy/VTG-LLM/evaluate.py", line 245, in main
    model = model_cls.from_config(model_config).to('cuda:{}'.format(args.gpu_id))
  File "/home/jiaxin/miniconda3/envs/FastSAM/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1152, in to
    return self._apply(convert)
  File "/home/jiaxin/miniconda3/envs/FastSAM/lib/python3.9/site-packages/torch/nn/modules/module.py", line 802, in _apply
    module._apply(fn)
  File "/home/jiaxin/miniconda3/envs/FastSAM/lib/python3.9/site-packages/torch/nn/modules/module.py", line 802, in _apply
    module._apply(fn)
  File "/home/jiaxin/miniconda3/envs/FastSAM/lib/python3.9/site-packages/torch/nn/modules/module.py", line 802, in _apply
    module._apply(fn)
  File "/home/jiaxin/miniconda3/envs/FastSAM/lib/python3.9/site-packages/torch/nn/modules/module.py", line 825, in _apply
    param_applied = fn(param)
  File "/home/jiaxin/miniconda3/envs/FastSAM/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1150, in convert
    return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None, non_blocking)
  File "/home/jiaxin/miniconda3/envs/FastSAM/lib/python3.9/site-packages/torch/cuda/__init__.py", line 302, in _lazy_init
    torch._C._cuda_init()
RuntimeError: The NVIDIA driver on your system is too old (found version 11040). Please update your GPU driver by downloading and installing a new version from the URL: http://www.nvidia.com/Download/index.aspx Alternatively, go to: https://pytorch.org to install a PyTorch version that has been compiled with your version of the CUDA driver.


Answer 5 · 2024-05-28T16:17:26.000Z

Fixed! I tried torch==2.2.1+cu118 while keeping the CUDA driver at 11.4, and that solved the problem.
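
For anyone reading the "found version 11040" message later, here is a minimal sketch of why the cu118 wheel worked where the cu121 wheel did not. CUDA reports versions as 1000 * major + 10 * minor, so 11040 means 11.4. The helper names below are hypothetical, and the "same or newer major version" rule is a simplification of NVIDIA's compatibility policy (an 11.4 driver running an 11.8 build relies on CUDA minor-version compatibility, which has its own limitations), so treat it as a rule of thumb rather than an official check.

```python
from typing import Tuple


def decode_cuda_version(code: int) -> Tuple[int, int]:
    """Decode CUDA's integer version (1000 * major + 10 * minor), e.g. 11040."""
    return code // 1000, (code % 1000) // 10


def driver_can_run_build(driver_code: int, build_cuda: str) -> bool:
    """Rule of thumb (an assumption, not an official check): the driver's
    CUDA major version must be at least the major version of the build."""
    driver_major, _ = decode_cuda_version(driver_code)
    build_major = int(build_cuda.split(".")[0])
    return driver_major >= build_major


if __name__ == "__main__":
    print(decode_cuda_version(11040))  # (11, 4)
    for build in ("12.1", "11.8"):
        # 12.1 -> False (matches the cu121 failure), 11.8 -> True (matches the cu118 fix)
        print(build, driver_can_run_build(11040, build))
```

This matches what was observed in the thread: the cu121 builds were rejected by the 11.4 driver, while the cu118 build loaded without a driver update.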

Answer 6 · 2024-07-22T02:22:30.000Z

Quoting an earlier comment: "My CUDA driver is 11.4 and the device is an A100. I still cannot load the vtgllm weight file."

I hit this bug too. How can I fix it?

Answer 7 · 2024-07-22T02:30:56.000Z

Please consider upgrading to a newer version of PyTorch, such as torch==2.1.2+cu121. Additionally, please ensure that the CUDA version is compatible with your device.
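
Before reinstalling anything, it may help to confirm which torch build is actually installed in the environment. The following is a small sketch, not part of the VTG-LLM repository; the function name `describe_torch_env` is hypothetical. It only uses `torch.__version__`, `torch.version.cuda`, and `torch.cuda.is_available()`, and it returns empty values instead of failing when torch is not installed.

```python
import importlib.util
from typing import Dict, Optional


def describe_torch_env() -> Dict[str, Optional[object]]:
    """Report the installed torch version, the CUDA version it was built
    against, and whether a usable GPU is visible."""
    info: Dict[str, Optional[object]] = {
        "torch": None,
        "build_cuda": None,
        "cuda_available": None,
    }
    if importlib.util.find_spec("torch") is None:
        return info  # torch is not installed in this environment

    import torch

    info["torch"] = torch.__version__          # e.g. "2.1.2+cu121"
    info["build_cuda"] = torch.version.cuda    # None for CPU-only builds
    info["cuda_available"] = torch.cuda.is_available()
    return info


if __name__ == "__main__":
    for key, value in describe_torch_env().items():
        print(f"{key}: {value}")
```

If `build_cuda` shows a newer major version than the one the driver supports (as reported by nvidia-smi), that mismatch is the likely cause of the "driver is too old" error above.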

Answer 8 · 2024-07-22T02:35:14.000Z

Your checkpoint is not universal enough. Also, when I used your requirements-v100.txt file to set up the environment, it caused a lot of conflicts.

Answer 9 · 2024-07-22T02:47:55.000Z

The requirements-v100.txt file is directly exported from our CUDA environments. You may want to try running bash install_requirements-v100.sh, as this typically works in most cases. Also, torch==2.1.2+cu121 is recommended.

P.S. This project is built upon VideoLlama and TimeChat, which utilize older training frameworks. We also find it annoying, and we are currently working on training better models using an improved framework. Stay tuned for updates.

Answer 10 · 2024-07-22T03:21:00.000Z

OK. I have run the code and got the URL for the web server, but the model's answers are irrelevant to the question. Can you provide some Q&A examples or demonstrations?