modelscope/ms-swift

Best Practices for Inference and Fine-Tuning with MiniCPM-V 2.6

Jintao-Huang opened this issue · 87 comments

Model: https://modelscope.cn/models/OpenBMB/MiniCPM-V-2_6

Multimodal LLM fine-tuning is usually done on a custom dataset. Here we provide a demo that can be run directly.

Before starting fine-tuning, make sure your environment is ready:

git clone https://github.com/modelscope/swift.git
cd swift
pip install -e .[llm]

Model Inference

CUDA_VISIBLE_DEVICES=0 swift infer \
  --model_type minicpm-v-v2_6-chat \
  --model_id_or_path OpenBMB/MiniCPM-V-2_6
<<< Hello
Hello! How can I help you today?
--------------------------------------------------
<<< clear
<<< <image>Describe this image
Input an image path or URL <<< http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/cat.png
The image is a close-up of a kitten with a striking appearance. The kitten has large, round, blue eyes that look full of curiosity and innocence. Its coat is mainly white with gray and black stripes, which are especially pronounced around the face and ears. Its ears are upright and pointed, pink on the inside. Its whiskers are long and white, extending from its cheeks. The kitten's nose is pink, and its mouth is slightly open, showing a bit of pink tongue. The background is blurred, keeping the focus on the kitten and suggesting an indoor setting with soft light illuminating its fur.
--------------------------------------------------
<<< clear
<<< <video>Describe this video
Input a video path or URL <<< https://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/baby.mp4
The video shows a young child sitting on a bed, absorbed in reading a book. The child wears dark glasses, a light blue sleeveless top, and pink pants. The bed has white sheets, and a piece of white clothing lies next to the child. A wooden crib in the background suggests a home setting. The room is softly lit and the atmosphere is calm. There is no obvious motion or activity in the video; the child appears fully immersed in reading.

Image Fine-Tuning

We fine-tune on the coco-en-mini dataset, whose task is image captioning. The dataset is available on ModelScope: https://modelscope.cn/datasets/modelscope/coco_2014_caption/summary

# By default, lora_target_modules is set to all linear layers of the llm and resampler
CUDA_VISIBLE_DEVICES=0,1,2,3 NPROC_PER_NODE=4 swift sft \
  --model_type minicpm-v-v2_6-chat \
  --model_id_or_path OpenBMB/MiniCPM-V-2_6 \
  --sft_type lora \
  --dataset coco-en-mini#20000 \
  --deepspeed default-zero2

To use a custom dataset, simply specify it as follows:

  --dataset train.jsonl \
  --val_dataset val.jsonl \

Custom datasets support both json and jsonl formats. Here are some samples:

{"query": "<image>55555", "response": "66666", "images": ["image_path"]}
{"query": "eeeee<image>eeeee<image>eeeee", "response": "fffff", "history": [], "images": ["image_path1", "image_path2"]}
{"query": "EEEEE", "response": "FFFFF", "history": [["query1", "response2"], ["query2", "response2"]], "images": []}

GPU memory usage:
[screenshot]

The inference script after fine-tuning:

# To run on the full validation set, set: `--show_dataset_sample -1`
CUDA_VISIBLE_DEVICES=0 swift infer \
    --ckpt_dir output/minicpm-v-v2_6-chat/vx-xxx/checkpoint-xxx \
    --load_dataset_config true --merge_lora true
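
If you only want to merge the LoRA weights into the base model (for example, before deployment) without running inference on the validation set, recent ms-swift versions also provide swift export. A sketch, with the flag names assumed to match your installed version:

CUDA_VISIBLE_DEVICES=0 swift export \
    --ckpt_dir output/minicpm-v-v2_6-chat/vx-xxx/checkpoint-xxx \
    --merge_lora true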

Example of the fine-tuned model running inference on the validation set (only 300 steps were trained due to time constraints):

[screenshot]

Video Fine-Tuning

We fine-tune on the video-chatgpt dataset, whose task is video captioning. The dataset is available on ModelScope: https://modelscope.cn/datasets/swift/VideoChatGPT

CUDA_VISIBLE_DEVICES=0,1,2,3 NPROC_PER_NODE=4 swift sft \
  --model_type minicpm-v-v2_6-chat \
  --model_id_or_path OpenBMB/MiniCPM-V-2_6 \
  --sft_type lora \
  --dataset video-chatgpt \
  --deepspeed default-zero2

Custom datasets support both json and jsonl formats. Here are some samples:

{"query": "<video>55555", "response": "66666", "videos": ["video_path"]}
{"query": "eeeee<video>eeeee<video>eeeee", "response": "fffff", "history": [], "videos": ["video_path1", "video_path2"]}
{"query": "EEEEE", "response": "FFFFF", "history": [["query1", "response2"], ["query2", "response2"]], "videos": []}

GPU memory usage:
[screenshot]

The inference script after fine-tuning:

CUDA_VISIBLE_DEVICES=0 swift infer \
    --ckpt_dir output/minicpm-v-v2_6-chat/vx-xxx/checkpoint-xxx \
    --load_dataset_config true --merge_lora true

Example of the fine-tuned model running inference on the validation set (only 50 steps were trained due to time constraints):
[screenshot]

Are the multi-image understanding and in-context learning features from the official docs supported in the swift API?

Multi-image and multi-turn are both supported.

For multiple images, just use multiple <image> tags. See the custom dataset format above.

Which version of swift do I need to upgrade to?

It's still only on the main branch.

Could you provide code for single-sample video inference?

import os
os.environ['CUDA_VISIBLE_DEVICES'] = '0'

from swift.llm import (
    get_model_tokenizer, get_template, inference, ModelType,
    get_default_template_type, inference_stream
)
from swift.utils import seed_everything
import torch

model_type = ModelType.minicpm_v_v2_6_chat
model_id_or_path = None
template_type = get_default_template_type(model_type)
print(f'template_type: {template_type}')

model, tokenizer = get_model_tokenizer(model_type, torch.bfloat16, model_id_or_path=model_id_or_path,
                                       model_kwargs={'device_map': 'auto'})
model.generation_config.max_new_tokens = 256
template = get_template(template_type, tokenizer)
seed_everything(42)

query = '<video>Describe this video'
videos = ['https://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/baby.mp4']
response, history = inference(model, template, query, videos=videos)
print(f'query: {query}')
print(f'response: {response}')

# streaming
query = '<image>Describe this image'
images = ['http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/cat.png']
gen = inference_stream(model, template, query, images=images)
print_idx = 0
print(f'query: {query}\nresponse: ', end='')
for response, history in gen:
    delta = response[print_idx:]
    print(delta, end='', flush=True)
    print_idx = len(response)
print()
"""
query: <video>Describe this video
response: The video shows a young child, probably a toddler, sitting on a bed and reading a book intently. The child wears dark glasses, a light green sleeveless top, and pink pants. The bed has white sheets, and a wooden crib in the background suggests a home setting. The room is well lit and the atmosphere is warm and cozy. The child's focused expression and posture show that they are engaged with the book.
query: <image>Describe this image
response: The image is a close-up of a kitten with striking facial features. Its coat is mainly white with gray and black stripes, especially around the eyes and on the ears. Its eyes are large and round with blue irises, looking very curious or attentive. The ears are upright, with pink inner ears that contrast with the fur. The nose is pink with a small dark tip, and the mouth is slightly open, showing a bit of pink tongue. Its whiskers are long and white, extending from the cheeks. The background is blurred, keeping the focus on the kitten and suggesting an indoor setting with natural light, possibly from a window.
"""

请问官方的few-shot推理方式 swift有支持么?


is this included in documentation somewhere...

is this included in documentation somewhere...

Thank you for the excellent suggestions. We will update the document within this week.

Using vLLM:

pip install "vllm>=0.5.4"
import os
os.environ['CUDA_VISIBLE_DEVICES'] = '0'

from swift.llm import (
    get_vllm_engine, get_template, inference_vllm, ModelType,
    get_default_template_type, inference_stream_vllm
)
from swift.utils import seed_everything
import torch

model_type = ModelType.minicpm_v_v2_6_chat
model_id_or_path = None
template_type = get_default_template_type(model_type)
print(f'template_type: {template_type}')

vllm_engine = get_vllm_engine(model_type, torch.bfloat16, model_id_or_path=model_id_or_path,
                              max_model_len=8192)
tokenizer = vllm_engine.hf_tokenizer
vllm_engine.generation_config.max_new_tokens = 256
template = get_template(template_type, tokenizer)
seed_everything(42)

query = '<image>Describe this image'
images = ['http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/cat.png']
generation_info = {}
request_list = [{'query': query, 'images': images} for _ in range(100)]  # example of batch inference
resp_list = inference_vllm(vllm_engine, template, request_list, generation_info=generation_info, use_tqdm=True)
print(f'query: {query}')
print(f'response: {resp_list[0]["response"]}')
print(generation_info)

# streaming
generation_info = {}
gen = inference_stream_vllm(vllm_engine, template, request_list, generation_info=generation_info)
print_idx = 0
print(f'query: {query}\nresponse: ', end='')
# only show first
for resp_list in gen:
    resp = resp_list[0]
    if resp is None:
        continue
    response = resp['response']
    delta = response[print_idx:]
    print(delta, end='', flush=True)
    print_idx = len(response)
print()
print(generation_info)
"""
100%|██████████████████████████████████████████████████████████████████████████████| 100/100 [00:01<00:00, 91.47it/s]
100%|██████████████████████████████████████████████████████████████████████████████| 100/100 [00:22<00:00,  4.48it/s]
query: <image>Describe this image
response: The image is a close-up of a kitten, possibly an American Shorthair judging by its markings and coat texture. The kitten has striking blue eyes, a very prominent feature of its appearance. Its fur has distinctive black stripes running from the cheeks up to the top of the head, suggesting a tabby pattern. Its ears are small and pointed, pink on the inside. The whiskers are long and prominent, framing the jaw and the area around the eyes. The kitten sits facing the camera with an expressive look, its mouth slightly open to reveal the pink inside of its lips. The background is blurred, and soft lighting accentuates the kitten's features.
{'num_prompt_tokens': 2700, 'num_generated_tokens': 14734, 'num_samples': 100, 'runtime': 23.53027338697575, 'samples/s': 4.249844375176322, 'tokens/s': 626.1720702384794}
query: <image>Describe this image
response: The image is a close-up of a kitten, probably a young cat, with a blurred background that keeps the focus on its expression. The kitten's coat is white with black stripes and subtle grayish-brown tones. Its eyes are large, round, and highly reflective, which might suggest heterochromia (one blue eye and one green), although both of this kitten's eyes are green. The lashes are clearly visible, adding to the lively expression. The ears are upright, with pink inner ears and lightly shaded edges showing soft fur. The whiskers are long and distinct, emphasizing the shape of the kitten's face. The coat pattern and eye color suggest it may be a tabby. The lighting is soft, producing a velvety effect that highlights the texture of the kitten's fur.
{'num_prompt_tokens': 2700, 'num_generated_tokens': 14986, 'num_samples': 100, 'runtime': 23.375922130944673, 'samples/s': 4.277906105257837, 'tokens/s': 641.0870089339394}
"""

Fine-tuning minicpm-v-v2_6-chat raises an error:
File "/usr/local/lib/python3.10/dist-packages/torch/_tensor.py", line 491, in backward
torch.autograd.backward(
File "/usr/local/lib/python3.10/dist-packages/torch/autograd/init.py", line 251, in backward
Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass
RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn

Fine-tuning other models works fine. The fine-tuning command is:

CUDA_VISIBLE_DEVICES=0,1,2,3 NPROC_PER_NODE=4 swift sft \
  --model_type minicpm-v-v2_6-chat \
  --model_id_or_path OpenBMB/MiniCPM-V-2_6 \
  --sft_type lora \
  --dataset **.jsonl \
  --deepspeed default-zero2

@Jintao-Huang

Could you provide sample code for Async + vLLM inference with MiniCPM-V 2.6?

Could you provide sample code for Async + vLLM inference with MiniCPM-V 2.6?

swift deploy uses Async + vLLM.

For the client-side calling method, see the docs here:

https://swift.readthedocs.io/zh-cn/latest/Multi-Modal/vLLM%E6%8E%A8%E7%90%86%E5%8A%A0%E9%80%9F%E6%96%87%E6%A1%A3.html#id4

CUDA_VISIBLE_DEVICES=0 swift deploy \
  --model_type minicpm-v-v2_6-chat \
  --model_id_or_path OpenBMB/MiniCPM-V-2_6 \
  --infer_backend vllm


That document shows how to call the service with the openai client, but the openai client is synchronous, right? For asynchronous calls, wouldn't the code need the asyncio package?

Server:

CUDA_VISIBLE_DEVICES=0 swift deploy --model_type minicpm-v-v2_6-chat --infer_backend vllm --max_model_len 8192

Client:

import asyncio
from swift.llm import get_model_list_client, XRequestConfig, inference_client_async

model_list = get_model_list_client()
model_type = model_list.data[0].id
print(f'model_type: {model_type}')
request_config = XRequestConfig(seed=42)

query = '<image>Describe this image.'
images = ['http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/cat.png']
tasks = [inference_client_async(model_type, query, request_config=request_config) for _ in range(100)]
async def _batch_run(tasks):
    return await asyncio.gather(*tasks)

resp_list = asyncio.run(_batch_run(tasks))
print(f'query: {query}')
print(f'response0: {resp_list[0].choices[0].message.content}')
print(f'response1: {resp_list[1].choices[0].message.content}')

query = '<image>How many sheep are in the picture?'
images = ['http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/animal.png']

async def _stream():
    global query
    request_config = XRequestConfig(seed=42, stream=True)
    stream_resp = await inference_client_async(model_type, query, images=images, request_config=request_config)
    print(f'query: {query}')
    print('response: ', end='')
    async for chunk in stream_resp:
        print(chunk.choices[0].delta.content, end='', flush=True)
    print()

asyncio.run(_stream())
"""
query: <image>Describe this image.
response0: The video showcases a serene and picturesque landscape. The scene is dominated by a vast expanse of lush greenery, with a dense forest stretching out into the distance. The trees, varying in shades of green, create a vibrant tapestry that fills the frame. The forest appears to be thriving, with the sunlight filtering through the leaves and casting dappled shadows on the forest floor.

In the foreground, a small clearing is visible, providing a glimpse of the open sky above. The sky is a clear blue, with a few wispy clouds scattered across it, adding depth to the scene. The overall atmosphere of the video is tranquil and peaceful, with the natural beauty of the landscape taking center stage.

The video is likely shot during the day, as the lighting is bright and natural. The camera angle is slightly elevated, offering a panoramic view of the forest and the surrounding area. The focus is sharp, allowing for the intricate details of the trees and the forest floor to be clearly visible.

Overall, the video captures the essence of a peaceful forest, with its lush greenery, clear blue sky, and tranquil ambiance. It's a beautiful representation of nature's beauty, inviting viewers to appreciate the serenity and majesty of the natural world.
response1: The video showcases a serene and picturesque landscape. The scene is dominated by a vast expanse of lush greenery, with a dense forest stretching out into the distance. The trees, varying in shades of green, create a vibrant tapestry that fills the frame. The forest appears to be thriving, with the sunlight filtering through the leaves and casting dappled shadows on the forest floor.

In the foreground, a small clearing is visible, providing a glimpse of the open sky above. The sky is a clear blue, with a few wispy clouds scattered across it, adding depth to the scene. The overall atmosphere of the video is tranquil and peaceful, with the natural beauty of the landscape taking center stage.

The video is likely shot during the day, as the lighting is bright and natural. The camera angle is slightly elevated, offering a panoramic view of the forest and the surrounding area. The focus is sharp, allowing for the intricate details of the trees and the forest floor to be clearly visible.

Overall, the video captures the essence of a peaceful forest, with its lush greenery, clear blue sky, and tranquil ambiance. It's a beautiful representation of nature's beauty, inviting viewers to appreciate the serenity and majesty of the natural world.
query: <image>How many sheep are in the picture?
response: There are five sheep in the picture.
"""


Thank you very much, jintao-huang.

  1. How do I start the server using the Python SDK?
  2. How do I make sure each async request gets a different result? I see the seed is always the same.
  3. Does the code above also work for other multimodal models, e.g. internvl2?

Looking forward to your reply, thank you!

  1. How to start the server using the Python SDK:
import os
os.environ['CUDA_VISIBLE_DEVICES'] = '0'
from swift.llm import deploy_main, DeployArguments

# same parameters as `swift deploy`
deploy_main(DeployArguments(...))
  2. Making sure each async request gives a different result:

Just leave seed as None (the default).

  3. Whether other multimodal models (e.g. internvl2) work with the same code:

Yes.
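
Putting the pieces together, here is a minimal sketch of starting the service from Python. The DeployArguments fields below simply mirror the swift deploy CLI flags shown earlier in this thread; treat them as an example, not an exhaustive or authoritative argument list.

import os
os.environ['CUDA_VISIBLE_DEVICES'] = '0'

from swift.llm import deploy_main, DeployArguments

# Same parameters as `swift deploy`; the values mirror the CLI example above.
# Requests that do not set a fixed seed (seed=None, the default) are not forced
# to produce identical outputs.
deploy_main(DeployArguments(
    model_type='minicpm-v-v2_6-chat',
    infer_backend='vllm',
    max_model_len=8192,
))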


Can I start the vLLM service via the get_vllm_engine API instead? What is the difference compared with deploy_main?


The issue where serving MiniCPM-V 2.6 with vLLM required flash-attn to be installed has been fixed.


Using the deploy_main SDK with the same parameters as the CLI raises an error:

INFO: 2024-08-12 23:36:53,874 vllm_utils.py:567] generation_config: SamplingParams(n=1, best_of=1, presence_penalty=0.0, frequency_penalty=0.0, repetition_penalty=1.0, temperature=0.3, top_p=0.7, top_k=20, min_p=0.0, seed=None, use_beam_search=False, length_penalty=1.0, early_stopping=False, stop=[], stop_token_ids=[], include_stop_str_in_output=False, ignore_eos=False, max_tokens=2048, min_tokens=0, logprobs=None, prompt_logprobs=None, skip_special_tokens=False, spaces_between_special_tokens=True, truncate_prompt_tokens=None)
INFO: 2024-08-12 23:36:53,876 vllm_utils.py:578] system: You are a helpful assistant.
INFO:     Started server process [298157]
INFO:     Waiting for application startup.
Exception in thread Thread-7:
Traceback (most recent call last):
  File "/opt/conda/lib/python3.8/threading.py", line 932, in _bootstrap_inner
INFO:     Application startup complete.
    self.run()
  File "/opt/conda/lib/python3.8/threading.py", line 870, in run
    self._target(*self._args, **self._kwargs)
  File "/ossfs/workspace/ms-swift-main/swift/llm/deploy.py", line 70, in <lambda>
INFO:     Uvicorn running on http://127.0.0.1:8000/ (Press CTRL+C to quit)
    thread = Thread(target=lambda: asyncio.run(_log_stats_hook(_args.log_interval)))
  File "/opt/conda/lib/python3.8/site-packages/nest_asyncio.py", line 27, in run
    loop = asyncio.get_event_loop()
  File "/opt/conda/lib/python3.8/asyncio/events.py", line 639, in get_event_loop
    raise RuntimeError('There is no current event loop in thread %r.'
RuntimeError: There is no current event loop in thread 'Thread-7'.
/opt/conda/lib/python3.8/threading.py:934: RuntimeWarning: coroutine '_log_stats_hook' was never awaited
  self._invoke_excepthook(self)
RuntimeWarning: Enable tracemalloc to get the object allocation traceback
  1. This has already been fixed on the latest main branch; reinstall with pip install -e '.[all]'.
  2. The difference is that a service started via the Python SDK with get_vllm_engine cannot be called asynchronously, whereas the vLLM service started from the CLI is an async service by default and supports asynchronous calls.

Does vLLM + async client calling support the official few-shot feature? The few-shot usage is as follows:
From: https://huggingface.co/openbmb/MiniCPM-V-2_6#in-context-few-shot-learning

import torch
from PIL import Image
from transformers import AutoModel, AutoTokenizer

model = AutoModel.from_pretrained('openbmb/MiniCPM-V-2_6', trust_remote_code=True,
    attn_implementation='sdpa', torch_dtype=torch.bfloat16) # sdpa or flash_attention_2, no eager
model = model.eval().cuda()
tokenizer = AutoTokenizer.from_pretrained('openbmb/MiniCPM-V-2_6', trust_remote_code=True)

question = "production date" 
image1 = Image.open('example1.jpg').convert('RGB')
answer1 = "2023.08.04"
image2 = Image.open('example2.jpg').convert('RGB')
answer2 = "2007.04.24"
image_test = Image.open('test.jpg').convert('RGB')

msgs = [
    {'role': 'user', 'content': [image1, question]}, {'role': 'assistant', 'content': [answer1]},
    {'role': 'user', 'content': [image2, question]}, {'role': 'assistant', 'content': [answer2]},
    {'role': 'user', 'content': [image_test, question]}
]

answer = model.chat(
    image=None,
    msgs=msgs,
    tokenizer=tokenizer
)
print(answer)

Yes, it's supported; this is just multi-turn dialogue.
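
For reference, a minimal sketch of few-shot prompting expressed as multi-turn dialogue with the non-async inference API used earlier in this thread. It assumes that <image> tags appearing in history and query are matched, in order, against the images list; the image paths and answers are made-up placeholders.

import os
os.environ['CUDA_VISIBLE_DEVICES'] = '0'

import torch
from swift.llm import (get_model_tokenizer, get_template, inference,
                       ModelType, get_default_template_type)

# Same setup as the single-sample inference snippet earlier in this thread.
model_type = ModelType.minicpm_v_v2_6_chat
template_type = get_default_template_type(model_type)
model, tokenizer = get_model_tokenizer(model_type, torch.bfloat16,
                                       model_kwargs={'device_map': 'auto'})
template = get_template(template_type, tokenizer)

# Few-shot examples as history turns (assumption: <image> tags in history and
# query consume entries of `images` in order; paths/answers are hypothetical).
question = 'production date'
history = [
    [f'<image>{question}', '2023.08.04'],
    [f'<image>{question}', '2007.04.24'],
]
images = ['example1.jpg', 'example2.jpg', 'test.jpg']

response, _ = inference(model, template, f'<image>{question}',
                        history=history, images=images)
print(response)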


For async vLLM calls, how do I write the few-shot prompt, and how should the parameters be passed?

Multi-GPU SFT on V100 with use_flash_attn set to false still raises an error requiring flash-attn to be installed. The error occurs at:
get_class_from_dynamic_module('modeling_navit_siglip.SiglipVisionTransformer', model_dir)


I'll just add an except ImportError for that.


Solved it by following OpenBMB/MiniCPM-V#461, thanks.

Hello, we tested the CUDA_VISIBLE_DEVICES=0 swift infer --model_type minicpm-v-v2_6-chat --model_id_or_path openbmb/MiniCPM-V-2_6 command you provided, as well as the video test code. The results on videos seem to depend only on the first frame. We tried extracting OCR from videos several times, and the output always contained only the OCR result of the first frame. Could you point us to the concrete test code (.py file)? We'd like to check whether the data-processing part only reads information from the first frame of the video.


https://github.com/modelscope/ms-swift/blob/main/swift/llm/utils/template.py#L2594

Pull the main branch and try again; a new version should be released tomorrow.

{"conversations": [
    {"from": "user", "value": "<img>img_path</img><img>img_path2</img><img>img_path3</img>aaaaa"},
    {"from": "assistant", "value": "bbbbb"},
    {"from": "user", "value": "<img>img_path</img>ccccc"},
    {"from": "assistant", "value": "ddddd"}
]},

Is image input in the form <img>img_path</img> supported?

{"conversations": [
    {"from": "user", "value": "<img>img_path</img><img>img_path2</img><img>img_path3</img>aaaaa"},
    {"from": "assistant", "value": "bbbbb"},
    {"from": "user", "value": "<img>img_path</img>ccccc"},
    {"from": "assistant", "value": "ddddd"}
]},

请问支持 <img>img_path</img> 这样的图片输入吗

Yes, it's supported.

How to create data .json if I need to do full fine tune of the model(MiniCPM-V-2_6) using videos, can anyone please give me .json format ?

data.jsonl

{"query": "<video>55555", "response": "66666", "videos": ["video_path"]}
{"query": "eeeee<video>eeeee<video>eeeee", "response": "fffff", "history": [], "videos": ["video_path1", "video_path2"]}
{"query": "EEEEE", "response": "FFFFF", "history": [["query1", "response2"], ["query2", "response2"]], "videos": []}

How to create data .json if I need to do full fine tune of the model(MiniCPM-V-2_6) using videos, can anyone please give me .json format ?

Thank you for your reply.
Sorry, but I'm not able to understand.
I can see this - https://github.com/OpenBMB/MiniCPM-V/blob/main/finetune/readme.md
For multiple images, the json file will be:

[
    {
        "id": "0",
        "image": {
            "<image_00>": "path/to/image_0.jpg",
            "<image_01>": "path/to/image_1.jpg",
            "<image_02>": "path/to/image_2.jpg",
            "<image_03>": "path/to/image_3.jpg"
        },
        "conversations": [
            {
                "role": "user",
                "content": "How to create such text-only videos using CapCut?\n<image_00>\n<image_01>\n<image_02>\n<image_03>\n"
            },
            {
                "role": "assistant",
                "content": "To create a text-only video as shown in the images, follow these steps in CapCut..."
            }
        ]
    }
]

I have data as videos and a caption for each video,
and I need to train the model MiniCPM-V-2_6.
Do I need to make frames out of each video and then create this kind of input.json?

ms-swift will help with these, so you don't need to worry about them.
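
If your annotations are already in the MiniCPM-V finetune format quoted above (an image-placeholder map plus role/content conversations), a rough conversion sketch into ms-swift's query/response/history/images jsonl could look like this; the input and output file names are placeholders.

import json
import re

# Convert the MiniCPM-V style JSON shown above into ms-swift's jsonl format.
with open("minicpm_format.json", encoding="utf-8") as f:
    data = json.load(f)

with open("train.jsonl", "w", encoding="utf-8") as out:
    for item in data:
        placeholder_to_path = item.get("image", {})
        images, turns = [], []
        for turn in item["conversations"]:
            text = turn["content"]
            # Replace each <image_XX> placeholder with swift's generic <image> tag
            # and remember the corresponding file path, in order of appearance.
            for placeholder in re.findall(r"<image_\d+>", text):
                images.append(placeholder_to_path[placeholder])
                text = text.replace(placeholder, "<image>", 1)
            turns.append(text.strip())
        # Assumption: turns strictly alternate user/assistant. The last pair
        # becomes query/response, earlier pairs go into history.
        pairs = [[turns[i], turns[i + 1]] for i in range(0, len(turns) - 1, 2)]
        out.write(json.dumps({
            "query": pairs[-1][0],
            "response": pairs[-1][1],
            "history": pairs[:-1],
            "images": images,
        }, ensure_ascii=False) + "\n")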

data.jsonl

{"query": "<video>55555", "response": "66666", "videos": ["video_path"]}
{"query": "eeeee<video>eeeee<video>eeeee", "response": "fffff", "history": [], "videos": ["video_path1", "video_path2"]}
{"query": "EEEEE", "response": "FFFFF", "history": [["query1", "response2"], ["query2", "response2"]], "videos": []}

How to create data .json if I need to do full fine tune of the model(MiniCPM-V-2_6) using videos, can anyone please give me .json format ?

sorry, I copied it incorrectly.

What's the difference between doing SFT with swift and directly using finetune_ds.py from the model's own code? Why use the extra swift tool at all?

This model feels odd. During training the printed accuracy reaches about 95%, but on the test set it's only about 80%, which is worse than MiniCPM-V 2.5. What could be going on?

Facing some issue......
Showing error while running -finetune_ds.sh

RuntimeError: CUDA error: invalid device ordinal
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.

I made the changes in the code - eval.py
def get_model(args):
    if args.model_name == '':
        raise Exception('Model name cannot be empty str!')
    from models.MiniCPM.minicpmv import MiniCPM_V
    model_path = args.model_path
    ckpt = args.ckpt
    device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')
    torch.cuda.set_device(device)
    model = MiniCPM_V(model_path=model_path, ckpt=ckpt, device=device)
    model = model.to(device)
    # torch.cuda.set_device(device)

.........................................................
Still facing above error .........

.....................................................
I have one GPU -

nvidia-smi
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 555.58.02 Driver Version: 555.58.02 CUDA Version: 12.5 |
|-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 NVIDIA A10G Off | 00000000:00:1E.0 Off | 0 |
| 0% 29C P0 61W / 300W | 1671MiB / 23028MiB | 4% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=========================================================================================|
| 0 N/A N/A 3238 G /usr/lib/xorg/Xorg 123MiB |
| 0 N/A N/A 3397 G /usr/bin/gnome-shell 147MiB |
| 0 N/A N/A 3711 C+G /usr/lib/x86_64-linux-gnu/dcv/dcvagent 846MiB |
| 0 N/A N/A 4420 G ...ures=SpareRendererForSitePerProcess 33MiB |
| 0 N/A N/A 6806 G ...irefox/4336/usr/lib/firefox/firefox 312MiB |

Can anyone help with this? How do I fix the error and do the full fine-tune?

This model feels odd. During training the printed accuracy reaches about 95%, but on the test set it's only about 80%, which is worse than MiniCPM-V 2.5. What could be going on?

The acc printed during training is token-level.


Is this the fine-tuning code of ms-swift?

This model feels odd. During training the printed accuracy reaches about 95%, but on the test set it's only about 80%, which is worse than MiniCPM-V 2.5. What could be going on?

The acc printed during training is token-level.

MiniCPM-V 2.5 is also token-level, right? But its actual performance is about 10 points higher than this model.

swift/swift/llm/utils/vllm_utils.py

def _add_vllm_request(llm_engine: LLMEngine, inputs: Dict[str, Any], *, request_id: str,
                      generation_config: VllmGenerationConfig, **kwargs) -> None:
    input_ids = inputs['input_ids']
    if version.parse(vllm.__version__) >= version.parse('0.4.3'):
        llm_inputs = {'prompt_token_ids': input_ids}
        images = inputs.get('images') or []
        if images:
            assert len(images) == 1, 'Currently, only one image is supported.'
            llm_inputs['multi_modal_data'] = {'image': images[0]}
        llm_engine.add_request(request_id, llm_inputs, generation_config, **kwargs)
    else:
        llm_engine.add_request(request_id, None, generation_config, input_ids, **kwargs)

Does vLLM only support one image? Two images raise an error. How can I get multi-image support?

FROM - https://github.com/modelscope/ms-swift/tree/main

Single GPU Training

Full-parameter:
Experimental Environment: NVIDIA A10G
GPU Memory Requirement: 64 GB
CUDA_VISIBLE_DEVICES=0 nohup swift sft \
    --sft_type "full" \
    --tuner_backend "swift" \
    --model_id_or_path "OpenBMB/MiniCPM-V-2_6" \
    --template_type "minicpm-v-v2_6" \
    --system "You are a helpful assistant." \
    --dataset /home/trainging_data/train_data.jsonl \
    --lora_target_modules '^(llm|resampler)(?!.*(lm_head|output|emb|wte|shared)).*' \
    --learning_rate "1e-05" \
    --gradient_accumulation_steps "16" \
    --eval_steps "500" \
    --save_steps "500" \
    --model_name model_dcg_fine_tune \
    --eval_batch_size "1" \
    --add_output_dir_suffix False \
    --output_dir /home/PycharmProjects/pythonProject/swift/output/minicpm-v-v2_6-chat/v0-20240820-060337 \
    --logging_dir /home/PycharmProjects/pythonProject/swift/output/minicpm-v-v2_6-chat/v0-20240820-060337/runs \
    --ignore_args_error True \
    > /home/PycharmProjects/pythonProject/swift/output/minicpm-v-v2_6-chat/v0-20240820-060337/runs/run.log 2>&1 &

.........>> It's taking a long time to do the full fine-tune (more than 7 hrs) using ms-swift. Why? Any reason, @Jintao-Huang?
Jintao, please look at the run command.
Is there any way we can make it faster?

A10G

This situation feels normal. How many seconds does each step take?

Does vLLM only support one image? Two images raise an error. How can I get multi-image support?

Yes. Use the pt backend for now; vLLM should support multiple images soon.
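
In other words, until multi-image support lands in the vLLM path, a deployment that needs multiple <image> tags per request can fall back to the PyTorch backend; a sketch based on the deploy command shown earlier:

CUDA_VISIBLE_DEVICES=0 swift deploy \
  --model_type minicpm-v-v2_6-chat \
  --infer_backend pt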

This situation feels normal. How many seconds does each step take?
--> I am not able to see logs or running status in UI as well - do I need to run the command in terminal to see the running status/logs?

run command:
CUDA_VISIBLE_DEVICES=0 swift sft \
  --model_type minicpm-v-v2_6-chat \
  --dataset /home/trainging_data/train_data.jsonl \
  --num_train_epochs 5 \
  --sft_type full \
  --output_dir output \
  --eval_steps 50

error:
Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 130.00 MiB. GPU 0 has a total capacty of 22.09 GiB of which 9.44 MiB is free. Process 3667 has 861.82 MiB memory in use. Including non-PyTorch memory, this process has 20.70 GiB memory in use. Of the allocated memory 19.94 GiB is allocated by PyTorch, and 446.71 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
Train: 0%|

Do I need more memory ? @yingdachen


yes. maybe 100GiB.

I have a question: on the CharXiv leaderboard I saw [MiniCPM-V2.6 (Upsize+CoT)]. How do I enable the Upsize+CoT part after fine-tuning?

I'd like to ask a question: when I run SFT, the loss and learning rate are 0 from the very first step, even when I set val_dataset to randomly generated text. Has anyone else run into this?
[screenshot: minicpm_error]


Try pulling the main branch and see whether that solves it.

Is it a V100 machine?


Yes, it's a V100 machine. swift was freshly cloned, so I don't know whether there could be an error there.
[screenshot]

It's probably because fp16 is being used. Could you try bf16 or fp32?
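
Concretely, the training dtype can usually be forced on the command line; the sketch below assumes ms-swift's --dtype flag (fp16/bf16/fp32) and a placeholder dataset name. On V100, bf16 is unavailable in hardware, so fp32 is the one to try:

CUDA_VISIBLE_DEVICES=0,1,2,3 NPROC_PER_NODE=4 swift sft \
  --model_type minicpm-v-v2_6-chat \
  --sft_type lora \
  --dataset train.jsonl \
  --dtype fp32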

Hi @yingdachen ,

I am able to fine tune and got the best model.
Train: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 725/725 [1:25:45<00:00, 7.10s/it]
[INFO:swift] last_model_checkpoint: /home/swift/output/minicpm-v-v2_6-chat/v3-20240821-183331/checkpoint-725
[INFO:swift] best_model_checkpoint: /home/swift/output/minicpm-v-v2_6-chat/v3-20240821-183331/checkpoint-350

But
how to evaluate the model accuracy?
how to test this model using new test data and how to calculate the accuracy in test data? -->Is there any code in ms-swift to create UI

can you please give some input @yingdachen

It's probably because fp16 is being used. Could you try bf16 or fp32?

I'm using V100, which doesn't support bf16, so it automatically falls back to fp16. Is there a working example on V100?

How to evaluate the model minicpm-v-v2_6-chat/v3-20240821-183331/checkpoint-350 with custom video data?
@yingdachen please let me know the run command.
[screenshot]
It's not working for custom video data!

Hi @yingdachen ,

I am able to fine tune and got the best model. Train: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 725/725 [1:25:45<00:00, 7.10s/it] [INFO:swift] last_model_checkpoint: /home/swift/output/minicpm-v-v2_6-chat/v3-20240821-183331/checkpoint-725 [INFO:swift] best_model_checkpoint: /home/swift/output/minicpm-v-v2_6-chat/v3-20240821-183331/checkpoint-350

But how to evaluate the model accuracy? how to test this model using new test data and how to calculate the accuracy in test data? -->Is there any code in ms-swift to create UI

can you please give some input @yingdachen

Typically, we use the last_model instead of the best_model (even though the eval_loss is the smallest for the latter). Please evaluate both models.

Thanks for your input @yingdachen.
How to evaluate with a custom dataset (test video data)? It's throwing an error:
raise APIConnectionError(request=request) from err
openai.APIConnectionError: Connection error.

I am creating UI using flask but getting error - NotImplementedError: Cannot copy out of meta tensor; no data!- any reason

How to evaluate with custom dataset(test video data) its throwing error ?
raise APIConnectionError(request=request) from err
openai.APIConnectionError: Connection error.

I am creating UI using flask but getting error - NotImplementedError: Cannot copy out of meta tensor; no data!- any reason

@yingdachen, any input ???

How to evaluate with custom dataset(test video data) its throwing error ? raise APIConnectionError(request=request) from err openai.APIConnectionError: Connection error.

I am creating UI using flask but getting error - NotImplementedError: Cannot copy out of meta tensor; no data!- any reason

@yingdachen, any input ???

This error indicates insufficient GPU memory.

@yingdachen, for which error? I am getting two errors!!

Fine-tuning MiniCPM-V 2.6 with zero3 raises an error. I only changed the image fine-tuning command from zero2 to default-zero3, and got the following error:
Traceback (most recent call last):
File "/home/ubuntu/disk2T_1/wzy/MiniCPM-V/swift/swift/cli/sft.py", line 5, in
sft_main()
File "/home/ubuntu/disk2T_1/wzy/MiniCPM-V/swift/swift/utils/run_utils.py", line 32, in x_main
result = llm_x(args, **kwargs)
File "/home/ubuntu/disk2T_1/wzy/MiniCPM-V/swift/swift/llm/sft.py", line 417, in llm_sft
trainer.train(training_args.resume_from_checkpoint)
File "/home/ubuntu/disk2T_1/wzy/MiniCPM-V/swift/swift/trainers/mixin.py", line 552, in train
res = super().train(resume_from_checkpoint, *args, **kwargs)
File "/home/ubuntu/anaconda3/envs/MiniCPM-V/lib/python3.10/site-packages/transformers/trainer.py", line 1859, in train
return inner_training_loop(
File "/home/ubuntu/anaconda3/envs/MiniCPM-V/lib/python3.10/site-packages/transformers/trainer.py", line 2015, in _inner_training_loop
model, self.optimizer, self.lr_scheduler = self.accelerator.prepare(
File "/home/ubuntu/anaconda3/envs/MiniCPM-V/lib/python3.10/site-packages/accelerate/accelerator.py", line 1284, in prepare
result = self._prepare_deepspeed(*args)
File "/home/ubuntu/anaconda3/envs/MiniCPM-V/lib/python3.10/site-packages/accelerate/accelerator.py", line 1751, in _prepare_deepspeed
engine, optimizer, _, lr_scheduler = deepspeed.initialize(**kwargs)
File "/home/ubuntu/disk2T_1/wzy/MiniCPM-V/swift/swift/llm/utils/template.py", line 337, in _initialize
res = _old_initialize(*args, **kwargs)
File "/home/ubuntu/anaconda3/envs/MiniCPM-V/lib/python3.10/site-packages/deepspeed/init.py", line 179, in initialize
config_class = DeepSpeedConfig(config, mpu, mesh_device=mesh_device)
File "/home/ubuntu/anaconda3/envs/MiniCPM-V/lib/python3.10/site-packages/deepspeed/runtime/config.py", line 797, in init
self._initialize_params(copy.copy(self._param_dict))
File "/home/ubuntu/anaconda3/envs/MiniCPM-V/lib/python3.10/site-packages/deepspeed/runtime/config.py", line 817, in _initialize_params
self.zero_config = get_zero_config(param_dict)
File "/home/ubuntu/anaconda3/envs/MiniCPM-V/lib/python3.10/site-packages/deepspeed/runtime/zero/config.py", line 71, in get_zero_config
return DeepSpeedZeroConfig(**zero_config_dict)
File "/home/ubuntu/anaconda3/envs/MiniCPM-V/lib/python3.10/site-packages/deepspeed/runtime/config_utils.py", line 57, in init
super().init(**data)
File "/home/ubuntu/anaconda3/envs/MiniCPM-V/lib/python3.10/site-packages/pydantic/main.py", line 193, in init
self.pydantic_validator.validate_python(data, self_instance=self)
pydantic_core._pydantic_core.ValidationError: 1 validation error for DeepSpeedZeroConfig
stage3_prefetch_bucket_size
Input should be a valid integer, got a number with a fractional part [type=int_from_float, input_value=11560550.4, input_type=float]
For further information visit https://errors.pydantic.dev/2.8/v/int_from_float
Can MiniCPM-V 2.6 only be fine-tuned with zero2 in swift? My machine has 4x 3090.

Adjust your deepspeed version.


Hello, since I don't have enough GPU memory, I'd like to try fine-tuning with the int4 model, but the official swift docs don't mention support for fine-tuning MiniCPM's int4 model: https://github.com/modelscope/ms-swift/blob/main/docs/source/LLM/%E6%94%AF%E6%8C%81%E7%9A%84%E6%A8%A1%E5%9E%8B%E5%92%8C%E6%95%B0%E6%8D%AE%E9%9B%86.md#%E6%A8%A1%E5%9E%8B
I directly specified --model_id_or_path OpenBMB/MiniCPM-V-2_6-int4 and added --quantization_bit 4 after --sft_type lora, while model_type remains minicpm-v-v2_6-chat, and training still works. Is this setup correct, or will you add a dedicated model_type for the int4 model later?

What format is needed for preference-data training? A format that works for other models raises an error when training this model.

After fine-tuning, how do I get a gguf model for deployment?


The MiniCPM-to-gguf conversion needs some custom steps; see MiniCPM's official docs:
https://modelbest.feishu.cn/wiki/LZxLwp4Lzi29vXklYLFchwN5nCf

CUDA_VISIBLE_DEVICES=0 swift sft --model_type minicpm-v-v2_6-chat --model_id_or_path OpenBMB/MiniCPM-V-2_6 --sft_type lora --dataset /data --deepspeed zero3-offload --output_dir output --num_train_epoch 5
No error is reported, but the process just stops.

[INFO:swift] Downloading the model from ModelScope Hub, model_id: OpenBMB/MiniCPM-V-2_6
[WARNING:modelscope] Using branch: master as version is unstable, use with caution
[INFO:swift] Loading the model using model_dir: /home/shy/.cache/modelscope/hub/OpenBMB/MiniCPM-V-2_6
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
[INFO:swift] model_kwargs: {'device_map': None}
[2024-09-13 15:40:49,803] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-09-13 15:40:50,825] [INFO] [config.py:733:init] Config mesh_device None world_size = 1
[2024-09-13 15:40:50,826] [INFO] [comm.py:652:init_distributed] cdb=None
[2024-09-13 15:40:50,826] [INFO] [comm.py:667:init_distributed] Not using the DeepSpeed or dist launchers, attempting to detect MPI environment...
[2024-09-13 15:40:51,245] [INFO] [comm.py:717:mpi_discovery] Discovered MPI settings of world_rank=0, local_rank=0, world_size=1, master_addr=192.168.31.119, master_port=29500
[2024-09-13 15:40:51,245] [INFO] [comm.py:683:init_distributed] Initializing TorchBackend in DeepSpeed with backend nccl


I've run into this too. --deepspeed zero3-offload needs a lot of host memory; hanging like this is usually because the machine's RAM is full.


DeepSpeed can't use virtual memory (swap), right?
In my case the process stops very soon after starting.


Virtual memory can be used; I used it too, but it still got stuck. In my case it got stuck right after training started; adding a few more RAM sticks fixed it.

After fine-tuning the model with

CUDA_VISIBLE_DEVICES=0,1,2,3 NPROC_PER_NODE=4 swift sft \
  --model_type minicpm-v-v2_6-chat \
  --model_id_or_path OpenBMB/MiniCPM-V-2_6 \
  --sft_type lora \
  --dataset coco-en-mini#20000 \
  --deepspeed default-zero2

we deploy it with

model = AutoModel.from_pretrained(model_path, trust_remote_code=True)
model = PeftModel.from_pretrained(model, lora_path)

and get the error: Target module Qwen2ForCausalLM() is not supported. Currently, only the following modules are supported: 'torch.nn.Linear' ... How can this be solved?

A question: when fine-tuning on videos with swift, is it also implemented by frame sampling? What is the default sampling rate, and can it be changed? I couldn't find such a parameter on the command line; I tried sample_n_frames and it was reported as unsupported. Since I'm fine-tuning with both images and videos, I need to work out the image-to-frame ratio, so I need to know the frame sampling rate. Thanks.

I successfully fine-tuned the OpenBMB/MiniCPM-V-2_6 model using a custom dataset:

CUDA_VISIBLE_DEVICES=0,1,2,3 NPROC_PER_NODE=4 swift sft \
  --model_type minicpm-v-v2_6-chat \
  --model_id_or_path OpenBMB/MiniCPM-V-2_6 \
  --sft_type lora \
  --deepspeed default-zero2 \
  --dataset train.jsonl \
  --val_dataset val.jsonl

An my checkpoint is also created....

I used the format below for both the dataset and val_dataset:
{"query": "55555", "response": "66666", "images": ["image_path"]}
{"query": "eeeee", "response": "fffff", "history": [], "images": ["image_path"]}
{"query": "EEEEE", "response": "FFFFF", "history": [["query1", "response1"], ["query2", "response2"]], "images": ["image_path"]}

I want to run inference with the fine-tuned model by passing my own prompt and image URL as input. I did:
image

Can someone help me with the following?

  1. How to use and run inference with the fine-tuned model (checkpoint-xxx-merged) using my own prompt and image URL.
  2. How to deploy the fine-tuned model using vLLM or LMDeploy.
  3. I don't understand the evaluation part; can you please share the evaluation dataset format?
  4. How to use the eval dataset and test the accuracy of the model.

Someone please help me...

Re: the "Target module Qwen2ForCausalLM() is not supported" error when wrapping the fine-tuned model with PeftModel:

Upgrade ms-swift.
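
If upgrading alone does not resolve it, an alternative route (a sketch based on the merge step used elsewhere in this thread; the checkpoint path is a placeholder) is to merge the LoRA weights into the base model first, so PeftModel never has to wrap swift's wrapper class:

# merge the LoRA adapter into the base weights; produces checkpoint-xxx-merged
CUDA_VISIBLE_DEVICES=0 swift export \
    --ckpt_dir output/minicpm-v-v2_6-chat/vx-xxx/checkpoint-xxx \
    --merge_lora true

The resulting checkpoint-xxx-merged directory can then be loaded with AutoModel.from_pretrained(..., trust_remote_code=True) alone, without PeftModel.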

Re: the video frame-sampling question above:

load_video = partial(load_video_minicpmv_mplug_owl3, max_num_frames=max_num_frames)

The frame count is controlled by MAX_NUM_FRAMES.
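
In other words, the loader caps the number of frames per video rather than using a fixed frame rate. If your installed version reads MAX_NUM_FRAMES from the environment (treat the variable name as an assumption and verify it against the template code of your swift version), the cap can likely be overridden at launch:

# MAX_NUM_FRAMES as an environment override is an assumption here;
# check the template code in your installed swift version
MAX_NUM_FRAMES=16 CUDA_VISIBLE_DEVICES=0 swift sft \
  --model_type minicpm-v-v2_6-chat \
  --model_id_or_path OpenBMB/MiniCPM-V-2_6 \
  --sft_type lora \
  --dataset video-chatgpt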

Re: the question above about running inference, deployment, and evaluation with the fine-tuned checkpoint:

https://swift.readthedocs.io/en/latest/Multi-Modal/index.html
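
For running inference from Python with your own prompt and image URL, a minimal single-sample sketch with the swift.llm API is below. The merged checkpoint path is a placeholder, and the exact inference signature can differ between swift versions, so treat this as a starting point rather than a verified script:

import os
os.environ['CUDA_VISIBLE_DEVICES'] = '0'

import torch
from swift.llm import (
    get_model_tokenizer, get_template, inference, ModelType,
    get_default_template_type
)
from swift.utils import seed_everything

model_type = ModelType.minicpm_v_v2_6_chat
template_type = get_default_template_type(model_type)

# point model_id_or_path at the merged checkpoint directory
model, tokenizer = get_model_tokenizer(
    model_type, torch.bfloat16,
    model_kwargs={'device_map': 'auto'},
    model_id_or_path='output/minicpm-v-v2_6-chat/vx-xxx/checkpoint-xxx-merged')
model.generation_config.max_new_tokens = 256
template = get_template(template_type, tokenizer)
seed_everything(42)

# your own prompt and image URL
images = ['http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/cat.png']
query = '<image>Describe this image.'
response, history = inference(model, template, query, images=images)
print(f'query: {query}')
print(f'response: {response}')

For deployment, swift deploy --ckpt_dir .../checkpoint-xxx-merged (as used further down in this thread) is the CLI entry point; see the linked documentation for backend support.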

Hi, I fine-tuned the MiniCPM-V 2.6 model using #1613.

And I deployed the merged model using CUDA_VISIBLE_DEVICES=0 swift deploy --ckpt_dir output/minicpm-v-v2_5-chat/vx-xxx/checkpoint-xxx-merged

When calling the POST API, it does not respond.

INFO: 2024-09-30 08:25:57,729 deploy.py:157] {'request_id': 'chatcmpl-f515986bf3d24c9e9b66f6a83d48a0eb', 'model': 'minicpm-v-v2_6-chat', 'messages': [{'role': 'user', 'content': 'Describe this image.'}], 'generation_config': GenerationConfig({'bos_token_id': 151643, 'eos_token_id': 151645, 'max_new_tokens': 32410, 'pad_token_id': 151643, 'return_dict_in_generate': True}), 'seed': None, 'stop': [], 'stream': False}
Starting from v4.46, the logits model output will have the same type as the model (except at train time, where it will always be FP32)
INFO: 2024-09-30 08:26:04,559 deploy.py:56] {'num_prompt_tokens': 0, 'num_generated_tokens': 0, 'num_samples': 0, 'runtime': 10.00989026, 'samples/s': 0.0, 'tokens/s': 0.0}

I can see the request hitting the server in the terminal logs, but no response shows up in Postman.
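
Two things stand out in the log: the request contains no image at all, and num_prompt_tokens / num_generated_tokens are both 0, so nothing was actually generated. The MLLM deployment documentation shows the request format for passing images. As a first sanity check, a plain text call against the OpenAI-compatible endpoint (assuming the default port 8000) looks like this:

# text-only sanity check against the deployed server (default port assumed);
# see the MLLM deployment docs for how to pass images in the request
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "minicpm-v-v2_6-chat", "messages": [{"role": "user", "content": "hello"}], "stream": false}'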

What is the format of the eval dataset?

How do I validate the eval dataset, and what is the meaning of the label key in the result dataset?

Should the response key in the eval data be empty?

Same issue here. My inference dataset is in the same format as my fine-tuning dataset, but the inference CLI doesn't work.

On an A800, running

CUDA_VISIBLE_DEVICES=0,1,2,3 NPROC_PER_NODE=4 swift sft \
  --model_type minicpm-v-v2_6-chat \
  --model_id_or_path $MODEL \
  --sft_type lora \
  --dataset coco-en-mini#20000 \
  --deepspeed default-zero2

hangs outright.
image

Re: the eval dataset format questions above:

The same question here.

How do I run inference with the model after SFT? Is this the right setup?

CUDA_VISIBLE_DEVICES=1 swift infer \
  --model_type minicpm-v-v2_6-chat \
  --model_id_or_path OpenBMB/MiniCPM-V-2_6 \
  --ckpt_dir output/minicpm-v-v2_6-chat/v1-20241111-150624/checkpoint-10

Following https://swift.readthedocs.io/en/latest/Multi-Modal/minicpm-v-best-practice.html, it turns out it should be set up like this:

CUDA_VISIBLE_DEVICES=1 swift infer \
    --ckpt_dir output/minicpm-v-v2_6-chat/v1-20241111-150624/checkpoint-10-merged

@Jintao-Huang
Fine-tuning minicpm-v-v2_6-chat-int4 with LoRA on 8x 4090 GPUs runs into CUDA out of memory. The int4 model should be much smaller than the regular model, yet during LoRA training GPU memory keeps climbing until it exceeds 24 GB. Why is that?
Command line:
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 NPROC_PER_NODE=8 swift sft \
  --model_type minicpm-v-v2_6-chat-int4 \
  --model_id_or_path OpenBMB/MiniCPM-V-2_6-int4 \
  --sft_type lora \
  --dataset /home/wzy/disk1/MiniCPM-V/finetune/relation_datasets/standard_train_no_history.jsonl \
  --deepspeed zero2-offload \
  --eval_steps 1000 \
  --lora_dtype fp16

{'loss': 1.6643486, 'acc': 0.54588017, 'grad_norm': 4.12383413, 'learning_rate': 9.978e-05, 'memory(GiB)': 18.58, 'train_speed(iter/s)': 0.49618, 'epoch': 0.08, 'global_step/max_steps': '865/11043', 'percentage': '7.83%', 'elapsed_time': '28m 56s', 'remaining_time': '5h 40m 31s'}
{'loss': 1.60014324, 'acc': 0.59105425, 'grad_norm': 4.04182339, 'learning_rate': 9.977e-05, 'memory(GiB)': 18.58, 'train_speed(iter/s)': 0.49637, 'epoch': 0.08, 'global_step/max_steps': '870/11043', 'percentage': '7.88%', 'elapsed_time': '29m 5s', 'remaining_time': '5h 40m 14s'}
Train: 8%|████████▌ | 874/11043 [29:14<6:30:05, 2.30s/it]Traceback (most recent call last):
File "/mnt/sdb/MiniCPM-V/ms-swift/swift/cli/sft.py", line 5, in
sft_main()
File "/mnt/sdb/MiniCPM-V/ms-swift/swift/utils/run_utils.py", line 32, in x_main
result = llm_x(args, **kwargs)
File "/mnt/sdb/MiniCPM-V/ms-swift/swift/llm/sft.py", line 546, in llm_sft
return trainer_train(args, model, template, train_dataset, val_dataset, callbacks=callbacks, msg=msg)
File "/mnt/sdb/MiniCPM-V/ms-swift/swift/llm/sft.py", line 496, in trainer_train
trainer.train(training_args.resume_from_checkpoint)
File "/mnt/sdb/MiniCPM-V/ms-swift/swift/trainers/mixin.py", line 493, in train
res = super().train(resume_from_checkpoint, *args, **kwargs)
File "/home/wzy/anaconda3/envs/MiniCPM-V/lib/python3.10/site-packages/transformers/trainer.py", line 1859, in train
return inner_training_loop(
File "/home/wzy/anaconda3/envs/MiniCPM-V/lib/python3.10/site-packages/transformers/trainer.py", line 2203, in _inner_training_loop
tr_loss_step = self.training_step(model, inputs)
File "/home/wzy/anaconda3/envs/MiniCPM-V/lib/python3.10/site-packages/transformers/trainer.py", line 3147, in training_step
self.accelerator.backward(loss)
File "/home/wzy/anaconda3/envs/MiniCPM-V/lib/python3.10/site-packages/accelerate/accelerator.py", line 2117, in backward
self.deepspeed_engine_wrapped.backward(loss, **kwargs)
File "/home/wzy/anaconda3/envs/MiniCPM-V/lib/python3.10/site-packages/accelerate/utils/deepspeed.py", line 166, in backward
self.engine.backward(loss, **kwargs)
File "/home/wzy/anaconda3/envs/MiniCPM-V/lib/python3.10/site-packages/deepspeed/utils/nvtx.py", line 15, in wrapped_fn
ret_val = func(*args, **kwargs)
File "/home/wzy/anaconda3/envs/MiniCPM-V/lib/python3.10/site-packages/deepspeed/runtime/engine.py", line 1976, in backward
self.optimizer.backward(loss, retain_graph=retain_graph)
File "/home/wzy/anaconda3/envs/MiniCPM-V/lib/python3.10/site-packages/deepspeed/runtime/zero/stage_1_and_2.py", line 2051, in backward
self.loss_scaler.backward(loss.float(), retain_graph=retain_graph)
File "/home/wzy/anaconda3/envs/MiniCPM-V/lib/python3.10/site-packages/deepspeed/runtime/fp16/loss_scaler.py", line 63, in backward
scaled_loss.backward(retain_graph=retain_graph)
File "/home/wzy/anaconda3/envs/MiniCPM-V/lib/python3.10/site-packages/torch/_tensor.py", line 492, in backward
torch.autograd.backward(
File "/home/wzy/anaconda3/envs/MiniCPM-V/lib/python3.10/site-packages/torch/autograd/init.py", line 251, in backward
Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 7.96 GiB. GPU 0 has a total capacty of 23.65 GiB of which 7.77 GiB is free. Including non-PyTorch memory, this process has 15.87 GiB memory in use. Of the allocated memory 13.98 GiB is allocated by PyTorch, and 1.29 GiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
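
The error message itself suggests trying max_split_size_mb when reserved-but-unallocated memory is large. A sketch of prepending that allocator hint to the same command; note that this only reduces fragmentation and will not help if the 7.96 GiB allocation genuinely does not fit:

# allocator hint taken from the OOM message; it does not lower the
# model's actual memory requirement
PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:128 \
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 NPROC_PER_NODE=8 swift sft \
  --model_type minicpm-v-v2_6-chat-int4 \
  --model_id_or_path OpenBMB/MiniCPM-V-2_6-int4 \
  --sft_type lora \
  --dataset /home/wzy/disk1/MiniCPM-V/finetune/relation_datasets/standard_train_no_history.jsonl \
  --deepspeed zero2-offload \
  --eval_steps 1000 \
  --lora_dtype fp16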