X-PLUG/mPLUG-Owl

Does mPLUG-Owl2 support video training?

YuzhouPeng opened this issue · 3 comments

Does mPLUG-Owl2 support video training?

You can decode video as multiple images for training.
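The reply above suggests decoding a video into multiple images. A minimal sketch of the frame-selection step, assuming uniform sampling (the function name `sample_frame_indices` is illustrative and not part of mPLUG-Owl):

```python
# Hypothetical sketch: uniformly sample `num_samples` frame indices from a
# video with `num_frames` total frames. The selected frames would then be
# decoded (e.g. with a video reader) into the image list fed to training.
def sample_frame_indices(num_frames: int, num_samples: int) -> list[int]:
    """Pick `num_samples` evenly spaced frame indices out of `num_frames`."""
    if num_samples >= num_frames:
        return list(range(num_frames))
    step = num_frames / num_samples
    # Take the midpoint of each of the `num_samples` equal segments so the
    # samples cover the whole clip rather than clustering at the start.
    return [int(step * i + step / 2) for i in range(num_samples)]
```

For example, a 100-frame clip sampled down to 4 frames yields indices spread evenly across the video.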

Hi,
I tried to use image frames from a video as a sequence of images and ran inference on multiple images as below:

image_tensor = process_images([image1, image2], image_processor)
query = "Summarize the images"
with torch.inference_mode():
    output_ids = model.generate(
        input_ids,
        images=image_tensor,
        do_sample=True,
        temperature=temperature,
        max_new_tokens=max_new_tokens,
        streamer=streamer,
        use_cache=True,
        stopping_criteria=[stopping_criteria])

outputs = tokenizer.decode(output_ids[0, input_ids.shape[1]:]).strip()
print(outputs)

Even though the first line in the code above gives me a tensor of shape [2, 3, 448, 448], the summary generated by the model focuses solely on the content of image1. Is this the right way to do it?
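One likely cause: in LLaVA-style pipelines, which mPLUG-Owl2 follows, the prompt usually needs one image placeholder token per image tensor passed to `generate`; if the query carries only a single placeholder, the model may attend to only the first image. A hedged sketch of prompt construction, where the exact token string and the helper name are assumptions, not confirmed mPLUG-Owl2 API:

```python
# Assumption: "<|image|>" is the placeholder token the tokenizer expands
# into image features; check the repo's constants for the real value.
IMAGE_TOKEN = "<|image|>"

def build_multi_image_query(question: str, num_images: int) -> str:
    """Prefix the question with one placeholder per image tensor."""
    return IMAGE_TOKEN * num_images + question

# With two frames, the query would carry two placeholders:
query = build_multi_image_query("Summarize the images", 2)
```

The `query` string (rather than one with a single placeholder) would then be tokenized into `input_ids` before calling `model.generate`.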

I am also curious how to use an image sequence to understand an entire video. How should the context be built? @MAGAer13