NVlabs/VILA

About sharegpt_video: how do you make a video file from JPEG images?

osttkm opened this issue · 2 comments

Hello, I hope this message finds you well. I have a question regarding the preparation of video data.
After downloading and extracting the ShareGPTVideo dataset (https://huggingface.co/datasets/ShareGPTVideo/train_video_and_instruction), I noticed that the train300k and train600k directories each contain multiple image files. Could you kindly advise on the FPS (frames per second) that should be used to convert these images back into video? The ShareGPT GitHub repository (https://github.com/RifleZhang/LLaVA-Hound-DPO/tree/main) contains the following function, which seems to suggest that the frames were extracted at an FPS of 2. Can I use this FPS to encode videos from the images?

import os

def decode2frame(video_path, frame_dir=None, verbose=False):
    """Extract frames from a video at 2 FPS, scaled to a width of 336 px."""
    if frame_dir is None:
        frame_dir, _ = os.path.splitext(video_path)
    os.makedirs(frame_dir, exist_ok=True)
    # Frames are written as c01_0001.jpeg, c01_0002.jpeg, ...
    frame_pattern = f"{frame_dir}/c01_%04d.jpeg"
    cmd = 'ffmpeg -loglevel quiet -i {} -vf "scale=336:-1,fps=2" {}'.format(video_path, frame_pattern)
    if verbose:
        print(cmd)
    os.system(cmd)
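Since the frames were apparently extracted at fps=2, re-encoding them at the same input frame rate should roughly reproduce the original timing. Below is a minimal sketch of the reverse operation; the function name `frames2video` is hypothetical, and it assumes the frames follow the `c01_%04d.jpeg` naming pattern produced by `decode2frame` above. The `-framerate`, `-c:v`, and `-pix_fmt` options are standard ffmpeg flags.

```python
import os
import subprocess

def frames2video(frame_dir, video_path, fps=2, run=True):
    """Re-encode JPEG frames named c01_0001.jpeg, c01_0002.jpeg, ...
    back into a video at the FPS they were extracted with.
    Returns the ffmpeg command list for inspection."""
    frame_pattern = os.path.join(frame_dir, "c01_%04d.jpeg")
    cmd = [
        "ffmpeg", "-loglevel", "quiet",
        "-framerate", str(fps),   # input frame rate (match extraction FPS)
        "-i", frame_pattern,
        "-c:v", "libx264",        # widely supported H.264 encoding
        "-pix_fmt", "yuv420p",    # broadest player compatibility
        video_path,
    ]
    if run:
        subprocess.run(cmd, check=True)
    return cmd
```

Note that the re-encoded video will only play back at the sampled 2 FPS; the frames discarded during extraction cannot be recovered, so the result is not identical to the original source video.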

Additionally, if you have any code or suggestions for converting the images back into video, I would be extremely grateful if you could share it with me.

Thank you very much for your assistance.

Hi, did you find a solution to your problem?

I've tried to solve this problem, but it was difficult to extract and identify the video frames from ActivityNet.
https://github.com/RifleZhang/LLaVA-Hound-DPO/issues/11#issuecomment-2292532250