yuhao-nie/SkyGPT

Question about input data for training SkyGPT.

Opened this issue · 3 comments

Hi!
Thanks for the great work, and for releasing the implementation and the related algorithms. I am learning a lot from reading your paper, and the way you framed the problem as a probabilistic model is quite interesting.

I am currently trying to train SkyGPT.

However, I am not sure what the inputs should be for

  1. training the transformer (the data has to be specified by the user), and
  2. training the VQ-VAE — I am guessing it is an HDF5 file called GPT_full_2min.hdf5...

My understanding is that I have to generate the training samples for SkyGPT by

  1. running SkyGPT/script/reformat_input.py, and then
  2. feeding the result into SkyGPT/script/sample_gen.py.

This should produce an HDF5 file containing

  • 'train_data': [B, H, W, 3], np.uint8
  • 'train_idx': [B], np.int64 (start index of each video)
  • 'test_data': [B', H, W, 3], np.uint8
  • 'test_idx': [B'], np.int64

But here are my questions:

  1. What is the input to SkyGPT/script/reformat_input.py?
  2. Is GPT_full_2min.hdf5 the file produced by SkyGPT/script/sample_gen.py?
  3. What is the input for training the transformer?
  4. How do these data relate to the files you provided on Google Drive?

Hello, may I ask if the issue with the dataset you mentioned has been resolved?

Hello, have you solved the input issue for the reformat function here? Is the file missing, or do we need additional code to generate it?


Hello, these files are from the video_prediction_dataset.hdf5 file.