How to generate both action and final state prompts?
As detailed in our paper (see Sec. 3.3): "we train two distinct models with different sets of weights, one for generating action images and one for generating the final state images."
You can download the weights for both models using our download_weights script.
For each image you want to generate, e.g. an action image, you need the corresponding prompt and a real input image.
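To make the "same generation code, two sets of weights" idea concrete, here is a rough Python sketch. It is not the repository's actual interface: it uses an InstructPix2Pix-style pipeline from diffusers only as a stand-in, and the checkpoint paths (`weights/action_model`, `weights/state_model`), the input file name, and the example prompts are placeholders, not files or prompts shipped with this repo.

```python
import torch
from PIL import Image
from diffusers import StableDiffusionInstructPix2PixPipeline

# Hypothetical checkpoint locations -- the real paths/format come from download_weights.
ACTION_WEIGHTS = "weights/action_model"
STATE_WEIGHTS = "weights/state_model"

# The real input image the generation is conditioned on.
input_image = Image.open("input_frame.jpg").convert("RGB")

def generate(weights_path, prompt, image):
    # Identical generation code for both cases; only the loaded weights differ.
    pipe = StableDiffusionInstructPix2PixPipeline.from_pretrained(
        weights_path, torch_dtype=torch.float16
    ).to("cuda")
    return pipe(prompt=prompt, image=image).images[0]

# One set of weights for action images, another for final state images.
action_img = generate(ACTION_WEIGHTS, "a hand cutting the apple", input_image)
state_img = generate(STATE_WEIGHTS, "the apple cut in half", input_image)
action_img.save("action.png")
state_img.save("state.png")
```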
Hi @qqphung,
To generate any image, you need an input image and a text prompt -- the text prompt is an input to the method (not an output). You can use any prompt you can think of, but prompts with objects and states seen during training will work best.
If you wish to get prompts automatically, we used the BLIP-2 image captioning model, as described in the paper.
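For reference, a minimal sketch of automatic prompt generation with a public BLIP-2 checkpoint from Hugging Face is below. The specific checkpoint (`Salesforce/blip2-opt-2.7b`), the input file name, and any prompt post-processing are assumptions on my part; the paper's exact captioning setup may differ.

```python
import torch
from PIL import Image
from transformers import Blip2Processor, Blip2ForConditionalGeneration

# Load a publicly available BLIP-2 captioning checkpoint (assumed; not necessarily the one used in the paper).
processor = Blip2Processor.from_pretrained("Salesforce/blip2-opt-2.7b")
model = Blip2ForConditionalGeneration.from_pretrained(
    "Salesforce/blip2-opt-2.7b", torch_dtype=torch.float16
).to("cuda")

# Caption the real input image to obtain a text prompt automatically.
image = Image.open("input_frame.jpg").convert("RGB")
inputs = processor(images=image, return_tensors="pt").to("cuda", torch.float16)
generated_ids = model.generate(**inputs, max_new_tokens=30)
prompt = processor.batch_decode(generated_ids, skip_special_tokens=True)[0].strip()
print(prompt)
```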
If you wish to generate action images, you can use the same generation code, but as @dimadamen described and as mentioned in the paper, we train a separate model for actions because it achieves better results. You can download the action weights here (the download script fetches only the state weights by default, since the models are quite large).