OSU-NLP-Group/MagicBrush

Reproduce Evaluation Results (Table 2)

BennoKrojer opened this issue · 6 comments

Hi!

I am currently reproducing your evaluation and have a question about Table 2 "single turn" setting.
Does single turn mean you only evaluate the first turn or does it mean you evaluate all turns but with ground truth input?

Example:
[
  {
    "input": "242679-input.png",
    "mask": "242679-mask1.png",
    "output": "242679-output1.png",
    "instruction": "Put a cat on the seat."
  },
  {
    "input": "368667-input.png",
    "mask": "368667-mask1.png",
    "output": "368667-output1.png",
    "instruction": "Have there be a stream running through the field"
  },
  {
    "input": "368667-output1.png",
    "mask": "368667-mask2.png",
    "output": "368667-output2.png",
    "instruction": "Add a giraffe in the field"
  }
]

Would you ignore the last entry in your evaluation?

When I run the evaluation script, it says:
Final turn CLIP-I: 0.910077734751122
All turn CLIP-I: 0.9074493353409872

But it doesn't mention single turn.

Do you have the exact script that would lead to the single turn numbers of the MagicBrush model?

Thanks a lot,
Benno

Since I don't care about the "iter_" setting, I only generated the "inde_" examples and ran your script on them, expecting to see the numbers from the paper somewhere in the output metrics, but they were slightly off.

Hi, thanks for your question.

Does single turn mean you only evaluate the first turn or does it mean you evaluate all turns but with ground truth input?

-> All turns, but with ground-truth input.
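
To make that concrete, here is a minimal sketch of how the two settings pair generations with ground truth. The session grouping by file-name prefix and the file name `edit_turns.json` are assumptions based on the JSON above, not the exact eval_script logic:

```python
import json
from collections import defaultdict

# Group turns into sessions by image id, assuming file names look like
# "<id>-input.png" / "<id>-output<k>.png" as in the JSON above (and that
# ids contain no dashes).
sessions = defaultdict(list)
for turn in json.load(open("edit_turns.json")):  # hypothetical file name
    image_id = turn["input"].split("-")[0]
    sessions[image_id].append(turn)

single_turn_pairs = []   # "all turns": every turn, edited from ground truth
final_turn_outputs = []  # "final turn": last ground-truth target per session

for image_id, turns in sessions.items():
    for turn in turns:
        # Single-turn setting: the model edits the ground-truth input of
        # every turn (for turn k > 1 that input is the ground-truth output
        # of turn k - 1), so no entry is ignored.
        single_turn_pairs.append(
            (turn["input"], turn["instruction"], turn["output"])
        )
    # Multi-turn setting: the model is fed its *own* previous outputs, and
    # only its final image is compared against this last ground truth.
    final_turn_outputs.append(turns[-1]["output"])
```

If I read your setup correctly, your "inde_" generations would correspond to the single-turn pairs and the "iter_" generations to the final-turn outputs.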

Do you have the exact script that would lead to the single turn numbers of the MagicBrush model?

-> In eval_script, we have the single-turn numbers.

But it doesn't mention single turn.

-> All turn CLIP-I actually means all the single turns with ground-truth input. Sorry about the naming mismatch :)
Usually, you would expect the all-turn (single-turn) numbers to be higher than the final-turn (multi-turn) numbers.
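
For reference, CLIP-I is the cosine similarity between CLIP image embeddings of the generated image and the ground-truth image, averaged over pairs. A minimal sketch; the ViT-L/14 backbone here is an assumption, check eval_script for the exact one:

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

# The ViT-L/14 backbone is an assumption; eval_script may use another one.
model = CLIPModel.from_pretrained("openai/clip-vit-large-patch14").eval()
processor = CLIPProcessor.from_pretrained("openai/clip-vit-large-patch14")

@torch.no_grad()
def clip_i(generated_path: str, ground_truth_path: str) -> float:
    images = [Image.open(p).convert("RGB")
              for p in (generated_path, ground_truth_path)]
    inputs = processor(images=images, return_tensors="pt")
    feats = model.get_image_features(**inputs)
    feats = feats / feats.norm(dim=-1, keepdim=True)  # unit-normalize
    return (feats[0] @ feats[1]).item()  # cosine similarity

# "All turn CLIP-I" would then be the mean over every single-turn pair,
# and "Final turn CLIP-I" the mean over the last turn of each session.
```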

Thank you! That should work.

@drogozhang Could you share the generated images from the other methods (Open-Edit, VQGAN-CLIP, and so on) in Table 2? We want to calculate the CLIP direction loss. Or would you mind adding CLIP direction loss to Table 2? Thanks a lot.
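
For reference, by CLIP direction loss we mean the directional CLIP similarity: the cosine between the change in image embeddings and the change in text embeddings. A rough sketch of what we would compute; it assumes per-turn source and target captions are available, which is an assumption about the data format:

```python
import torch
import torch.nn.functional as F
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

# Same assumed backbone as above; swap in whichever CLIP variant you use.
model = CLIPModel.from_pretrained("openai/clip-vit-large-patch14").eval()
processor = CLIPProcessor.from_pretrained("openai/clip-vit-large-patch14")

@torch.no_grad()
def clip_direction_sim(src_img: str, edited_img: str,
                       src_caption: str, tgt_caption: str) -> float:
    # Change in image embedding caused by the edit.
    img_inputs = processor(
        images=[Image.open(src_img).convert("RGB"),
                Image.open(edited_img).convert("RGB")],
        return_tensors="pt")
    img_feats = model.get_image_features(**img_inputs)
    delta_img = img_feats[1] - img_feats[0]
    # Change in text embedding between source and target captions.
    txt_inputs = processor(text=[src_caption, tgt_caption],
                           return_tensors="pt", padding=True)
    txt_feats = model.get_text_features(**txt_inputs)
    delta_txt = txt_feats[1] - txt_feats[0]
    return F.cosine_similarity(delta_img, delta_txt, dim=-1).item()
```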

The EmuEdit authors also use the MagicBrush test set in their Table 2, but their scores are very different from yours. For example, InstructPix2Pix has a DINO score of 0.767 in their paper, whereas it is 0.6463 in yours. Could you tell me why this happens? Thanks a lot.
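
For context, the DINO score we refer to is the cosine similarity between DINO ViT features of the generated and ground-truth images. A minimal sketch; the ViT-S/16 backbone loaded via torch.hub is an assumption about what either paper used:

```python
import torch
from PIL import Image
from torchvision import transforms

# DINO ViT-S/16 via torch.hub; the exact backbone either paper used for
# the DINO score is an assumption here.
dino = torch.hub.load("facebookresearch/dino:main", "dino_vits16").eval()

preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=(0.485, 0.456, 0.406),
                         std=(0.229, 0.224, 0.225)),
])

@torch.no_grad()
def dino_score(generated_path: str, ground_truth_path: str) -> float:
    batch = torch.stack([preprocess(Image.open(p).convert("RGB"))
                         for p in (generated_path, ground_truth_path)])
    feats = dino(batch)  # [CLS] features, shape (2, 384) for ViT-S/16
    feats = feats / feats.norm(dim=-1, keepdim=True)
    return (feats[0] @ feats[1]).item()
```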

Hi, I didn't save these images; I think you can generate them with the provided checkpoint.

For EmuEdit, I don't know the details, but I guess they re-trained the models on our data with better training hyper-parameters, or they used SDXL-InstructPix2Pix.

Got it, thanks a lot.