cubiq/ComfyUI_IPAdapter_plus

😵‍💫 Face Models Comparison and Suggestions

cubiq opened this issue · 187 comments

cubiq commented

⚠️ Preliminary Data ⚠️

Face Models Comparison

I started collecting data about all the face models available for IPAdapter. I'm generating thousands of images and comparing them with a face descriptor model. Each result is compared to the original reference image: a value of 0 means 100% the same person, 1.0 completely different.
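For reference, this is roughly the kind of scoring involved; a minimal sketch using dlib through the face_recognition library (not the actual test script, and the file names are placeholders):

import face_recognition
import numpy as np

# Load the reference and one generated image (placeholder paths),
# assuming exactly one face per image.
ref = face_recognition.load_image_file("reference.jpg")
gen = face_recognition.load_image_file("generated.jpg")
ref_emb = face_recognition.face_encodings(ref)[0]  # 128-d dlib descriptor
gen_emb = face_recognition.face_encodings(gen)[0]

# Euclidean distance between descriptors: ~0 = same person,
# larger values = increasingly different faces.
print(np.linalg.norm(ref_emb - gen_emb))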

BIAS! Important: please read!

The comparison is meant just as an overall help in choosing the right models. These are just numbers; they do not represent actual image quality, let alone artistic value.

The face descriptor can be skewed by many factors, and a face that is actually very good could still score poorly for a number of reasons (head position, a weird shadow, ...). Don't take the following data as gospel; you still need to experiment.

Additionally, the images are generated in a single pass of 30 steps. Better results could probably be achieved with a second pass and upscaling, but that would require a lot more time.

I think this data still has value, at least to remove the worst offenders from your tests.

Round 1: skim the data

The first step is to find the best performing checkpoints and IPAdapter face models (and face model combinations). With those established we can move to the second phase, which is running even more data concentrated on the best performers.

These are all the IPAdapter models that I've tested, in random order; the best performers moved on to the next round (see Round 2).

  • PlusFace
  • FullFace
  • FaceID
  • FaceID + FullFace
  • FaceID + PlusFace
  • FaceID Plus
  • FaceID Plus + FaceID
  • FaceID Plus + PlusFace
  • FaceID Plus + FullFace
  • FaceID Plus v2 w=0.6
  • FaceID Plus v2 w=1
  • FaceID Plus v2 w=1.5
  • FaceID Plus v2 w=2
  • FaceID Plus v2 + PlusFace
  • FaceID Plus v2 + FullFace
  • FaceID Plus v2 + FaceID
  • FaceID Plus v2 + FaceIDPlus

These are the checkpoints, in random order; the best performers are marked 🏆.

  • 🏆 Deliberate_v3
  • Reliberate
  • absolutereality_v181
  • dreamshaper_8
  • icbinpICantBelieveIts_seco
  • 🏆 realisticVisionV51_v51VAE
  • realisticVisionV6_B1
  • juggernaut_reborn
  • epicrealism_naturalSin
  • edgeOfRealism_eorV20Fp16BakedVAE
  • 🏆 cyberrealistic_v41BackToBasics
  • 🏆 lifeLikeDiffusionV30

Dreamshaper is excluded from the photo-realistic models, but I will run it again with other "illustration"-style checkpoints.

The preliminary data is available in a Google Sheet: https://docs.google.com/spreadsheets/d/1NhOBZbSPmtBY9p52PRFsSYj76XDDc65QjcRIhb8vfIE/edit?usp=sharing

Round 2: Refining the data

In this phase I took the best performers from the previous round and ran more tests on the following combinations:

  • FaceIDPlusv2 + PlusFace
  • FaceIDPlusv2 + FaceIDPlus
  • FaceIDPlusv2 + FullFace
  • FaceIDPlusv2 + FaceID
  • FaceIDPlusv2 w=2
  • FaceIDPlus + PlusFace
  • FaceIDPlus + FaceID
  • FaceID + FullFace

Basically more embeds, better results.

realisticVisionV51_v51VAE (NOT V6) is overall the best performer, but LifeLikeDiffusion often has the single best result; meaning its average is not as good as Realistic Vision's, but sometimes you get that one result that is really good.

I tested both euclidean and 1-cosine and the results are surprisingly similar.
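This makes sense if the descriptors are (close to) unit-normalized: then ‖a−b‖² = 2(1−cos(a,b)), so the two metrics rank results identically. A quick check, illustrative only:

import numpy as np

rng = np.random.default_rng(0)
a, b = rng.normal(size=128), rng.normal(size=128)
a, b = a / np.linalg.norm(a), b / np.linalg.norm(b)  # unit-normalize

euc = np.linalg.norm(a - b)
one_minus_cos = 1.0 - np.dot(a, b)

# For unit vectors euc**2 == 2 * (1 - cos), up to float error,
# so rankings under the two metrics agree.
print(euc ** 2, 2 * one_minus_cos)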

Since it seems that more embeddings give better results, I'll also try sending multiple images of the same person to each model. I don't think it will help, but I'm happy to be proven wrong.

The data for round 2 can be found here: https://docs.google.com/spreadsheets/d/1Mi2Pu9T3Hqz3Liq9Fdgs953fOD1f0mieBWUI6AN-kok/edit?usp=sharing

Preliminary SDXL

Combinations tested:

  • SDXL FaceID PlusFace
  • SDXL FaceIDPlusV2 PlusFace
  • 🏆 SDXL FaceIDPlusV2 FaceID

At the moment the best models seem to be:

  • 🏆 Juggernaut XL
  • 🏆 Realism Engine
  • base SDXL
  • ColossusProject
  • Realistic Stock Photo
  • Protovision XL
  • 🏆 TurboVision XL

Predictably, V2+PlusFace is again the best performer. The best average is still .36.

Interestingly TurboVision XL performs very well.

Data: https://docs.google.com/spreadsheets/d/1hjiGB-QnKRYXTS6zTAuacRUfYUodUAdL6vZWTG4HZyc/edit?usp=sharing

Round 3: Testing multiple reference images

Processing...

Round 4: Higher resolution

Upscaling SD1.5 512×512 images is not advisable if you want to keep the likeness as high as possible. Even using low denoise and a high IPAdapter weight, the base checkpoints are simply not good enough to keep the resemblance.

In my tests I lose about .05 likeness after every upscale.

Fortunately you can still upscale SD1.5 models with SDXL FaceID + PlusFace (I used Juggernaut, the best performer in the SDXL round). The results are very good. LifeLikeDiffusion and RealisticVision5 are still the best performers.

The average is still around 0.35 (not as low as I'd like) but sometimes you get very good results (0.27), so it's worth running a few seeds and trying different reference images.

Result data here: https://docs.google.com/spreadsheets/d/1uVWJOcDxaEjRks-Lz0DE9A3DCCFX2qsvdpKi3bCSE2c/edit?usp=sharing

Methodology

I tried many libraries for feature extraction/face detection. In the aggregated results the differences are relatively small, so at the moment I'm using Dlib and Euclidean distance. I'm trying to keep the generated images as close as possible to the original in color/position/contrast, to minimize skew.

I tried 1-cosine and the results don't differ much from what is presented here, so I take it the data is fairly robust. I will keep testing and will update if there are any noticeable differences.

All primary embedding weights are set at .8, all secondary weights are set at .4.
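The per-combination summary is then a simple aggregation; a sketch, assuming the raw scores were dumped to a CSV with hypothetical columns checkpoint, combo and distance:

import pandas as pd

df = pd.read_csv("results.csv")  # hypothetical dump of the raw scores
summary = (
    df.groupby(["checkpoint", "combo"])["distance"]
      .agg(["mean", "min", "count"])  # "min" = the single best result
      .sort_values("mean")            # lower mean distance = better
)
print(summary.head(10))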

Which face descriptor did you use?

cubiq commented

I tried a few... we could run an average maybe? dlib, MTCNN and RetinaFace are decent and pretty fast. InsightFace seems to be biased, since you trained with it.

Is the metric "1-cos similarity"?
In fact, I used another insightface model (not the one used for training) to evaluate.

cubiq commented

I tried both euclidean and 1-cos. The numbers are of course different but the result is more or less the same.

This is euc vs 1-cos. The final result doesn't change much.
image

Do you get vastly different results?

FaceNet?

cubiq commented

yes, FaceNet. Again, I've tried a few options but the results seem more or less the same. FaceID Plus v2 at weight=2 is always at the top.

Interestingly, FaceIDPlus with a second pass of PlusFace or FullFace is also very effective. That makes me think there are more combinations we haven't explored.

You seem very interested, I'm glad about that. Please feel free to share your experience/ideas if you want.

Yes, I am very interested, because a good metric is important for developing a good model.

You are right. You can also try FaceID + FaceID Plus.

thresholds = {
"VGG-Face": {"cosine": 0.40, "euclidean": 0.60, "euclidean_l2": 0.86},
"Facenet": {"cosine": 0.40, "euclidean": 10, "euclidean_l2": 0.80},
"Facenet512": {"cosine": 0.30, "euclidean": 23.56, "euclidean_l2": 1.04},
"ArcFace": {"cosine": 0.68, "euclidean": 4.15, "euclidean_l2": 1.13},
"Dlib": {"cosine": 0.07, "euclidean": 0.6, "euclidean_l2": 0.4},
"SFace": {"cosine": 0.593, "euclidean": 10.734, "euclidean_l2": 1.055},
"OpenFace": {"cosine": 0.10, "euclidean": 0.55, "euclidean_l2": 0.55},
"DeepFace": {"cosine": 0.23, "euclidean": 64, "euclidean_l2": 0.64},
"DeepID": {"cosine": 0.015, "euclidean": 45, "euclidean_l2": 0.17},
}
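For context, these thresholds are what deepface's verify call compares the measured distance against. A minimal usage sketch with placeholder paths (result keys may vary slightly across deepface versions):

from deepface import DeepFace

result = DeepFace.verify(
    img1_path="reference.jpg",   # placeholder paths
    img2_path="generated.jpg",
    model_name="Facenet512",
    distance_metric="cosine",
)
# "verified" is True when the distance falls below the model's threshold
print(result["distance"], result["threshold"], result["verified"])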

cubiq commented

Is that the minimum threshold? You set it very high; almost only FaceID alone scores that low, at least in my testing.

By the way, do you have any ideas or suggestions for improving the results? They might be helpful to me.

Is that the minimum threshold? You set it very high; almost only FaceID alone scores that low, at least in my testing.

yes, from deepface repo

In fact, I found the face ID embedding is very powerful; I think I should find better training tricks.

cubiq commented

I have tried FaceID Plus v2 + FaceID and it generally outperforms everything else.

Also tried FaceID Plus v2 at weight=2.5, some checkpoints react well to it but in general it's not a big difference.

what do you think of this https://twitter.com/multimodalart/status/1742575121057841468 (multi image)

SDXL FaceID preview
sdxl_faceid

in my benchmark, the cos similarity is a little better than SD 1.5 FaceID

cubiq commented

what do you think of this https://twitter.com/multimodalart/status/1742575121057841468 (multi image)

I've seen people send multiple images trying to increase the likeness. I'm not convinced it actually works; there's a lot of bias in "face" recognition. I will run some tests; honestly I think it's laziness. I was able to reach 0.27 likeness with a good combination of IPAdapter models at low resolution.

I think combining 2 IPAdapter models is more effective than sending multiple images to the same model. But I'll run some tests.

PS: looking forward to the SDXL model!

cubiq commented

@xiaohu2015 do you already have the code for SDXL? So I can update it and we are ready at launch 😄

It is the same as SD 1.5 FaceID: face embedding + LoRA.

But I am not sure the SDXL version is really better than the SD 1.5 version, because evaluation metrics are often unreliable.

cubiq commented

okay, I ran more tests: any combination of Plusv2 with any other model is definitely a winner.

These are all good:

  • FaceIDPlusv2 + PlusFace
  • FaceIDPlusv2 + FaceIDPlus
  • FaceIDPlusv2 + FullFace
  • FaceIDPlusv2 + FaceID

The only other non-v2 combination that seems to work well is FaceIDPlus+FaceID.

I'll update the first post when I have more data

PS: I got a 0.26 today at low resolution! Looking forward to doing some high resolution tests 😄

I will update the SDXL model now; you can also test it.

cubiq commented

great thanks!

I just updated the first post with new info. Data for round 2 is here: https://docs.google.com/spreadsheets/d/1Mi2Pu9T3Hqz3Liq9Fdgs953fOD1f0mieBWUI6AN-kok/edit?usp=sharing

I'll check SDXL later 😄 and run dedicated tests on it too.

cubiq commented

I just had a look at the key structure of the SDXL lora and it's a darn mess 😄 do you have a conversion mapping @xiaohu2015 ?

#145 (comment)

I think we can refer to this. You can find a normal SDXL lora weight and load it, print its keys; then you can get diff2ckpt for SDXL.

In a future version, the lora should not be needed.
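A quick way to do that side-by-side inspection; file names are placeholders, and the "ip_adapter" sub-dict is an assumption about how the FaceID .bin files are organized:

import torch
from safetensors.torch import load_file

faceid = torch.load("ip-adapter-faceid_sdxl.bin", map_location="cpu")
ip_keys = faceid.get("ip_adapter", faceid)               # lora + ip weights
xl_lora = load_file("any_normal_sdxl_lora.safetensors")  # kohya-style keys

for k in sorted(ip_keys)[:8]:
    print("faceid:", k)
for k in sorted(xl_lora)[:8]:
    print("sdxl  :", k)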

cubiq commented

the structure is pretty different and I couldn't find a relationship at first sight. But I'll check better later. I'm a bit busy this week, I might be able to work on it next Monday.

0.to_q_lora.down.weight
0.to_q_lora.up.weight
0.to_k_lora.down.weight
0.to_k_lora.up.weight
0.to_v_lora.down.weight
0.to_v_lora.up.weight
0.to_out_lora.down.weight
0.to_out_lora.up.weight
1.to_q_lora.down.weight
1.to_q_lora.up.weight
1.to_k_lora.down.weight
1.to_k_lora.up.weight
1.to_v_lora.down.weight
1.to_v_lora.up.weight
1.to_out_lora.down.weight
1.to_out_lora.up.weight
1.to_k_ip.weight
1.to_v_ip.weight
2.to_q_lora.down.weight
2.to_q_lora.up.weight
2.to_k_lora.down.weight
2.to_k_lora.up.weight
2.to_v_lora.down.weight
2.to_v_lora.up.weight
...
139.to_v_ip.weight

On SDXL

lora_unet_input_blocks_1_0_emb_layers_1.alpha
lora_unet_input_blocks_1_0_emb_layers_1.lora_down.weight
lora_unet_input_blocks_1_0_emb_layers_1.lora_up.weight
lora_unet_input_blocks_1_0_in_layers_2.alpha
lora_unet_input_blocks_1_0_in_layers_2.lora_down.weight
lora_unet_input_blocks_1_0_in_layers_2.lora_up.weight
lora_unet_input_blocks_1_0_out_layers_3.alpha
lora_unet_input_blocks_1_0_out_layers_3.lora_down.weight
lora_unet_input_blocks_1_0_out_layers_3.lora_up.weight
lora_unet_input_blocks_2_0_emb_layers_1.alpha
lora_unet_input_blocks_2_0_emb_layers_1.lora_down.weight
lora_unet_input_blocks_2_0_emb_layers_1.lora_up.weight
lora_unet_input_blocks_2_0_in_layers_2.alpha
lora_unet_input_blocks_2_0_in_layers_2.lora_down.weight
lora_unet_input_blocks_2_0_in_layers_2.lora_up.weight
lora_unet_input_blocks_2_0_out_layers_3.alpha
lora_unet_input_blocks_2_0_out_layers_3.lora_down.weight
lora_unet_input_blocks_2_0_out_layers_3.lora_up.weight
lora_unet_input_blocks_3_0_op.alpha
lora_unet_input_blocks_3_0_op.lora_down.weight
lora_unet_input_blocks_3_0_op.lora_up.weight
lora_unet_input_blocks_4_0_emb_layers_1.alpha
lora_unet_input_blocks_4_0_emb_layers_1.lora_down.weight
lora_unet_input_blocks_4_0_emb_layers_1.lora_up.weight
lora_unet_input_blocks_4_0_in_layers_2.alpha
lora_unet_input_blocks_4_0_in_layers_2.lora_down.weight
lora_unet_input_blocks_4_0_in_layers_2.lora_up.weight
lora_unet_input_blocks_4_0_out_layers_3.alpha
...
lora_unet_output_blocks_8_0_skip_connection.lora_up.weight

So it looks a little more complicated than that 😄

@laksjdjf can you help

ok, I will also upload a lora weight next week

cubiq commented

It seems to be working pretty well together with plusface, but results are a bit random (either very good or very bad). I'll run some stats on that too.

ComfyUI_temp_lffkp_00011_

reference image:
theron

It is really great work!
I heard that a lot of people complain about the similarity for double-chin faces, big faces, wearing glasses, etc. Is there any test for these? Or some solution for these face shapes?

jepjoo commented

Can confirm that it works now. Thanks!

maybe give some cases? 😄

jepjoo commented

Input image:
sauli

Output, lora weight 1, FaceID weight 1:
ComfyUI_temp_pqcll_00016_

Output with lora disabled, FaceID weight 1 (just to demonstrate that LoRA works and has a big impact):
ComfyUI_temp_pqcll_00017_

In general, results do not seem to be at the level of SD1.5 FaceID Plus, having tested maybe 20 different input images now. This example output (the first one, with lora enabled) is better than the average output.

You should compare with SD 1.5 FaceID. In fact, the face consistency should be better than SD 1.5.

Editing because the SDXL LoRA wasn't set up properly in my workflow:

Some Quick Examples of what I've been getting,
SDXL FaceID + SDXL Plus Face seems to work a little better than SD1.5 FaceID + SD1.5 Plus Face. (Both of these running with their respective LoRAs)

Input Image:
image

SD1.5 FaceID + SD1.5 Plus Face:
image

SDXL FaceID + SDXL Plus Face:
image

SDXL FaceID on its own:
image

And then for reference an SD1.5 FaceID Plus V2:
image

Here's another couple with a different model

Input Image:
image

SDXL FaceID:
image

SDXL FaceID + SDXL Plus Face:
image

Here's another couple with a different model

Could you share your example workflow?
And: can you feed an already existing image into the workflow as the target?

My best results are with FaceID SDXL ( with lora ) and Plus Face.

test01j

Here's another couple with a different model

Could you share your example workflow? And: can you feed an already existing image into the workflow as the target?

Very simple workflow here - https://pastebin.com/9n66qNg9

You can use an existing image to do img2img like normal, or you could use inpainting and only inpaint the face, I don't have an inpainting workflow though.

My best results are with FaceID SDXL ( with lora ) and Plus Face.

test01j

Very nice!
What checkpoint/LoRAs do you use?
And what was your example prompt?
I still don't get those kinds of convincing images with Juggernaut. They always look kind of "synthetic". I have a post in the issues tab with my idea/problem. Maybe you have a hint.

Very simple workflow here - https://pastebin.com/9n66qNg9

Thank you! Gonna try it tomorrow when I am back home. The urge to try and find out is very big…

Very nice!
What checkpoint/LoRAs do you use?
And what was your example prompt?
I still don't get those kinds of convincing images with Juggernaut. They always look kind of "synthetic". I have a post in the issues tab with my idea/problem. Maybe you have a hint.

Face ID images are all Juggernaut XL 7, no loras ( except for the Face ID lora ), like on the example workflow.
Juggernaut XL 8 does not work as well. Weights need to be a lot higher.

But Realism Engine SDXL v2 also worked well with Face ID.
Version 3 just came out, so I haven't tried yet.
https://civitai.com/models/152525/realism-engine-sdxl

The negative prompt is the one used on civitai by the Juggernaut XL creator.
https://civitai.com/images/2612019

The positive prompt is from a Midjourney 6 vs 5.2 comparison video ( at 3.35 )
https://www.youtube.com/watch?v=Zl_4V0ks7CE

I think Face ID makes the SDXL results closer to v6, than v5.2.
FaceID also improves skin tone and texture, and gives more complexity / realism to the facial features.

Without Face ID, these are the best SDXL checkpoints for natural portraits:
https://civitai.com/models/189109/photopedia-xl
https://civitai.com/models/139565/realistic-stock-photo

As for Juggernaut XL, my favorite one is still version 5.
It works well with this lora.
https://civitai.com/models/170395/black-and-color

cubiq commented

I've run preliminary benchmarks on SDXL. I've updated the original post.

Best checkpoints: Juggernaut XL, Realism Engine.

SDXL FaceID is better than SD1.5 FaceID. The average is 0.37 vs 0.41 of SD1.5.

@cubiq can you please describe the process of running these tests?

cubiq commented

What do you need to know? The general process is explained in the "Methodology" paragraph above.

@cubiq I will release the first version of sdxl plus v2, maybe you can do some comparison

cubiq commented

I will! looking forward!

models: https://huggingface.co/h94/IP-Adapter-FaceID/resolve/main/ip-adapter-faceid-plusv2_sdxl.bin

it is the same as FaceID Plus v2 for SD 1.5, but for SDXL

cubiq commented

yes, I can already tell that it's a lot better. Top: FaceID; bottom: FaceIDPlusV2, both with PlusFace added on top. I will launch some benchmarks later.

v2-test

I'm having a problem with eyes when using FaceID - the color is not transferred, and from brown eyes I get a blue/grayish iris. Also, I haven't gotten any improvements when combining FaceID v2 SDXL with PlusFace on my personal examples - eyes become very distorted even with weights around 0.5. Can you share the best workflow used for testing (SD1.5 and SDXL FaceIDv2 + PlusFace)?

@dm33tri in fact, we use some heavy augmentation to avoid completely cloning the face. If you want to do that, why not use inpainting?

cubiq commented

the generation needs to be augmented with prompting as well. That is something I don't do in this test because we are experimenting with just the face embeds.

These tests I'm doing are just to determine a baseline and possibly find the best base combinations and checkpoints. They don't tell you the image quality, just how close the facial embeds are. Even so, the data is extremely biased, because we don't really have decent face detection models.

That being said, I think this still has value, as I said, to get some baseline info you can work on.

cubiq commented

okay quick report before posting the full data.

SDXL FaceID is already pretty good on its own; the difference with SDXL FaceID Plus v2 is not staggering in terms of "face likeness", but the visual quality of v2 is visibly better. The faces look more defined overall (with more details).

The best average I've got for SDXL is 0.36 with FaceIDPlusV2 + PlusFace.

Mixing FaceIDPlusV2 with FaceID is not a good idea and should be avoided (best avg 0.39).

One last test I've done is to take a 512x512 SD1.5 image and upscale it to 1024x1024 with SDXL.

Basically the model combination is: SD1.5 FaceID Plus v2 + FullFace, upscaled with SDXL FaceID Plus v2 + PlusFace.

I got no real likeness improvement, with an average again of 0.36.

My conclusion at this point is that we cannot expect an average result better than 0.35 relying exclusively on the models. Of course better results can be achieved with various other techniques (like compositing, inpainting, specific training, very accurate prompting, ...).

Best SDXL model is JuggernautXL.

Love the testing! Is the Lora for the tests set at 0.8 or 0.5?

cubiq commented

the lora is at 0.62

@dm33tri in fact, we use some heavy augmentation to avoid completely clone face. If you want to do that, why not use Inpainting?

What kind of augmentation? In my testing, tinkering with the code, I found that lowering the weights of the lower attention layers preserves likeness while allowing for more variety. I'm on the go, so I forget exactly what the parameters were, but something like the weights from 1-16 found in extra_data here. The function gets called for each layer for each frame, or something like that.

So I'd expose all the weights as number sliders in ComfyUI, lower weights 1-6 close to zero, and keep the remaining ones (6 to 15) at 1, as in the sketch below.
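A sketch of that idea — the layer count and the hook are hypothetical, not the actual ComfyUI_IPAdapter_plus internals:

NUM_IP_LAYERS = 16
# Damp the early (low-level) layers, keep the later ones at full strength.
layer_weights = [0.05] * 6 + [1.0] * (NUM_IP_LAYERS - 6)

def add_ip_attention(hidden_states, ip_out, layer_idx):
    # Scale the IPAdapter attention output per layer before adding it back.
    return hidden_states + layer_weights[layer_idx] * ip_out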

These are old screenshots using old models with the wrong implementation, but I have tested this method on the newer models as well and it works a lot better.

image

The prompt was something like "man laughing" with an input image of George W. Bush. The left image is all weights at 0, so the adapter does nothing and it just shows a man laughing. The middle image is something like what I mentioned above. The right image is all weights at 1. Notice how the middle image is more like "man laughing" with Bush's likeness, as opposed to the right image, which follows the input image more closely.

image

For me, SDXL FaceID Plus v2 is noticeably better in terms of face likeness, at least with some seeds.
And independently from likeness, the faces look better with any IPA, and even better with SDXL Face ID v2.

textv2c

cubiq commented

with SDXL v2 I got quite a few 0.28s during testing, meaning that playing with seeds might help. But on average the pure embeds are not much closer. The image quality and details are higher, but the mathematical difference is more or less the same.

Another thing I've noticed is that there is a sweet spot for getting better likeness if the input face image is cropped and padded properly. There is a face crop node (not sure where it's from right now, as I have many custom nodes) that crops the face and rotates it to be straight. If you then make sure there are 50 pixels (or something like that) around the face, you'll get a better result than if the face fills exactly the whole image or the face is too small.

One last thing (though this is probably known here already): if you detect and crop out the face in the output image, do a second low-denoise pass on the face, and then insert the output into the original image again, you will get a better result.
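A sketch of the crop-and-pad step (detection via face_recognition here; the actual node mentioned above also straightens the face, which this skips):

import face_recognition
from PIL import Image

img = face_recognition.load_image_file("input.jpg")  # placeholder path
top, right, bottom, left = face_recognition.face_locations(img)[0]

margin = 50  # the ~50 px rule of thumb mentioned above
h, w = img.shape[:2]
box = (max(left - margin, 0), max(top - margin, 0),
       min(right + margin, w), min(bottom + margin, h))
Image.fromarray(img).crop(box).save("face_padded.png")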

SDXL FaceID Plus v2 is better at capturing small details inside the face.
But sometimes SDXL FaceID (v1) seems to be better at getting the whole face / head shape.
So when we've got a good seed that's a better match for the whole face, results are much better on v2.

edit:

Here is an example with a seed that does not work as well for FaceID v2.
FaceID v2 also captured the internal face details better, but to my eye the face / head shape seems a bit off.
The face / head in v2 seems to have a more rounded shape, while the reference has more of an oval shape.

textv2f

update:

Since my input image was not square, I was using a [prepare image for clip vision] node, but only for the Face Plus node.
Here it was used for both adapters.
With the same seed, the results seem to be much better, for both v1 and v2.
The v2 head shape was improved, and it is still more detailed than v1.
The face likeness is also closer between both models.
But I think v1 was able to capture different details of the original face.
So I would agree there is not a definitive winner.

textv2h

For me, SDXL FaceID Plus v2 is noticeably better in terms of face likeness, at least with some seeds. And independently from likeness, the faces look better with any IPA, and even better with SDXL Face ID v2.

textv2c

@JorgeR81 Could you perhaps share your workflow? I'd really appreciate it!

I tried to recreate it but I am having some trouble unfortunately :/

This is the default workflow, available here. 
#210 (comment)
Don't forget to set [faceid_v2] to [true] to use the v2 model.
I also added a [Prepare Image for Clip Vision] node between the image loader and the Plus Face adapter, since my input image is not square.

I'm using Juggernaut XL 7.
The positive prompt is probably not very good, since it was created to be confusing for the checkpoint, in order to test the limits of Midjourney and compare versions.
( see my previous posts, for more images and details ).
#195 (comment)
#195 (comment)

Results look good, I think, also because the input image is good.
It was created with CyberRealistic v3.3, with the easynegative embedding.
https://civitai.com/models/15003?modelVersionId=138176

See also the posts above.
Unless you have a "lucky" seed, results are not going to be much better than v1.

@JorgeR81 what're the weights you're using for the lora, face id, face id v2, and the plus face?

The weights for each node are under each image.
For the v2 image I used the same weights.

lora ( 0.6 )
face id ( 0.6 )
face id v2 ( 0.6 )
plus face ( 0.3 )

For Plus Face, I also set [start_at] to 0.4, to preserve the face position created by Face ID.

Some seeds and checkpoints may look better with different weights; these are just a starting point.
They are perhaps a little low, but I'm more interested in getting a good image than in getting perfect face likeness.

Here is an example, at full resolution, with Realism Engine 2, with FaceID v1.
In the first image, weights are the same, but in the bottom one, FaceID is at 0.8.
Here, a weight of 0.8 starts to affect image quality.
Notice how, in the bottom image, the face edge is too sharp in comparison with the face itself.

FaceID ( 0.6 ) + Plus Face

8

FaceID ( 0.8 ) + Plus Face -- resulting in worse image quality, in this case.

8b

@JorgeR81 I'm playing around with some of your values, what do you think of this result?

I'm seeing too many faces and going insane, so I can't tell if the face looks similar to your input image, but to me it looks pretty alright:

image

Currently with only this added:

image

image

@JorgeR81 I'm playing around with some of your values, what do you think of this result?

They look really nice. The second one is closer I think.

The necklace also looks very good.
You were able to get the diamonds in there! Did you change the prompt?
I also placed "RAW" at the start of the prompt. Not sure if that will make much of a difference.

I did not try the 1024 x 1024 resolution, but it seems to work well. There is more detail.
I used 1152 x 768, which is a little below the recommended size for SDXL.

For the second adapter I am using [ ip-adapter-plus_sdxl_vit-h.safetensors ] instead of the SD 1.5 one. That could bring some improvements.

In some seeds, you may get better results if you use the [prepare image for clip vision] node for both adapters.
#195 (comment)

I actually didn't notice that I was using the SD 1.5 adapter, so I changed that to the ip-adapter-plus_sdxl_vit-h one. That also explains why I got the exact same image when I bypassed all the Plus Face nodes. I just changed it and got this image:

image

What do you think looks better?

Also for the prompt, I did not change anything, I copied the prompt from that Midjourney comparison video:

cinematic, photo, woman, cyberpunk, vermillion, anachronism, futuristic fragmentation, translucent, transcendence, transparent, layered composition, cyberpunk futurism, very light hair, braids, tattoos, implants, body covered with diamonds, shining jewellery, desaturated, muted light pink palette, ARRIFLEX 35 BL Camera, Canon K35 Prime Lenses, looking at camera

Edit: apparently all the results that I posted are also without the LoRA. I just connected the LoRA properly and the images are not great :D

What do you think looks better?

The skin looks a bit more natural on the new one.
But some weights may be too high for this configuration, because the hair above the forehead is not as well defined.

I just connected the LoRA properly and the images are not great :D

Are you having worse results with the LoRA?

@JorgeR81

Are you having worse results with the LoRA?

Yeah, for some reason the results are really bad with the LoRA. I'll post some pictures soon.

I'm trying to tweak the weights, but whenever I think the face looks somewhat alright, I try a picture of myself or some other guy and then it's a complete disaster.

I wish there were some magic values that work with everyone or something. Or at least good, consistent results for male and female…

It’s quite hard to get the perfect results unfortunately.

Yeah, for some reason the results are really bad with the LoRA.

The first version of the lora was broken, but it was then fixed and reuploaded.
If you see an error message in your cmd line when you run the prompt, you should download it again.
#210 (comment)
#210 (comment)

It’s quite hard to get the perfect results unfortunately.

A future version of FaceID may not need a lora at all, so it will be easier to use.
#195 (comment)

And I think cubiq is already working on another method that seems very promising.
#224

@JorgeR81 That's true. I'll try to see if updating the LoRA fixes the issue.

Until then, I'll try to play around with different weights that result in good faces for male and female (both with the same values), so the flow becomes somewhat consistent. Since that's what I need, ideally.

But the future is definitely looking promising 🚀

@cubiq We released ip-adapter-faceid-portrait at https://huggingface.co/h94/IP-Adapter-FaceID

cubiq commented

you wanted to compete with PhotoMaker? 😄

In fact, the model was trained a while ago. It is limited to portrait generation and is sensitive to the text prompt, but the advantage is a high degree of freedom (it supports editing the style via text).

maybe you can make a fair comparison with PhotoMaker and IPAdapter-FaceIDPlus?

cubiq commented

I will certainly do @xiaohu2015 thanks again for your great work

cubiq commented

has the structure changed @xiaohu2015 ?

        Missing key(s) in state_dict: "proj.weight", "proj.bias". 
        Unexpected key(s) in state_dict: "proj.0.weight", "proj.0.bias", "proj.2.weight", "proj.2.bias".

I can see what is happening here

cubiq commented

okay now it just doesn't work with strict dict loading

It is the same as the FaceID model, but with no lora; are you using the wrong pipeline?

cubiq commented

it doesn't work with strict state_dict loading; without strict loading the model loads, but the results are all wrong, so I guess I need to check if anything changed in the code

Error(s) in loading state_dict for ImageProjModel:
Missing key(s) in state_dict: "proj.weight", "proj.bias".
Unexpected key(s) in state_dict: "proj.0.weight", "proj.0.bias", "proj.2.weight", "proj.2.bias".

Getting the same error

Another question: I'm trying to get multiple reference images to work, but I'm not quite sure how to do that, since this flow just loads one image each run from the specified folder…

it doesn't work with strict state_dict loading, without strict loading the model is loaded but the results are all wrong, so I guess I need to check if anything changed in the code

It uses this projection net: https://github.com/tencent-ailab/IP-Adapter/blob/main/ip_adapter/ip_adapter_faceid.py#L64
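That also explains the missing/unexpected keys above: the plain ImageProjModel stores a single linear layer (proj.weight), while this projection is an MLP inside nn.Sequential, whose children are saved by index (proj.0.*, proj.2.*). Paraphrased from the linked file, with the defaults used there:

import torch

class MLPProjModel(torch.nn.Module):
    def __init__(self, cross_attention_dim=768, id_embeddings_dim=512, num_tokens=4):
        super().__init__()
        self.num_tokens = num_tokens
        self.cross_attention_dim = cross_attention_dim
        self.proj = torch.nn.Sequential(
            torch.nn.Linear(id_embeddings_dim, id_embeddings_dim * 2),  # -> proj.0.*
            torch.nn.GELU(),                                            # proj.1 has no params
            torch.nn.Linear(id_embeddings_dim * 2,
                            cross_attention_dim * num_tokens),          # -> proj.2.*
        )
        self.norm = torch.nn.LayerNorm(cross_attention_dim)

    def forward(self, id_embeds):
        x = self.proj(id_embeds)
        x = x.reshape(-1, self.num_tokens, self.cross_attention_dim)
        return self.norm(x)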

Another question: I'm trying to get multiple reference images to work but not quite sure how to do that.

use the mean ID embedding, or concat multiple embeddings
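In numpy terms the two options look like this (shapes are illustrative; one 512-d InsightFace ID embedding per reference image):

import numpy as np

embeds = [np.random.randn(512) for _ in range(5)]  # one per reference image

mean_embed = np.mean(embeds, axis=0, keepdims=True)  # (1, 512): one averaged identity
concat_embeds = np.stack(embeds)                     # (5, 512): all identities as extra tokens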

cubiq commented

my god, the new FaceID Portrait is amazing

ComfyUI_01338_

cubiq commented

Portrait models are supported. Resemblance is not great, but they are super easy to style.

using multiple face embeddings is helpful to enhance ID similarity

cubiq commented

yes, those images are created with 5 reference images each

now we need a comparison with photomaker 😄

yes, those images are created with 5 reference images each

How can I use multiple reference images with FaceID? Do I apply it multiple times, or is there another way?

Edit: I found the example

I tried the default workflow for the portrait version.
As inputs, I used 5 images generated with the same prompt and different seeds, to see what it does with that.

wf

Image quality and face likeness are good for me with 5 images.
The likeness of the new image is not much worse than the likeness between the 5 inputs, despite using a completely different prompt.

v3_5

Having a single image still gives decent likeness, but worse quality.

v3

Changing the prompt to "antique bronze statue" makes it cooler!

v3p_5

Setting [start_at] = 0.60 and [weight] = 0.95 makes it look like an actual "bronze statue" (let's say the sculptor encrusted glass eyes).

p9560

This upcoming model claims to generate better facial likeness with a single face image...
https://github.com/InstantID/InstantID

cubiq commented

I tested TurboVisionXL and it performs very well given the right configuration.

The new data: https://docs.google.com/spreadsheets/d/1hjiGB-QnKRYXTS6zTAuacRUfYUodUAdL6vZWTG4HZyc/edit?usp=sharing

@cubiq What are your expectations for the FaceID Plus v3 version?

cubiq commented

@cubiq What are your expectations for the FaceID Plus v3 version?

Personally I would like to get rid of insightface altogether and find something better.

That being said, I would love to see a consistent average under .30 (euclidean). I think I'm also hitting the limit of my methodology for checking the embeds, so I might need to try different models.

Also, people really like a good result on the first try; they don't want extra steps (like inpainting, a second pass to fix things, or setting an attention mask). So FaceID Portrait seems to be a good direction, if you can keep the likeness high. Of course you need FaceID Portrait SDXL 😄

Hi, @cubiq. If you plan to test more turbo models, try also this one:

https://civitai.com/models/224983/bestmixsdxlphotocinematurbov1

It's based on TurboVisionXL, and it works well with DPM++ 2M SDE / DPM++ 2M.

Most of the other turbo models need DPM++ SDE, which is about twice as slow for me, for each step.

Personally I would like to get rid of insightface altogether and find something better.

Thank you. I also wrote something about face models:
tencent-ailab/IP-Adapter#266. I think the current best way is FullFace + faceswap if you want a clone. FaceID (and other methods like PhotoMaker and InstantID) gives likeness, but can't reach 100%.

cubiq commented

thank you. i also write some thing about face model. tencent-ailab/IP-Adapter#266.

that's very interesting! Please keep documenting the models!

what should we expect for v3 ?

Currently there is no v3 version.

FaceID-Portrait is a nice way to mix two faces together!

image

FaceID-Portrait is a nice way to mix two faces together!

It would be nice to have a version of the FaceID node that allows setting a weight per image, like with the regular IPAdapter.

@JorgeR81 @cubiq for multiple images, you can also use a weighted combination of the ID embeddings