cubiq/ComfyUI_IPAdapter_plus

😵‍💫 Face Models Comparison and Suggestions

cubiq opened this issue · 187 comments

cubiq commented

⚠️ Preliminary Data ⚠️

Face Models Comparison

I started collecting data about all the face models available for IPAdapter. I'm generating thousands of images and comparing them with a face descriptor model. Each result is compared to the original reference image: a value of 0 means 100% the same person, 1.0 completely different.
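For reference, this is roughly the kind of scoring involved; a minimal sketch using dlib through the face_recognition library (not the actual test script, and the file names are placeholders):

import face_recognition
import numpy as np

# Load the reference and one generated image (placeholder paths),
# assuming exactly one face per image.
ref = face_recognition.load_image_file("reference.jpg")
gen = face_recognition.load_image_file("generated.jpg")
ref_emb = face_recognition.face_encodings(ref)[0]  # 128-d dlib descriptor
gen_emb = face_recognition.face_encodings(gen)[0]

# Euclidean distance between descriptors: ~0 = same person,
# larger values = increasingly different faces.
print(np.linalg.norm(ref_emb - gen_emb))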

BIAS! Important: please read!

The comparison is meant just as an overall help in choosing the right models. These are just numbers; they do not represent actual image quality, let alone artistic value.

The face descriptor can be skewed by many factors, and a face that is actually very good could still score poorly for a number of reasons (head position, a weird shadow, ...). Don't take the following data as gospel; you still need to experiment.

Additionally, the images are generated in a single pass of 30 steps. Better results could probably be achieved with a second pass and upscaling, but that would require a lot more time.

I think this data still has value, at least to remove the worst offenders from your tests.

Round 1: skim the data

The first step is to find the best performing checkpoints and IPAdapter face models (and face model combinations). With those established we can move to the second phase, which is running even more data concentrated on the best performers.

These are all the IPAdapter models that I've tested, in random order; the best performers moved on to the next round (see Round 2).

  • PlusFace
  • FullFace
  • FaceID
  • FaceID + FullFace
  • FaceID + PlusFace
  • FaceID Plus
  • FaceID Plus + FaceID
  • FaceID Plus + PlusFace
  • FaceID Plus + FullFace
  • FaceID Plus v2 w=0.6
  • FaceID Plus v2 w=1
  • FaceID Plus v2 w=1.5
  • FaceID Plus v2 w=2
  • FaceID Plus v2 + PlusFace
  • FaceID Plus v2 + FullFace
  • FaceID Plus v2 + FaceID
  • FaceID Plus v2 + FaceIDPlus

These are the checkpoints, in random order; the best performers are marked 🏆.

  • 🏆 Deliberate_v3
  • Reliberate
  • absolutereality_v181
  • dreamshaper_8
  • icbinpICantBelieveIts_seco
  • 🏆 realisticVisionV51_v51VAE
  • realisticVisionV6_B1
  • juggernaut_reborn
  • epicrealism_naturalSin
  • edgeOfRealism_eorV20Fp16BakedVAE
  • 🏆 cyberrealistic_v41BackToBasics
  • 🏆 lifeLikeDiffusionV30

Dreamshaper is excluded from the photo-realistic models, but I will run it again with other "illustration"-style checkpoints.

The preliminary data is available in a Google Sheet: https://docs.google.com/spreadsheets/d/1NhOBZbSPmtBY9p52PRFsSYj76XDDc65QjcRIhb8vfIE/edit?usp=sharing

Round 2: Refining the data

In this phase I took the best performers from the previous round and ran more tests on the following combinations:

  • FaceIDPlusv2 + PlusFace
  • FaceIDPlusv2 + FaceIDPlus
  • FaceIDPlusv2 + FullFace
  • FaceIDPlusv2 + FaceID
  • FaceIDPlusv2 w=2
  • FaceIDPlus + PlusFace
  • FaceIDPlus + FaceID
  • FaceID + FullFace

Basically more embeds, better results.

realisticVisionV51_v51VAE (NOT V6) is overall the best performer, but LifeLikeDiffusion often has the single best result; meaning its average is not as good as Realistic Vision's, but sometimes you get that one result that is really good.

I tested both euclidean and 1-cosine and the results are surprisingly similar.
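This makes sense if the descriptors are (close to) unit-normalized: then ‖a−b‖² = 2(1−cos(a,b)), so the two metrics rank results identically. A quick check, illustrative only:

import numpy as np

rng = np.random.default_rng(0)
a, b = rng.normal(size=128), rng.normal(size=128)
a, b = a / np.linalg.norm(a), b / np.linalg.norm(b)  # unit-normalize

euc = np.linalg.norm(a - b)
one_minus_cos = 1.0 - np.dot(a, b)

# For unit vectors euc**2 == 2 * (1 - cos), up to float error,
# so rankings under the two metrics agree.
print(euc ** 2, 2 * one_minus_cos)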

Since it seems that more embeddings give better results, I'll also try sending multiple images of the same person to each model. I don't think it will help, but I'm happy to be proven wrong.

The data for round 2 can be found here: https://docs.google.com/spreadsheets/d/1Mi2Pu9T3Hqz3Liq9Fdgs953fOD1f0mieBWUI6AN-kok/edit?usp=sharing

Preliminary SDXL

Combinations tested:

  • SDXL FaceID PlusFace
  • SDXL FaceIDPlusV2 PlusFace
  • 🏆 SDXL FaceIDPlusV2 FaceID

At the moment the best models seem to be:

  • 🏆 Juggernaut XL
  • 🏆 Realism Engine
  • base SDXL
  • ColossusProject
  • Realistic Stock Photo
  • Protovision XL
  • 🏆 TurboVision XL

Predictably, V2+PlusFace is again the best performer. The best average is still .36.

Interestingly TurboVision XL performs very well.

Data: https://docs.google.com/spreadsheets/d/1hjiGB-QnKRYXTS6zTAuacRUfYUodUAdL6vZWTG4HZyc/edit?usp=sharing

Round 3: Testing multiple reference images

Processing...

Round 4: Higher resolution

Upscaling SD1.5 512×512 images is not advisable if you want to keep the likeness as high as possible. Even using low denoise and a high IPAdapter weight, the base checkpoints are simply not good enough to keep the resemblance.

In my tests I lose about .05 likeness after every upscale.

Fortunately you can still upscale SD1.5 models with SDXL FaceID + PlusFace (I used Juggernaut, the best performer in the SDXL round). The results are very good. LifeLikeDiffusion and RealisticVision5 are still the best performers.

The average is still around 0.35 (not as low as I'd like) but sometimes you get very good results (0.27), so it's worth running a few seeds and trying different reference images.

Result data here: https://docs.google.com/spreadsheets/d/1uVWJOcDxaEjRks-Lz0DE9A3DCCFX2qsvdpKi3bCSE2c/edit?usp=sharing

Methodology

I tried many libraries for feature extraction/face detection. In the aggregated results the differences are relatively small, so at the moment I'm using Dlib and Euclidean distance. I'm trying to keep the generated images as close as possible to the original in color/position/contrast, to minimize skew.

I tried 1-cosine and the results don't differ much from what is presented here, so I take it the data is fairly robust. I will keep testing and will update if there are any noticeable differences.

All primary embedding weights are set at .8, all secondary weights are set at .4.
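The per-combination summary is then a simple aggregation; a sketch, assuming the raw scores were dumped to a CSV with hypothetical columns checkpoint, combo and distance:

import pandas as pd

df = pd.read_csv("results.csv")  # hypothetical dump of the raw scores
summary = (
    df.groupby(["checkpoint", "combo"])["distance"]
      .agg(["mean", "min", "count"])  # "min" = the single best result
      .sort_values("mean")            # lower mean distance = better
)
print(summary.head(10))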

Which face descriptor did you use?

cubiq commented

I tried a few... we could run an average maybe? dlib, MTCNN and RetinaFace are decent and pretty fast. InsightFace seems to be biased, since you trained with it.

Is the metric "1-cos similarity"?
In fact, I used another insightface model (not the one used for training) to evaluate.

cubiq commented

I tried both euclidean and 1-cos. The numbers are of course different but the result is more or less the same.

This is euc vs 1-cos. The final result doesn't change much.
image

Do you get vastly different results?

FaceNet?

cubiq commented

yes, FaceNet. Again, I've tried a few options but the results seem more or less the same. FaceID Plus v2 at weight=2 is always at the top.

Interestingly, FaceIDPlus with a second pass of PlusFace or FullFace is also very effective. That makes me think there are more combinations we haven't explored.

You seem very interested, I'm glad about that. Please feel free to share your experience/ideas if you want.

Yes, I am very interested, because a good metric is important for developing a good model.

You are right. You can also try FaceID + FaceID Plus.

thresholds = {
"VGG-Face": {"cosine": 0.40, "euclidean": 0.60, "euclidean_l2": 0.86},
"Facenet": {"cosine": 0.40, "euclidean": 10, "euclidean_l2": 0.80},
"Facenet512": {"cosine": 0.30, "euclidean": 23.56, "euclidean_l2": 1.04},
"ArcFace": {"cosine": 0.68, "euclidean": 4.15, "euclidean_l2": 1.13},
"Dlib": {"cosine": 0.07, "euclidean": 0.6, "euclidean_l2": 0.4},
"SFace": {"cosine": 0.593, "euclidean": 10.734, "euclidean_l2": 1.055},
"OpenFace": {"cosine": 0.10, "euclidean": 0.55, "euclidean_l2": 0.55},
"DeepFace": {"cosine": 0.23, "euclidean": 64, "euclidean_l2": 0.64},
"DeepID": {"cosine": 0.015, "euclidean": 45, "euclidean_l2": 0.17},
}
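For context, these thresholds are what deepface's verify call compares the measured distance against. A minimal usage sketch with placeholder paths (result keys may vary slightly across deepface versions):

from deepface import DeepFace

result = DeepFace.verify(
    img1_path="reference.jpg",   # placeholder paths
    img2_path="generated.jpg",
    model_name="Facenet512",
    distance_metric="cosine",
)
# "verified" is True when the distance falls below the model's threshold
print(result["distance"], result["threshold"], result["verified"])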

cubiq commented

Is that the minimum threshold? You set it very high; almost only FaceID alone scores that low, at least in my testing.

By the way, do you have any ideas or suggestions for improving the results? They might be helpful to me.

Is that the minimum threshold? You set it very high; almost only FaceID alone scores that low, at least in my testing.

yes, from deepface repo

In fact, I found the face ID embedding is very powerful; I think I should find better training tricks.

cubiq commented

I have tried FaceID Plus v2 + FaceID and it generally outperforms everything else.

Also tried FaceID Plus v2 at weight=2.5, some checkpoints react well to it but in general it's not a big difference.

what do you think of this https://twitter.com/multimodalart/status/1742575121057841468 (multi image)

SDXL FaceID preview
sdxl_faceid

in my benchmark, the cos similarity is a little better than SD 1.5 FaceID

cubiq commented

what do you think of this https://twitter.com/multimodalart/status/1742575121057841468 (multi image)

I've seen people send multiple images trying to increase the likeness. I'm not convinced it actually works; there's a lot of bias in "face" recognition. I will run some tests; honestly I think it's laziness. I was able to reach 0.27 likeness with a good combination of IPAdapter models at low resolution.

I think combining 2 IPAdapter models is more effective than sending multiple images to the same model. But I'll run some tests.

PS: looking forward to the SDXL model!

cubiq commented

@xiaohu2015 do you already have the code for SDXL? So I can update it and we are ready at launch 😄

It is the same as SD 1.5 FaceID: face embedding + LoRA.

But I am not sure the SDXL version is really better than the SD 1.5 version, because evaluation metrics are often unreliable.

cubiq commented

okay, I ran more tests: any combination of Plusv2 with any other model is definitely a winner.

These are all good:

  • FaceIDPlusv2 + PlusFace
  • FaceIDPlusv2 + FaceIDPlus
  • FaceIDPlusv2 + FullFace
  • FaceIDPlusv2 + FaceID

The only other non-v2 combination that seems to work well is FaceIDPlus+FaceID.

I'll update the first post when I have more data

PS: I got a 0.26 today at low resolution! Looking forward to doing some high resolution tests 😄

I will update the SDXL model now; you can also test it.

cubiq commented

great thanks!

I just updated the first post with new info. Data for round 2 is here: https://docs.google.com/spreadsheets/d/1Mi2Pu9T3Hqz3Liq9Fdgs953fOD1f0mieBWUI6AN-kok/edit?usp=sharing

I'll check SDXL later 😄 and run dedicated tests on it too.

cubiq commented

I just had a look at the key structure of the SDXL lora and it's a darn mess 😄 do you have a conversion mapping @xiaohu2015 ?

#145 (comment)

I think we can refer to this. You can find a normal SDXL lora weight and load it, print its keys; then you can get diff2ckpt for SDXL.

In a future version, the lora should not be needed.
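A quick way to do that side-by-side inspection; file names are placeholders, and the "ip_adapter" sub-dict is an assumption about how the FaceID .bin files are organized:

import torch
from safetensors.torch import load_file

faceid = torch.load("ip-adapter-faceid_sdxl.bin", map_location="cpu")
ip_keys = faceid.get("ip_adapter", faceid)               # lora + ip weights
xl_lora = load_file("any_normal_sdxl_lora.safetensors")  # kohya-style keys

for k in sorted(ip_keys)[:8]:
    print("faceid:", k)
for k in sorted(xl_lora)[:8]:
    print("sdxl  :", k)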

cubiq commented

the structure is pretty different and I couldn't find a relationship at first sight. But I'll check better later. I'm a bit busy this week, I might be able to work on it next Monday.

0.to_q_lora.down.weight
0.to_q_lora.up.weight
0.to_k_lora.down.weight
0.to_k_lora.up.weight
0.to_v_lora.down.weight
0.to_v_lora.up.weight
0.to_out_lora.down.weight
0.to_out_lora.up.weight
1.to_q_lora.down.weight
1.to_q_lora.up.weight
1.to_k_lora.down.weight
1.to_k_lora.up.weight
1.to_v_lora.down.weight
1.to_v_lora.up.weight
1.to_out_lora.down.weight
1.to_out_lora.up.weight
1.to_k_ip.weight
1.to_v_ip.weight
2.to_q_lora.down.weight
2.to_q_lora.up.weight
2.to_k_lora.down.weight
2.to_k_lora.up.weight
2.to_v_lora.down.weight
2.to_v_lora.up.weight
...
139.to_v_ip.weight

On SDXL

lora_unet_input_blocks_1_0_emb_layers_1.alpha
lora_unet_input_blocks_1_0_emb_layers_1.lora_down.weight
lora_unet_input_blocks_1_0_emb_layers_1.lora_up.weight
lora_unet_input_blocks_1_0_in_layers_2.alpha
lora_unet_input_blocks_1_0_in_layers_2.lora_down.weight
lora_unet_input_blocks_1_0_in_layers_2.lora_up.weight
lora_unet_input_blocks_1_0_out_layers_3.alpha
lora_unet_input_blocks_1_0_out_layers_3.lora_down.weight
lora_unet_input_blocks_1_0_out_layers_3.lora_up.weight
lora_unet_input_blocks_2_0_emb_layers_1.alpha
lora_unet_input_blocks_2_0_emb_layers_1.lora_down.weight
lora_unet_input_blocks_2_0_emb_layers_1.lora_up.weight
lora_unet_input_blocks_2_0_in_layers_2.alpha
lora_unet_input_blocks_2_0_in_layers_2.lora_down.weight
lora_unet_input_blocks_2_0_in_layers_2.lora_up.weight
lora_unet_input_blocks_2_0_out_layers_3.alpha
lora_unet_input_blocks_2_0_out_layers_3.lora_down.weight
lora_unet_input_blocks_2_0_out_layers_3.lora_up.weight
lora_unet_input_blocks_3_0_op.alpha
lora_unet_input_blocks_3_0_op.lora_down.weight
lora_unet_input_blocks_3_0_op.lora_up.weight
lora_unet_input_blocks_4_0_emb_layers_1.alpha
lora_unet_input_blocks_4_0_emb_layers_1.lora_down.weight
lora_unet_input_blocks_4_0_emb_layers_1.lora_up.weight
lora_unet_input_blocks_4_0_in_layers_2.alpha
lora_unet_input_blocks_4_0_in_layers_2.lora_down.weight
lora_unet_input_blocks_4_0_in_layers_2.lora_up.weight
lora_unet_input_blocks_4_0_out_layers_3.alpha
...
lora_unet_output_blocks_8_0_skip_connection.lora_up.weight

So it looks a little more complicated than that 😄

@laksjdjf can you help

ok, I will also upload a lora weight next week

cubiq commented

It seems to be working pretty well together with plusface, but results are a bit random (either very good or very bad). I'll run some stats on that too.

ComfyUI_temp_lffkp_00011_

reference image:
theron

It is really great work!
I heard that a lot of people complain about the similarity for double-chin faces, big faces, wearing glasses, etc. Is there any test for these? Or some solution for these face shapes?

jepjoo commented

Can confirm that it works now. Thanks!

maybe give some cases? 😄

jepjoo commented

Input image:
sauli

Output, lora weight 1, FaceID weight 1:
ComfyUI_temp_pqcll_00016_

Output with lora disabled, FaceID weight 1 (just to demonstrate that LoRA works and has a big impact):
ComfyUI_temp_pqcll_00017_

In general, results do not seem to be at the level of SD1.5 FaceID Plus, having tested maybe 20 different input images now. This example output (the first one, with lora enabled) is better than the average output.

You should compare with SD 1.5 FaceID. In fact, the face consistency should be better than SD 1.5.

Editing because the SDXL LoRA wasn't set up properly in my workflow:

Some Quick Examples of what I've been getting,
SDXL FaceID + SDXL Plus Face seems to work a little better than SD1.5 FaceID + SD1.5 Plus Face. (Both of these running with their respective LoRAs)

Input Image:
image

SD1.5 FaceID + SD1.5 Plus Face:
image

SDXL FaceID + SDXL Plus Face:
image

SDXL FaceID on its own:
image

And then for reference an SD1.5 FaceID Plus V2:
image

Here's another couple with a different model

Input Image:
image

SDXL FaceID:
image

SDXL FaceID + SDXL Plus Face:
image

Here's another couple with a different model

Could you share your example workflow?
And: can you feed an already existing image into the workflow as the target?

My best results are with FaceID SDXL ( with lora ) and Plus Face.

test01j

Here's another couple with a different model

Could you share your example workflow? And: can you feed an already existing image into the workflow as the target?

Very simple workflow here - https://pastebin.com/9n66qNg9

You can use an existing image to do img2img like normal, or you could use inpainting and only inpaint the face, I don't have an inpainting workflow though.

My best results are with FaceID SDXL ( with lora ) and Plus Face.

test01j

Very nice!
What checkpoint/LoRAs do you use?
And what was your example prompt?
I still don't get those kinds of convincing images with Juggernaut. They always look kind of "synthetic". I have a post in the issues tab with my idea/problem. Maybe you have a hint.

Very simple workflow here - https://pastebin.com/9n66qNg9

Thank you! Gonna try it tomorrow when I am back home. The urge to try and find out is very big…

Very nice!
What checkpoint/LoRAs do you use?
And what was your example prompt?
I still don't get those kinds of convincing images with Juggernaut. They always look kind of "synthetic". I have a post in the issues tab with my idea/problem. Maybe you have a hint.

Face ID images are all Juggernaut XL 7, no loras ( except for the Face ID lora ), like on the example workflow.
Juggernaut XL 8 does not work as well. Weights need to be a lot higher.

But Realism Engine SDXL v2 also worked well with Face ID.
Version 3 just came out, so I haven't tried yet.
https://civitai.com/models/152525/realism-engine-sdxl

The negative prompt is the one used on civitai by the Juggernaut XL creator.
https://civitai.com/images/2612019

The positive prompt is from a Midjourney 6 vs 5.2 comparison video ( at 3.35 )
https://www.youtube.com/watch?v=Zl_4V0ks7CE

I think Face ID makes the SDXL results closer to v6, than v5.2.
FaceID also improves skin tone and texture, and gives more complexity / realism to the facial features.

Without Face ID, these are the best SDXL checkpoints for natural portraits:
https://civitai.com/models/189109/photopedia-xl
https://civitai.com/models/139565/realistic-stock-photo

As for Juggernaut XL, my favorite one is still version 5.
It works well with this lora.
https://civitai.com/models/170395/black-and-color

cubiq commented

I've run preliminary benchmarks on SDXL. I've updated the original post.

Best checkpoints: Juggernaut XL, Realism Engine.

SDXL FaceID is better than SD1.5 FaceID. The average is 0.37 vs 0.41 of SD1.5.

@cubiq can you please describe the process of running these tests?

cubiq commented

What do you need to know? The general process is explained in the "Methodology" paragraph above.

@cubiq I will release the first version of sdxl plus v2, maybe you can do some comparison

cubiq commented

I will! looking forward!

models: https://huggingface.co/h94/IP-Adapter-FaceID/resolve/main/ip-adapter-faceid-plusv2_sdxl.bin

it is the same as FaceID Plus v2 for SD 1.5, but for SDXL

cubiq commented

yes, I can already tell that it's a lot better. Top: FaceID; bottom: FaceIDPlusV2, both with PlusFace added on top. I will launch some benchmarks later.

v2-test

I'm having a problem with eyes when using FaceID - the color is not transferred, and from brown eyes I get a blue/grayish iris. Also, I haven't gotten any improvements when combining FaceID v2 SDXL with PlusFace on my personal examples - eyes become very distorted even with weights around 0.5. Can you share the best workflow used for testing (SD1.5 and SDXL FaceIDv2 + PlusFace)?

@dm33tri in fact, we use some heavy augmentation to avoid completely cloning the face. If you want to do that, why not use inpainting?

cubiq commented

the generation needs to be augmented with prompting as well. That is something I don't do in this test because we are experimenting with just the face embeds.

These tests I'm doing are just to determine a baseline and possibly find the best base combinations and checkpoints. They don't tell you the image quality, just how close the facial embeds are. Even so, the data is extremely biased, because we don't really have decent face detection models.

That being said, I think this still has value, as I said, to get some baseline info you can work on.

cubiq commented

okay quick report before posting the full data.

SDXL FaceID is already pretty good on its own; the difference with SDXL FaceID Plus v2 is not staggering in terms of "face likeness", but the visual quality of v2 is visibly better. The faces look more defined overall (with more details).

The best average I've got for SDXL is 0.36 with FaceIDPlusV2 + PlusFace.

Mixing FaceIDPlusV2 with FaceID is not a good idea and should be avoided (best avg 0.39).

One last test I've done is to take a 512x512 SD1.5 image and upscale it to 1024x1024 with SDXL.

Basically the model combination is: SD1.5 FaceID Plus v2 + FullFace, upscaled with SDXL FaceID Plus v2 + PlusFace.

I got no real likeness improvement, with an average again of 0.36.

My conclusion at this point is that we cannot expect an average result better than 0.35 relying exclusively on the models. Of course better results can be achieved with various other techniques (like compositing, inpainting, specific training, very accurate prompting, ...).

Best SDXL model is JuggernautXL.

Love the testing! Is the Lora for the tests set at 0.8 or 0.5?

cubiq commented

the lora is at 0.62

@dm33tri in fact, we use some heavy augmentation to avoid completely clone face. If you want to do that, why not use Inpainting?

What kind of augmentation? In my testing, tinkering with the code, I found that lowering the weights of the lower attention layers preserves likeness while allowing for more variety. I'm on the go, so I forget exactly what the parameters were, but something like the weights from 1-16 found in extra_data here. The function gets called for each layer for each frame, or something like that.

So I'd expose all the weights as number sliders in ComfyUI, lower weights 1-6 close to zero, and keep the remaining ones (6 to 15) at 1, as in the sketch below.
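A sketch of that idea — the layer count and the hook are hypothetical, not the actual ComfyUI_IPAdapter_plus internals:

NUM_IP_LAYERS = 16
# Damp the early (low-level) layers, keep the later ones at full strength.
layer_weights = [0.05] * 6 + [1.0] * (NUM_IP_LAYERS - 6)

def add_ip_attention(hidden_states, ip_out, layer_idx):
    # Scale the IPAdapter attention output per layer before adding it back.
    return hidden_states + layer_weights[layer_idx] * ip_out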

These are old screenshots using old models with the wrong implementation, but I have tested this method on the newer models as well and it works a lot better.

image

The prompt was something like "man laughing" with an input image of George W. Bush. The left image is all weights at 0, so the adapter does nothing and it just shows a man laughing. The middle image is something like what I mentioned above. The right image is all weights at 1. Notice how the middle image is more like "man laughing" with Bush's likeness, as opposed to the right image, which follows the input image more closely.

image

For me, SDXL FaceID Plus v2 is noticeably better in terms of face likeness, at least with some seeds.
And independently from likeness, the faces look better with any IPA, and even better with SDXL Face ID v2.

textv2c

cubiq commented

with SDXL v2 I got quite a few 0.28s during testing, meaning that playing with seeds might help. But on average the pure embeds are not much closer. The image quality and details are higher, but the mathematical difference is more or less the same.

Another thing I've noticed is that there is a sweet spot for getting better likeness if the input face image is cropped and padded properly. There is a face crop node (not sure where it's from right now, as I have many custom nodes) that crops the face and rotates it to be straight. If you then make sure there are 50 pixels (or something like that) around the face, you'll get a better result than if the face fills exactly the whole image or the face is too small.

One last thing (though this is probably known here already): if you detect and crop out the face in the output image, do a second low-denoise pass on the face, and then insert the output into the original image again, you will get a better result.
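A sketch of the crop-and-pad step (detection via face_recognition here; the actual node mentioned above also straightens the face, which this skips):

import face_recognition
from PIL import Image

img = face_recognition.load_image_file("input.jpg")  # placeholder path
top, right, bottom, left = face_recognition.face_locations(img)[0]

margin = 50  # the ~50 px rule of thumb mentioned above
h, w = img.shape[:2]
box = (max(left - margin, 0), max(top - margin, 0),
       min(right + margin, w), min(bottom + margin, h))
Image.fromarray(img).crop(box).save("face_padded.png")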

SDXL FaceID Plus v2 is better at capturing small details inside the face.
But sometimes SDXL FaceID (v1) seems to be better at getting the whole face / head shape.
So when we've got a good seed that's a better match for the whole face, results are much better on v2.

edit:

Here is an example with a seed that does not work as well for FaceID v2.
FaceID v2 also captured the internal face details better, but to my eye the face / head shape seems a bit off.
The face / head in v2 seems to have a more rounded shape, while the reference has more of an oval shape.

textv2f

update:

Since my input image was not square, I was using a [prepare image for clip vision] node, but only for the Face Plus node.
Here it was used for both adapters.
With the same seed, the results seem to be much better, for both v1 and v2.
The v2 head shape was improved, and it is still more detailed than v1.
The face likeness is also closer between both models.
But I think v1 was able to capture different details of the original face.
So I would agree there is not a definitive winner.

textv2h

For me, SDXL FaceID Plus v2 is noticeably better in terms of face likeness, at least with some seeds. And independently from likeness, the faces look better with any IPA, and even better with SDXL Face ID v2.

textv2c

@JorgeR81 Could you perhaps share your workflow? I'd really appreciate it!

I tried to recreate it but I am having some trouble unfortunately :/

This is the default workflow, available here. 
#210 (comment)
Don't forget to set [faceid_v2] to [true] to use the v2 model.
I also added a [Prepare Image for Clip Vision] node between the image loader and the Plus Face adapter, since my input image is not square.

I'm using Juggernaut XL 7.
The positive prompt is probably not very good, since it was created to be confusing for the checkpoint, in order to test the limits of Midjourney and compare versions.
( see my previous posts, for more images and details ).
#195 (comment)
#195 (comment)

Results look good, I think, also because the input image is good.
It was created with CyberRealistic v3.3, with the easynegative embedding.
https://civitai.com/models/15003?modelVersionId=138176

See also the posts above.
Unless you have a "lucky" seed, results are not going to be much better than v1.

@JorgeR81 what're the weights you're using for the lora, face id, face id v2, and the plus face?

The weights for each node are under each image.
For the v2 image I used the same weights.

lora ( 0.6 )
face id ( 0.6 )
face id v2 ( 0.6 )
plus face ( 0.3 )

For Plus Face, I also set [start_at] to 0.4, to preserve the face position created by Face ID.

Some seeds and checkpoints may look better with different weights; these are just a starting point.
They are perhaps a little low, but I'm more interested in getting a good image than in getting perfect face likeness.

Here is an example, at full resolution, with Realism Engine 2, with FaceID v1.
In the first image, weights are the same, but in the bottom one, FaceID is at 0.8.
Here, a weight of 0.8 starts to affect image quality.
Notice how, in the bottom image, the face edge is too sharp in comparison with the face itself.

FaceID ( 0.6 ) + Plus Face

8

FaceID ( 0.8 ) + Plus Face -- resulting in worse image quality, in this case.

8b

@JorgeR81 I'm playing around with some of your values, what do you think of this result?

I'm seeing too many faces and going insane, so I can't tell if the face looks similar to your input image, but to me it looks pretty alright:

image

Currently with only this added:

image

image

@JorgeR81 I'm playing around with some of your values, what do you think of this result?

They look really nice. The second one is closer I think.

The necklace also looks very good.
You were able to get the diamonds in there! Did you change the prompt?
I also placed "RAW" at the start of the prompt. Not sure if that will make much of a difference.

I did not try the 1024 x 1024 resolution, but it seems to work well. There is more detail.
I used 1152 x 768, which is a little below the recommended size for SDXL.

For the second adapter I am using [ ip-adapter-plus_sdxl_vit-h.safetensors ] instead of the SD 1.5 one. That could bring some improvements.

In some seeds, you may get better results if you use the [prepare image for clip vision] node for both adapters.
#195 (comment)

I actually didn't notice that I was using the SD 1.5 adapter, so I changed that to the ip-adapter-plus_sdxl_vit-h one. That also explains why I got the exact same image when I bypassed all the Plus Face nodes. I just changed it and got this image:

image

What do you think looks better?

Also for the prompt, I did not change anything, I copied the prompt from that Midjourney comparison video:

cinematic, photo, woman, cyberpunk, vermillion, anachronism, futuristic fragmentation, translucent, transcendence, transparent, layered composition, cyberpunk futurism, very light hair, braids, tattoos, implants, body covered with diamonds, shining jewellery, desaturated, muted light pink palette, ARRIFLEX 35 BL Camera, Canon K35 Prime Lenses, looking at camera

Edit: apparently all the results that I posted are also without the LoRA. I just connected the LoRA properly and the images are not great :D

What do you think looks better?

The skin looks a bit more natural on the new one.
But some weights may be too high for this configuration, because the hair above the forehead is not as well defined.

I just connected the LoRA properly and the images are not great :D

Are you having worse results with the LoRA?

@JorgeR81

Are you having worse results with the LoRA?

Yeah, for some reason the results are really bad with the LoRA. I'll post some pictures soon.

I'm trying to tweak the weights, but whenever I think the face looks somewhat alright, I try a picture of myself or some other guy and then it's a complete disaster.

I wish there were some magic values that work with everyone or something. Or at least good, consistent results for male and female…

It’s quite hard to get the perfect results unfortunately.

Yeah, for some reason the results are really bad with the LoRA.

The first version of the lora was broken, but it was then fixed and reuploaded.
If you see an error message in your cmd line when you run the prompt, you should download it again.
#210 (comment)
#210 (comment)

It’s quite hard to get the perfect results unfortunately.

A future version of FaceID may not need a lora at all, so it will be easier to use.
#195 (comment)

And I think cubiq is already working on another method that seems very promising.
#224

@JorgeR81 That's true. I'll try to see if updating the LoRA fixes the issue.

Until then, I'll try to play around with different weights that result in good faces for male and female (both with the same values), so the flow becomes somewhat consistent. Since that's what I need, ideally.

But the future is definitely looking promising 🚀

@cubiq We released ip-adapter-faceid-portrait at https://huggingface.co/h94/IP-Adapter-FaceID

cubiq commented

you wanted to compete with PhotoMaker? 😄

In fact, the model was trained a while ago. It is limited to portrait generation and is sensitive to the text prompt, but the advantage is a high degree of freedom (it supports editing the style via text).

maybe you can make a fair comparison with PhotoMaker and IPAdapter-FaceIDPlus?

cubiq commented

I will certainly do @xiaohu2015 thanks again for your great work

cubiq commented

has the structure changed @xiaohu2015 ?

        Missing key(s) in state_dict: "proj.weight", "proj.bias". 
        Unexpected key(s) in state_dict: "proj.0.weight", "proj.0.bias", "proj.2.weight", "proj.2.bias".

I can see what is happening here

cubiq commented

okay now it just doesn't work with strict dict loading

It is the same as the FaceID model, but with no lora; are you using the wrong pipeline?

cubiq commented

it doesn't work with strict state_dict loading; without strict loading the model loads, but the results are all wrong, so I guess I need to check if anything changed in the code

Error(s) in loading state_dict for ImageProjModel:
Missing key(s) in state_dict: "proj.weight", "proj.bias".
Unexpected key(s) in state_dict: "proj.0.weight", "proj.0.bias", "proj.2.weight", "proj.2.bias".

Getting the same error

Another question: I'm trying to get multiple reference images to work, but I'm not quite sure how to do that, since this flow just loads one image each run from the specified folder…

it doesn't work with strict state_dict loading, without strict loading the model is loaded but the results are all wrong, so I guess I need to check if anything changed in the code

It uses this projection net: https://github.com/tencent-ailab/IP-Adapter/blob/main/ip_adapter/ip_adapter_faceid.py#L64
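That also explains the missing/unexpected keys above: the plain ImageProjModel stores a single linear layer (proj.weight), while this projection is an MLP inside nn.Sequential, whose children are saved by index (proj.0.*, proj.2.*). Paraphrased from the linked file, with the defaults used there:

import torch

class MLPProjModel(torch.nn.Module):
    def __init__(self, cross_attention_dim=768, id_embeddings_dim=512, num_tokens=4):
        super().__init__()
        self.num_tokens = num_tokens
        self.cross_attention_dim = cross_attention_dim
        self.proj = torch.nn.Sequential(
            torch.nn.Linear(id_embeddings_dim, id_embeddings_dim * 2),  # -> proj.0.*
            torch.nn.GELU(),                                            # proj.1 has no params
            torch.nn.Linear(id_embeddings_dim * 2,
                            cross_attention_dim * num_tokens),          # -> proj.2.*
        )
        self.norm = torch.nn.LayerNorm(cross_attention_dim)

    def forward(self, id_embeds):
        x = self.proj(id_embeds)
        x = x.reshape(-1, self.num_tokens, self.cross_attention_dim)
        return self.norm(x)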

Another question: I'm trying to get multiple reference images to work but not quite sure how to do that.

use the mean ID embedding, or concat multiple embeddings
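In numpy terms the two options look like this (shapes are illustrative; one 512-d InsightFace ID embedding per reference image):

import numpy as np

embeds = [np.random.randn(512) for _ in range(5)]  # one per reference image

mean_embed = np.mean(embeds, axis=0, keepdims=True)  # (1, 512): one averaged identity
concat_embeds = np.stack(embeds)                     # (5, 512): all identities as extra tokens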

cubiq commented

my god, the new FaceID Portrait is amazing

ComfyUI_01338_

cubiq commented

Portrait models are supported. Resemblance is not great, but they are super easy to style.

using multiple face embeddings is helpful to enhance ID similarity

cubiq commented

yes, those images are created with 5 reference images each

now we need a comparison with photomaker 😄

yes, those images are created with 5 reference images each

How can I use multiple reference images with FaceID? Do I apply it multiple times, or is there another way?

Edit: I found the example

I tried the default workflow for the portrait version.
As inputs, I used 5 images generated with the same prompt and different seeds, to see what it does with that.

wf

Image quality and face likeness are good for me with 5 images.
The likeness of the new image is not much worse than the likeness between the 5 inputs, despite using a completely different prompt.

v3_5

Having a single image still gives decent likeness, but worse quality.

v3

Changing the prompt to "antique bronze statue" makes it cooler!

v3p_5

Setting [start_at] = 0.60 and [weight] = 0.95 makes it look like an actual "bronze statue" (let's say the sculptor encrusted glass eyes).

p9560

This upcoming model claims to generate better facial likeness with a single face image...
https://github.com/InstantID/InstantID

cubiq commented

I tested TurboVisionXL and it performs very well given the right configuration.

The new data: https://docs.google.com/spreadsheets/d/1hjiGB-QnKRYXTS6zTAuacRUfYUodUAdL6vZWTG4HZyc/edit?usp=sharing

@cubiq What are your expectations for the FaceID Plus v3 version?

cubiq commented

@cubiq What are your expectations for the FaceID Plus v3 version?

Personally I would like to get rid of insightface altogether and find something better.

That being said, I would love to see a consistent average under .30 (euclidean). I think I'm also hitting the limit of my methodology for checking the embeds, so I might need to try different models.

Also, people really like a good result on the first try; they don't want extra steps (like inpainting, a second pass to fix things, or setting an attention mask). So FaceID Portrait seems to be a good direction, if you can keep the likeness high. Of course you need FaceID Portrait SDXL 😄

Hi, @cubiq. If you plan to test more turbo models, try also this one:

https://civitai.com/models/224983/bestmixsdxlphotocinematurbov1

It's based on TurboVisionXL, and it works well with DPM++ 2M SDE / DPM++ 2M.

Most of the other turbo models need DPM++ SDE, which is about twice as slow for me, for each step.

Personally I would like to get rid of insightface altogether and find something better.

Thank you. I also wrote something about face models:
tencent-ailab/IP-Adapter#266. I think the current best way is FullFace + faceswap if you want a clone. FaceID (and other methods like PhotoMaker and InstantID) gives likeness, but can't reach 100%.

cubiq commented

thank you. i also write some thing about face model. tencent-ailab/IP-Adapter#266.

that's very interesting! Please keep documenting the models!

what should we expect for v3 ?

Currently there is no v3 version.

FaceID-Portrait is a nice way to mix two faces together!

image

FaceID-Portrait is a nice way to mix two faces together!

It would be nice to have a version of the FaceID node that allows setting a weight per image, like with the regular IPAdapter.

@JorgeR81 @cubiq for multiple images, you can also use a weighted combination of the ID embeddings