fpgaminer/joytag

[Discussion] Comparison with Danbooru interrogator in SD Automatic1111

Closed this issue · 3 comments

Hello, thank you for sharing this model.

I did a quick and naive check between this and the Danbooru Interrogator in Automatic1111's webui and compared with the actual tags. The test took the 100 most recent posts from Danbooru and succeeded on all 88 images that could be downloaded (12 posts didn't have a proper URL to download).

These are my current observations:

  • JoyTag's model matches the actual tags much more often (more true positives).
  • JoyTag's model predicts far fewer incorrect tags (false positives) than the SD Interrogator.
  • Both the JoyTag and Interrogator models miss tags (false negatives), with JoyTag missing fewer.

I'm looking forward to seeing whether this can be integrated into the webui or retrained for even more tags!

Note:

  • I haven't studied the Interrogator in depth before.
  • True Negative is 0 because there's nothing sensible to check against.
  • This uses the actual tags as the ground truth, although there could always be mistakes/missing tags/uneven tag distribution.
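For reference, the per-image counts behind these observations can be sketched as plain set comparisons. This is a minimal sketch and not the script used for the charts: `confusion_counts`, `precision_recall`, and the example tags are hypothetical, and the actual Danbooru tags are treated as ground truth, as noted above.

```python
def confusion_counts(predicted, actual):
    """Count true positives, false positives, and false negatives for one image."""
    predicted, actual = set(predicted), set(actual)
    tp = len(predicted & actual)   # predicted and present in the actual tags
    fp = len(predicted - actual)   # predicted but absent from the actual tags
    fn = len(actual - predicted)   # actual tags the model missed
    return tp, fp, fn


def precision_recall(tp, fp, fn):
    """Derive precision and recall, guarding against empty denominators."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical tags for a single image.
actual = {"1girl", "solo", "long_hair", "smile"}
predicted = {"1girl", "solo", "short_hair"}

tp, fp, fn = confusion_counts(predicted, actual)
print(tp, fp, fn)  # 2 1 2
```

True negatives are left out here for the same reason as in the note above: they would be every tag in the vocabulary that was neither predicted nor present, which says little about either model.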

Settings: Threshold 0.5

[Chart: 100 images, threshold 0.5, v2]


I just noticed that the threshold in the docs is 0.4, so I ran the same code again with that threshold. The images pulled may not be the same as in the first run. Success rate: 88/88 (12 failed to download).

Observation:

  • At this threshold it gets many more tags right (true positives), but it also hallucinates more (false positives).
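This trade-off can be illustrated with a toy example. The scores below are made up, and `tags_above` is a hypothetical helper rather than part of JoyTag (whether the real code uses a strict or inclusive comparison is an assumption here). Lowering the threshold can only add predicted tags, so true positives and false positives can both rise but never fall.

```python
def tags_above(scores, threshold):
    """Return the tags whose score is strictly above the threshold."""
    return {tag for tag, score in scores.items() if score > threshold}


# Made-up scores for one image, plus its hypothetical ground-truth tags.
scores = {"1girl": 0.97, "smile": 0.46, "hat": 0.43, "outdoors": 0.12}
actual = {"1girl", "smile"}

for threshold in (0.5, 0.4):
    predicted = tags_above(scores, threshold)
    tp = len(predicted & actual)   # correct tags kept at this threshold
    fp = len(predicted - actual)   # incorrect tags kept at this threshold
    print(threshold, tp, fp)
# 0.5 1 0
# 0.4 2 1
```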

[Chart: 100 images, threshold 0.4, v2]

Edit: Updated charts to reflect fixed Interrogator code. Cleaning tags was necessary.
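The edit doesn't say what the cleaning involved. One common source of mismatch, assumed here purely for illustration, is formatting: Danbooru tags use underscores, while a tagger's output may use spaces, different casing, escaped parentheses, or stray whitespace. A hypothetical `clean_tag` normalizer along those lines might look like this:

```python
def clean_tag(tag):
    """Normalize a tag so both sources compare equal (hypothetical rules)."""
    tag = tag.strip().lower()
    tag = tag.replace("\\(", "(").replace("\\)", ")")  # undo escaped parentheses
    return "_".join(tag.split())  # collapse whitespace runs into underscores


# Hypothetical raw tags as they might come out of two different tools.
raw = ["Long Hair", " long_hair ", "shirt \\(white\\)"]
print(sorted({clean_tag(t) for t in raw}))
# ['long_hair', 'shirt_(white)']
```

Without a step like this, identical tags written differently would be counted as one false positive plus one false negative instead of a true positive.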

O-J1 commented

Just on the topic of accuracy and such: we've run into Danbooru tagging accuracy issues. Whilst it might be good, it's not great. Also, given the ratio of anime to realistic images that this model is using, more realistic tagging in DB style will likely be needed. Either way, this is a positive step. 👍

> I did a quick and naive check between this and the Danbooru Interrogator in Automatic1111's webui and compared with the actual tags

Wow, thank you! Independent validation is incredibly helpful.

Is Danbooru Interrogator still using the old DeepDanbooru model? There are much, much better ones now like SmilingWolf's work, which probably performs better on anime images than this model currently anyway. I'd be happy for this model to pass that watermark some day, but the main focus of the JoyTag model has been on expanding into real life images for now.

> Is Danbooru Interrogator still using the old DeepDanbooru model?

I think so!

> There are much, much better ones now like SmilingWolf's work, which probably performs better on anime images than this model currently anyway.

I noticed that repo hasn't been updated in a while, so I didn't realize it was better. Although, the better one seems to be an ensemble?

> I'd be happy for this model to pass that watermark some day, but the main focus of the JoyTag model has been on expanding into real life images for now.

I understand that completely. This was just a curious experiment to see how the model performed :)