feature: Community knowledge base

Question

Closed this issue a year ago · 5 comments

Suggestion: why should each client process every asset? We could gather detection results into a shared knowledge base for quick verification, so that when a client sees a new image, its device does the work and sends the result.
It could also be a DNS server, or a distributed ledger (blockchain) used as a source of truth to avoid centralization.
With this strategy, we could distribute the load and afford to classify content with high-accuracy models.
What do you think?

Answer 1 · 2023-11-08T08:10:03.000Z

Salam!

I'm not the creator of this extension, but I am one of its users who is following its development from a detection/AI standpoint, so my understanding does not represent the views of the developer, @alganzory.

The number of images on the internet is on the order of trillions. There is no way that such a large amount of image content can be stored to cache the detection results. New images appear every day, people have different device screen sizes, there are thousands of frames in a video, etc. Many factors contribute to the huge number of images. This is not generally how machine-learning-based image recognition works.

Furthermore, running a server that stores the detection results from each client so that they can be reused later by other clients (or the same client) would require additional resources that would need to be paid for, which is outside the scope of an extension that runs in each client's browser.
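
To make the caching idea concrete, here is a minimal sketch of what a shared result cache keyed by an exact content hash might look like. It is not code from the extension: `shared_cache`, `lookup_or_classify`, and `fake_classifier` are hypothetical names, and the byte strings are stand-ins for real image files. It only illustrates one of the problems above: the same picture served at a different size or encoding has different bytes, so an exact-hash lookup misses and the client has to run detection anyway.

```python
import hashlib
from typing import Callable, Dict, Tuple

# Hypothetical shared cache: content hash -> previously computed label.
shared_cache: Dict[str, str] = {}


def content_key(image_bytes: bytes) -> str:
    """Exact-match key: SHA-256 of the raw image bytes."""
    return hashlib.sha256(image_bytes).hexdigest()


def lookup_or_classify(
    image_bytes: bytes, classify: Callable[[bytes], str]
) -> Tuple[str, bool]:
    """Return (label, cache_hit); `classify` stands in for the local model."""
    key = content_key(image_bytes)
    if key in shared_cache:
        return shared_cache[key], True
    label = classify(image_bytes)
    shared_cache[key] = label
    return label, False


def fake_classifier(image_bytes: bytes) -> str:
    """Placeholder for a real detection model (assumption for this sketch)."""
    return "safe"


# Stand-ins for the same picture delivered in two different encodings.
original = b"same-picture, 1920x1080 encoding"
resized = b"same-picture, 640x360 encoding"

first = lookup_or_classify(original, fake_classifier)   # miss, runs the model
second = lookup_or_classify(original, fake_classifier)  # hit, identical bytes
third = lookup_or_classify(resized, fake_classifier)    # miss, different bytes
```

A perceptual hash could reduce such misses, but it would bring its own false matches, and the hosting cost mentioned above would remain.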

Answer 2 · 2023-11-08T12:53:25.000Z

@man2machine said it better than I would have.
That being said, this made me think of enabling some sort of reporting for individual images that were not detected successfully. This bank of unsuccessful detections could then be used to train or retrain the models to be more accurate. For example, images of sheikhs or people wearing head covers usually get misclassified as women. If at any stage we train or retrain our own models, I would surely want these images to go into the training, as I doubt that any of the famous models out there use them for gender classification or NSFW detection training.

Answer 3 · 2023-11-08T16:19:48.000Z

Yes, as @alganzory said, crowdsourcing detection results is definitely possible from a technical standpoint. However, it may be hard to get this to work, since different people have different standards and there is potential for many labeling errors. If such issues were overcome, then it would be workable. Right now, I think the priority is to find and potentially create better detection models, and to improve the user interface/experience.
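
One common way to cope with disagreeing labelers is to accept a crowdsourced label only when enough people voted and a clear majority agree. The sketch below is only an illustration of that idea, not something the extension does: `aggregate_labels`, `min_votes`, and `min_agreement` are hypothetical names and thresholds chosen for the example.

```python
from collections import Counter
from typing import List, Optional


def aggregate_labels(
    votes: List[str], min_votes: int = 3, min_agreement: float = 0.8
) -> Optional[str]:
    """Return the majority label, or None if the votes are too few or too split."""
    if len(votes) < min_votes:
        return None  # not enough reports to trust any label
    label, count = Counter(votes).most_common(1)[0]
    if count / len(votes) >= min_agreement:
        return label
    return None  # labelers disagree too much; leave the image unlabeled


unanimous = aggregate_labels(["nsfw"] * 5)
split = aggregate_labels(["nsfw", "safe", "nsfw", "safe"])
too_few = aggregate_labels(["safe", "safe"])
```

Images that come back as `None` would need more votes or a manual review before being used for training.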

Answer 4 · 2023-11-09T10:25:19.000Z

Well, I just wanted to share my thoughts. You guys know better, of course.

Answer 5 · 2023-11-09T10:35:05.000Z

@marwenbk thanks for your suggestion, please keep sharing your thoughts and support <3