GeniSys AI is an open source Artificial Intelligence Assistant Network that uses Computer Vision, Natural Linguistics and the Internet of Things. GeniSys uses a system based on TASS A.I for vision, an NLU engine for natural language understanding, and in-browser speech synthesis and speech recognition for speech and hearing, all homed on a dedicated Linux server in your home and managed via a secure UI.
The GeniSys TASS Devices & Applications use Siamese Neural Networks and Triplet Loss to classify known and unknown faces; essentially, they calculate the distance between a presented image and a folder of images of known faces. The local engine is homed on your GeniSys server and connects to your server webcam. This is one of the reasons the UI is designed to be used on the local network: if you connect from anywhere other than in front of the local camera, GeniSys will not be able to see you.
As mentioned above, GeniSys TASS Devices & Applications use Siamese Neural Networks & Triplet Loss, but what exactly are they?
Siamese Neural Networks are made up of two identical neural networks, hence the name. Given two images, the identical networks are used to calculate the similarity between them. In essence, Siamese Neural Networks learn how to differentiate between objects, or in this case, faces.
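The following is a minimal Python sketch of the Siamese idea, not the actual TASS/Facenet code; the `network` callable and the embedding shapes are assumptions for illustration. One shared embedding network is applied to both images, and the distance between the two embeddings measures how similar the faces are.

```python
import numpy as np

def siamese_distance(image_a, image_b, network):
    """Apply the same shared network to two inputs; the L2 distance between
    the resulting embeddings is the (dis)similarity score."""
    embedding_a = network(image_a)  # fixed-length embedding vector
    embedding_b = network(image_b)
    return np.linalg.norm(embedding_a - embedding_b)
```

A small distance means the two faces are likely the same person; a large distance means they are likely different people.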
Triplet Loss reduces the distance between an anchor (an image) and a positive sample from the same class, and increases the distance between the anchor and a negative sample from a different class. This means that two images of the same class (in this case, the same person) will have a smaller distance than two images from different classes (two different people).
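Below is a hedged sketch of the standard triplet loss used with Facenet-style embeddings; the margin value `alpha` is an illustrative assumption, not the project's setting.

```python
import numpy as np

def triplet_loss(anchor, positive, negative, alpha=0.2):
    """anchor, positive and negative are embedding vectors (numpy arrays).
    The loss pulls the anchor towards the positive sample and pushes it
    away from the negative sample by at least the margin alpha."""
    pos_dist = np.sum(np.square(anchor - positive))
    neg_dist = np.sum(np.square(anchor - negative))
    return max(pos_dist - neg_dist + alpha, 0.0)
```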
MORE INFO: Lossless Triplet loss, A more efficient loss function for Siamese NN
Taken from Walter J. Scheirer's website:
"Both recognition and classification are common terms in computer vision. What is the difference? In classification, one assumes there is a given set of classes between which we must discriminate. For recognition, we assume there are some classes we can recognize in a much larger space of things we do not recognize. A motivating question for our work here is: What is the general object recognition problem? This question, of course, is a central theme in vision. How one should approach multi-class recognition is still an open issue. Should it be performed as a series of binary classifications, or by detection, where a search is performed for each of the possible classes? What happens when some classes are ill-sampled, not sampled at all or undefined?
The general term recognition suggests that the representation can handle different patterns often defined by discriminating features. It also suggests that the patterns to be recognized will be in general settings, visually mixed with many classes. For some problems, however, we do not need, and often cannot have, knowledge of the entire set of possible classes. For instance, in a recognition application for biologists, a single species of fish might be of interest. However, the classifier must consider the set of all other possible objects in relevant settings as potential negatives. Similarly, verification problems for security-oriented face matching constrain the target of interest to a single claimed identity, while considering the set of all other possible people as potential impostors. In addressing general object recognition, there is a finite set of known objects in myriad unknown objects, combinations and configurations - labeling something new, novel or unknown should always be a valid outcome. This leads to what is sometimes called "open set" recognition, in comparison to systems that make closed world assumptions or use "closed set" evaluation."
Previous versions of TASS built using TensorFlow, such as the TASS Movidius Inception V3 Classifier, suffered from the open set recognition issue. The TASS Facenet Classifier uses a directory of known images: when presented with a new image, it loops through each known image, measuring the distance between the known image and the presented one. So far this appears to overcome the issue in small testing environments of one or more people. This method will not scale to large scenarios, but it is fine for small home projects.
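A rough sketch of that directory-matching loop is shown below; the pre-computed `.npy` embedding files and the threshold value are illustrative assumptions rather than the project's actual layout or settings.

```python
import os
import numpy as np

def identify(presented_embedding, known_dir, threshold=1.0):
    """Compare a presented face embedding against every known embedding
    stored in known_dir (one .npy file per known person) and return the
    closest match, or "unknown" if nothing is within the threshold."""
    best_name, best_dist = "unknown", float("inf")
    for filename in os.listdir(known_dir):
        known_embedding = np.load(os.path.join(known_dir, filename))
        dist = np.linalg.norm(presented_embedding - known_embedding)
        if dist < best_dist:
            best_name, best_dist = os.path.splitext(filename)[0], dist
    return (best_name, best_dist) if best_dist < threshold else ("unknown", best_dist)
```

Because every known face is checked for every presented image, the cost grows linearly with the number of known people, which is why this approach suits small installations.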
Combining the TASS Movidius Inception V3 Classifier (prone to open set recognition issues) with the TASS Facenet Classifier lets us catch false positives and verify positive classifications: the name/ID predicted by Inception is used to index directly into the known images, so a single Facenet distance calculation determines whether Inception classified the person correctly. This makes the project more scalable. The latest Inception version of the classifier will be uploaded to this repository soon.
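A minimal sketch of that two-stage check, assuming the same per-person `.npy` embedding layout as above (the function name and threshold are hypothetical):

```python
import os
import numpy as np

def verify(inception_id, presented_embedding, known_dir, threshold=1.0):
    """Only the embedding for the identity Inception predicted is checked,
    so verification is a single Facenet distance calculation."""
    known_embedding = np.load(os.path.join(known_dir, inception_id + ".npy"))
    return np.linalg.norm(presented_embedding - known_embedding) < threshold
```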
There are currently three versions of the GeniSys TASS devices: Local, Foscam and RealSense (the Foscam and RealSense devices are still in development). Each device connects to a video stream of one kind or another, processes the frames and streams the modified frames.
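As an illustration of that flow for the Local device, the capture loop below reads frames from the server webcam with OpenCV, hands each frame to a processing hook and displays the result; the device index, window handling and `process_frame` hook are assumptions for the sketch, not the project's actual streaming code.

```python
import cv2

def run(process_frame, device=0):
    """Read frames from the local camera, process them and show the result."""
    capture = cv2.VideoCapture(device)
    try:
        while True:
            ok, frame = capture.read()
            if not ok:
                break
            annotated = process_frame(frame)  # e.g. draw boxes/names on detected faces
            cv2.imshow("GeniSys TASS", annotated)
            if cv2.waitKey(1) & 0xFF == ord("q"):
                break
    finally:
        capture.release()
        cv2.destroyAllWindows()
```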
- Uses code from Intel® movidius/ncsdk (movidius/ncsdk Github)
- Uses code from Intel® davidsandberg/facenet (davidsandberg/facenet Github)
Links to related articles that helped at various stages of the project for research / code examples:
Please read CONTRIBUTING.md for details on our code of conduct, and the process for submitting pull requests to me.
I use SemVer for versioning. For the versions available, see GeniSysAI/Vision/releases.
This project is licensed under the MIT License - see the LICENSE file for details.
I use the repo issues to track bugs and general requests related to using this project.