VisionQL is a project to explore the use of declarative queries on top of ML-based computer vision. Think SQL for computer vision.
It is a Node.js application written in TypeScript. There is currently no web server; the Node.js code is a thin wrapper around the Google Vision API calls.
TypeScript makes it easier and safer to write queries against the response.
The goal is for VisionQL to have several backends; the next backend is going to be TensorFlow.js. Each backend will return its result as JSON, and VisionQL should have TypeScript definitions for all of the backends. This will make it easier to query an ensemble of computer vision models, as sketched below.
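Here is a minimal sketch of what such a backend abstraction could look like. The names (VisionBackend, LabelResult, ensembleLabels) are hypothetical and not part of the current code:

// Hypothetical common interface that each backend (Google Vision, TensorFlow.js) could implement.
export interface LabelResult {
  description: string; // e.g. "Cat"
  score: number;       // confidence between 0 and 1
}

export interface VisionBackend {
  name: string;
  labelDetection(imagePath: string): Promise<LabelResult[]>;
}

// Ensemble query: run the same image through every backend and collect the JSON results.
export async function ensembleLabels(backends: VisionBackend[], imagePath: string): Promise<LabelResult[][]> {
  return Promise.all(backends.map((backend) => backend.labelDetection(imagePath)));
}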
Currently VisionQL can make two Google Vision API calls. For both you need a file with service account credentials.
git clone git@github.com:sami-badawi/visionql.git
cd visionql
export GOOGLE_APPLICATION_CREDENTIALS=/home/yourname/yourpath/key.json
npm i
npm run build
node dist/call_face_detect.js --gs_path gs://sami-vision-project/AI-panel-2018-02-15.jpg --query yes
node dist/call_image_label.js --file_path resources/wakeupcat.jpg --query yes
The result of running call_face_detect will be stored in the file:
output/AI-panel-2018-02-15_jpg/face_detect_result.json
If the face detect program was run with --query yes,
it will count the number of faces and the number of happy faces:
for image: gs://sami-vision-project/AI-panel-2018-02-15.jpg: faceCount: 4; happyFaceCount: 1
The project also has an example output file:
output/example_face_detect_result.json
The result of running call_image_label will be stored in the file:
output/wakeupcat_jpg/label_detect_result.json
If the label program was run with --query yes,
it will report whether the picture contains cats, dogs, or internet memes:
for image: ./resources/wakeupcat.jpg: isMeme: true, hasCat: true, hasDog: false
Currently the application has a few canned queries.
TypeScript's types make it easy and safe to write these queries.
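For context, here is a rough sketch of the response fields these queries rely on. The field names (faceAnnotations, joyLikelihood, labelAnnotations, description) follow the Google Vision Node.js client, but this is only an approximation of the full response:

// Approximate shape of the Google Vision response fields used by the canned queries.
interface FaceAnnotation {
  joyLikelihood: string; // e.g. "VERY_LIKELY", "POSSIBLE", "VERY_UNLIKELY"
}

interface LabelAnnotation {
  description: string;   // e.g. "Cat"
  score: number;         // confidence between 0 and 1
}

interface ApiVisionResponse {
  faceAnnotations: FaceAnnotation[];
  labelAnnotations: LabelAnnotation[];
}

The canned queries themselves look like this: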
public happyFaceCount(): number {
    // Count the faces that Google Vision rates as very likely to show joy.
    const firstResult = this.apiVisionResponseArray[0];
    return firstResult.faceAnnotations.filter((face) => face.joyLikelihood === "VERY_LIKELY").length;
}
public hasCat(): boolean {
    // True if any of the returned labels is "Cat".
    const firstResult = this.apiVisionResponseArray[0];
    return 0 <= firstResult.labelAnnotations.findIndex((label) => label.description === "Cat");
}
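The hasDog and isMeme queries mentioned above follow the same pattern. A hedged sketch, assuming the relevant Google Vision label strings are literally "Dog" and "Internet meme" (the exact strings in the real queries may differ):

public hasDog(): boolean {
    // Same pattern as hasCat, but looking for an assumed "Dog" label.
    const firstResult = this.apiVisionResponseArray[0];
    return 0 <= firstResult.labelAnnotations.findIndex((label) => label.description === "Dog");
}
public isMeme(): boolean {
    // Assumed label string; the real query may check several meme-related labels.
    const firstResult = this.apiVisionResponseArray[0];
    return 0 <= firstResult.labelAnnotations.findIndex((label) => label.description === "Internet meme");
}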
There are many good computer vision systems available.
The first backend for VisionQL is the Google Vision API.
It is high quality. You have to be a Google Cloud Platform user, but it is relatively easy and cheap to get set up to experiment.
TensorFlow.js will be the next backend. Its models can run in the browser or on Node.js, and there is no need for GCP API keys; a rough sketch of such a backend follows below.
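Here is a minimal sketch of what a TensorFlow.js label-detection backend could look like on Node.js. It assumes the @tensorflow/tfjs-node and @tensorflow-models/mobilenet packages, and MobileNet is used only as an example model; none of this is in the current code:

import * as tf from "@tensorflow/tfjs-node";
import * as mobilenet from "@tensorflow-models/mobilenet";
import * as fs from "fs";

// Hypothetical TensorFlow.js backend: classify a local image and return label-like JSON.
export async function labelDetectionTfjs(filePath: string): Promise<Array<{ description: string; score: number }>> {
  const model = await mobilenet.load();
  const imageTensor = tf.node.decodeImage(fs.readFileSync(filePath), 3) as tf.Tensor3D;
  const predictions = await model.classify(imageTensor);
  imageTensor.dispose();
  // Map MobileNet's {className, probability} onto the same shape as the Google Vision labels.
  return predictions.map((p) => ({ description: p.className, score: p.probability }));
}

Because this returns the same description/score shape as the Google Vision labels, queries like hasCat could in principle run unchanged against this backend.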
TensorFlow.js has the following 2 models:
The point of VisionQL is that it should support more declarative queries. Here is a short discussion of a few candidates for this:
SQL does not lend itself well to this, since it deals with flat relational data.
PostGIS is a SQL frontend to a lot of computational geometry code written in C++. It is well suited for geometric operations on points, lines, and polygons. However, it does not capture the hierarchical nature of computer vision data.
miniKanren is a logic programming language with several implementations in JavaScript. That is an option worth exploring.
VisionQL is in pre-alpha. It is currently a playground for experimenting with Google Vision API results in TypeScript, but it is pretty easy to set up and work with.