Test solution for face matching of descriptors generated by the Human library
(in reality any other descriptor can also be used, as long as the descriptors in the database and the generated descriptor are of the same length)

All implementations are independent of the original library
(e.g., they do not require Human and assume descriptors are already generated)
Implements the native match loop and similarity algorithm as they are implemented in the Human library (a simplified sketch of this loop is shown below)

Run demo using `src/js.js`
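The core of the native solution is a per-element distance loop over the whole database. Below is a minimal sketch of such a loop, assuming plain arrays of equal length whose values are scaled so that a good match keeps the squared L2 distance well below 1; the exact normalization, multiplier, and option handling in Human differ, so treat this as illustrative rather than the library's implementation:

```js
// Minimal sketch of a match loop over an in-memory descriptor database.
// Descriptors are assumed to be plain number arrays of equal length.

function distance(a, b) {
  let sum = 0;
  for (let i = 0; i < a.length; i++) {
    const diff = a[i] - b[i];
    sum += diff * diff; // squared euclidean distance; sqrt is not needed for ranking
  }
  return sum;
}

function similarity(a, b) {
  return Math.max(0, 1 - distance(a, b)); // map distance to a 0..1 score, higher is better
}

// scan the entire db and keep the best match; optional minThreshold allows early exit
function match(descriptor, db, minThreshold = 0) {
  let best = { index: -1, similarity: 0 };
  for (let i = 0; i < db.length; i++) {
    const s = similarity(descriptor, db[i]);
    if (s > best.similarity) best = { index: i, similarity: s };
    if (minThreshold > 0 && best.similarity >= minThreshold) break; // good enough, stop scanning
  }
  return best;
}

// usage: const best = match(probe, databaseOfDescriptors, 0.75);
```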
Uses WASM module to run similarity & match methods:

- Written in AssemblyScript
- Uses `f32` instead of the default `f64`
- `NativeMathF` stdlib is faster than the `MathJS` import lib by 25%, but the `MathJS`-based solution is smaller
- Built using shared memory, which could be used instead of building the array inside the WASM instance (see the loader sketch after the build command below)
- Built using multi-threading, so instead of having a NodeJS worker thread pool we could have WASM threading
- Could be further optimized to use SIMD 128-bit operations instead of `f32`, thus reducing the similarity loop by 4x
- WASM source is in `assembly/human-match.ts`
How to build WASM binary:

```shell
node_modules/.bin/asc assembly/human-match.ts \
  --outFile dist/human-match.wasm \
  --textFile dist/human-match.wat \
  --enable threads,simd \
  --importMemory \
  --optimizeLevel 3 \
  --shrinkLevel 0 \
  --sharedMemory \
  --initialMemory 1 \
  --maximumMemory 16384 \
  --sourceMap \
  --exportRuntime \
  --transform as-bind
```
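A rough sketch of how the resulting binary could be instantiated from Node.js with an imported shared memory matching the flags above (`--importMemory --sharedMemory --initialMemory 1 --maximumMemory 16384`); the import object and export usage below are assumptions, not the actual `src/wasm.js` loader, which also has to account for the `as-bind` transform:

```js
// Illustrative loader only; see src/wasm.js for the real one.
const fs = require('fs');

async function loadWasm() {
  // shared, imported memory sized to match the asc build flags above
  const memory = new WebAssembly.Memory({ initial: 1, maximum: 16384, shared: true });
  const binary = fs.readFileSync('dist/human-match.wasm');
  const { instance } = await WebAssembly.instantiate(binary, {
    env: {
      memory,
      abort: () => { throw new Error('wasm abort'); }, // AssemblyScript runtime abort hook
    },
  });
  return { wasm: instance.exports, memory };
}

// usage: loadWasm().then(({ wasm, memory }) => { /* write descriptors into memory, call wasm exports */ });
```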
Run demo using `src/wasm.js`

Multithreaded solution using NodeJS worker threads:

- just two JS files: `multithread.js` and `multithread-worker.js` :)
- no external dependencies for main process or worker threads
- manually created thread pool
  - can shut down workers or create additional worker threads on-the-fly
  - safe against workers that exit
- shared buffer array that holds descriptors (illustrated in the sketch after the method list below)
- labels are maintained only in the main thread
- job assignment to workers uses round-robin since timing for each job is near-fixed and predictable
- memory consumption of the buffer is 8KB per descriptor (1024 elements at 8 bytes each) since each descriptor element is a float64
  this could be reduced by a factor of 32x if necessary:
  - map `f64` element values to `uint8` for an 8x size reduction
    this would not result in a performance gain as the math still has to be `f64`
  - reduce the number of elements from 1024 to 256 for a 4x reduction
    this would result in an equal performance gain when performing matches,
    but reducing descriptor complexity is math-heavy when preparing such data
- thread safe even without atomics or locks:
  - buffer is preallocated
  - only writing to incrementing addresses
  - each write is a single `f64` write without structures or serialization
  - workers never access the new address space until adding is complete
  - once a descriptor is added, all workers in the pool are informed of the new record count
Exposed methods:

- `appendRecord`: add an additional batch of descriptors to the buffer on-the-fly
- `getLabel`: fetch the label for a resolved descriptor index
- `getDescriptor`: get the descriptor array for a given id from the buffer
- `workersStart`: start or expand the worker pool
- `workersClose`: close workers in the pool (nicely, plus terminate)
- `match`: dispatch a match job to a worker
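The shared-buffer and thread-safety points above fit in a few lines. The sketch below assumes a `SharedArrayBuffer` viewed as a `Float64Array`; the names (`view`, `records`, `appendOne`) and the capacity are illustrative, not the actual `multithread.js` internals:

```js
// Preallocated shared descriptor store: 1024 f64 elements * 8 bytes = 8KB per descriptor.
const descriptorLength = 1024;
const maxRecords = 50000;                                                // capacity for a 50k-record test db
const buffer = new SharedArrayBuffer(maxRecords * descriptorLength * 8); // 8 bytes per f64 element
const view = new Float64Array(buffer);
let records = 0;                                                         // record count owned by the main thread

// safe without locks: writes land in a fresh, incrementing region that no worker reads
// until the new record count is published to it afterwards
function appendOne(descriptor, workers) {                                // workers: worker_threads.Worker[] (hypothetical)
  view.set(descriptor, records * descriptorLength);                      // plain f64 writes, no serialization
  records += 1;
  for (const worker of workers) worker.postMessage({ records });         // publish the new count last
}
```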
Tested with face database of 50k records and 100 match jobs:

- threadPoolSize: 1 => 46,000 ms
- threadPoolSize: 6 => 13,327 ms
- threadPoolSize: 12 => 10,150 ms

Note: this is a worst-case scenario where each match job scans the entire database;
setting `minThreshold` to even a high value typically improves results by 2-5x
`multithread.js` workflow (a condensed code sketch follows the list):

- preallocates the buffer
- loads a small descriptors database repeatedly to create a fake large database
- creates a couple of workers
- submits a first batch of jobs based on random descriptors pulled from the same database
- loads additional records
- creates additional workers
- creates fuzzed descriptors pulled from the same database for a harder match
- submits a second batch of jobs
- closes workers when all jobs have completed
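The same flow written down in code. This is not `multithread.js` itself: it assumes the methods listed earlier are importable, that `match` returns a promise resolving to a record index, and uses a hypothetical db file; all signatures and paths here are assumptions:

```js
// Condensed, hypothetical version of the demo flow; argument shapes are assumptions.
const { workersStart, workersClose, appendRecord, getLabel, match } = require('./multithread.js'); // assumed exports
const db = require('./db.json'); // hypothetical small descriptor db: { labels: [], descriptors: [] }

const pick = () => db.descriptors[Math.floor(Math.random() * db.descriptors.length)];
const fuzz = (d) => d.map((v) => v + (Math.random() - 0.5) * 0.1); // perturb a real descriptor for a harder match

async function demo() {
  workersStart(6);                                                      // create initial worker pool
  appendRecord(db.labels, db.descriptors);                              // first batch into the shared buffer
  const first = Array.from({ length: 50 }, () => match(pick()));        // first batch of match jobs
  appendRecord(db.labels, db.descriptors);                              // load additional records on-the-fly
  workersStart(12);                                                     // expand the pool
  const second = Array.from({ length: 50 }, () => match(fuzz(pick()))); // fuzzed probes, harder matches
  for (const res of await Promise.all([...first, ...second])) console.log(getLabel(res.index)); // labels stay in main thread
  workersClose();                                                       // shut everything down when done
}

demo();
```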
- Descriptor generated by Human can be reduced in dimensionality by 2x-32x
  (e.g., from a 1024-element array all the way to a 32-element array),
  see `match.js:reduce`; the method currently uses PCA, but different methods can be used (a reduction sketch is shown below):
  - https://en.wikipedia.org/wiki/Dimensionality_reduction
  - https://en.wikipedia.org/wiki/Nonlinear_dimensionality_reduction
  - the higher the reduction, the higher the precision loss:
    the `match` result is similar, but individual `similarity` results are very different
  - reduction is CPU intensive and should be done on original face db insert
- Descriptor can also be normalized and stored with minimal loss of detail as a `uint8` array
  and reconstructed to an `f32` array on load, resulting in 4x size savings (see the quantization sketch below)
- Combining `uint8` storage and reduced dimensionality would allow processing a db with >10M records in a single pass
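For the dimensionality reduction note above, the expensive part is computing the projection (e.g., PCA over the existing descriptor database); applying it at insert/match time is just a matrix-vector multiply. The `projection` matrix below is assumed to be precomputed offline and is not part of `match.js`:

```js
// reduce a descriptor to k elements using a precomputed k x N projection matrix
// (the rows would come from an offline PCA/dimensionality-reduction step)
function reduce(descriptor, projection) {
  const out = new Float32Array(projection.length);
  for (let row = 0; row < projection.length; row++) {
    let sum = 0;
    for (let col = 0; col < descriptor.length; col++) {
      sum += projection[row][col] * descriptor[col];
    }
    out[row] = sum; // one reduced element per projection row
  }
  return out;
}

// usage: const reduced = reduce(descriptor1024, projection32x1024); // 1024 -> 32 elements
```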
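And for the `uint8` storage note, a minimal quantize/reconstruct pair, assuming descriptor values have already been normalized into a known 0..1 range; the range and function names are assumptions, not the actual `match.js` code:

```js
// store: value in 0..1 -> single byte (4x smaller than f32, 8x smaller than f64)
function quantize(descriptor) {
  const out = new Uint8Array(descriptor.length);
  for (let i = 0; i < descriptor.length; i++) {
    const clamped = Math.min(1, Math.max(0, descriptor[i]));
    out[i] = Math.round(clamped * 255);
  }
  return out;
}

// load: byte -> f32, with at most 1/510 absolute error per element
function dequantize(bytes) {
  const out = new Float32Array(bytes.length);
  for (let i = 0; i < bytes.length; i++) out[i] = bytes[i] / 255;
  return out;
}
```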