Primary language: Python. License: GNU Affero General Public License v3.0 (AGPL-3.0).

dedupliface 👩🏿👩🏽‍🦱👳🏻

Deduplicate Kobo submissions using face pictures.

Note

Terms of Service: usage of dedupliface is permitted only

  • for humanitarian programs involving the registration of people,
  • to prevent duplicate registrations, whether caused by error or fraud,
  • when no proof of legal identity is held by people assisted,
  • when duplicates are validated by humanitarian workers, who ultimately decide if a person should (not) be included in a program,
  • in combination with KoboToolbox.

Collection of face pictures and their use in dedupliface must be done in accordance with the IFRC Data Protection Policy.

Usage

The high-level workflow is:

  1. Create a Kobo form with a question of type Photo, with which you collect face pictures.
  2. Connect the Kobo form with dedupliface using Kobo REST Services.
  3. When a new submission is uploaded to Kobo, an encrypted numerical representation of the face, a.k.a. an embedding, is saved in a dedicated vector database. The encryption key is unique to the Kobo form.
  4. dedupliface checks which faces in the vector database are duplicates and records the result in the Kobo database.
  5. Delete the encrypted embeddings from the vector database, for extra safety.
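
Step 4 can be pictured as a pairwise comparison of embeddings. The sketch below is not taken from the dedupliface code. It assumes unencrypted embeddings held in memory, cosine similarity as the metric, and an arbitrary threshold of 0.8. The function names are hypothetical, and the real service queries Azure AI Search, so its metric and threshold may differ.

```python
import math
from itertools import combinations


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a zero vector is treated as similar to nothing
    return dot / (norm_a * norm_b)


def find_duplicate_pairs(embeddings, threshold=0.8):
    """Return pairs of submission IDs whose embeddings look alike.

    `embeddings` maps a submission ID to its face embedding.
    `threshold` is an illustrative value, not the one dedupliface uses.
    """
    pairs = []
    for (id_a, emb_a), (id_b, emb_b) in combinations(embeddings.items(), 2):
        if cosine_similarity(emb_a, emb_b) >= threshold:
            pairs.append((id_a, id_b))
    return pairs
```

Each pair returned this way is only a candidate: as the terms of service above require, a humanitarian worker still decides whether two submissions really belong to the same person.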

Connect Kobo to dedupliface:

  1. Define which question in the Kobo form is used to get face pictures.
  2. Define which question in the Kobo form is used to mark duplicates (can be hidden in the form itself).
  3. Register a new Kobo REST Service and give it a descriptive name.
  4. Insert as Endpoint URL
    https://dedupliface.azurewebsites.net/add-face
    
  5. Add under Custom HTTP Headers:
    • In Name add koboasset and in Value the ID of your Kobo form (asset)
    • In Name add kobotoken and in Value your Kobo API token (see how to get one)
    • In Name add kobofield and in Value the name of the question used for face pictures

Get duplicates:

  1. Upload all submissions to Kobo.
  2. Make a POST request to
    https://dedupliface.azurewebsites.net/find-duplicate-faces
    through the Swagger UI or any other tool you prefer:
    • Specify koboasset and kobotoken in the headers, as before.
    • Specify kobofield and kobovalue in the request body, where kobofield is the name of the question used for marking duplicates and kobovalue is the value that marks a duplicate (e.g. yes).
  3. Your duplicate submissions will now be marked as such in KoboToolbox.
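
For those who prefer a script to the Swagger UI, the POST request can be sketched in Python using only the standard library. The endpoint, header names, and body fields come from the steps above. The JSON encoding of the body, the function names, and the timeout are assumptions, so check the Swagger UI for the exact schema before relying on this.

```python
import json
import urllib.request

FIND_DUPLICATES_URL = "https://dedupliface.azurewebsites.net/find-duplicate-faces"


def build_find_duplicates_request(asset_id, token, duplicate_field, duplicate_value):
    """Build (but do not send) the POST request for find-duplicate-faces."""
    # Assumption: the API expects a JSON body with these two keys.
    body = json.dumps({"kobofield": duplicate_field, "kobovalue": duplicate_value})
    return urllib.request.Request(
        FIND_DUPLICATES_URL,
        data=body.encode("utf-8"),
        headers={
            "koboasset": asset_id,  # ID of the Kobo form (asset)
            "kobotoken": token,     # Kobo API token
            "Content-Type": "application/json",
        },
        method="POST",
    )


def find_duplicates(asset_id, token, duplicate_field, duplicate_value, timeout=300):
    """Send the request and return the HTTP status and the raw response text."""
    request = build_find_duplicates_request(
        asset_id, token, duplicate_field, duplicate_value
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status, response.read().decode("utf-8")
```

A call such as find_duplicates("<asset ID>", "<API token>", "duplicate", "yes") would then mark duplicates in a question named duplicate; that question name is only an example.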

Technical Specifications

Synopsis: a Dockerized Python API that checks whether face pictures in Kobo are duplicates.

Based on FastAPI and facenet-pytorch. Stores and queries face embeddings in a dedicated vector database, Azure AI Search. Uses Poetry for dependency management.

Encrypts face embeddings with two keys, one global and one unique to each Kobo form.

Run locally

Create the .env file for local environment variables

cp example.env .env

and edit its values accordingly.

Then, with Uvicorn:

poetry install
uvicorn main:app --reload

or with Docker:

docker compose up --detach

Deploy in Azure

  1. Create an App Service Plan Premium v3 P2V3 or above.
  2. Create an App Service Web App with the following settings:
    • Publish: Docker Container
    • Operating System: Linux
    • Region: the same as the App Service Plan