
Bamboo DVR

Primary language: Python. License: GNU General Public License v2.0 (GPL-2.0).


Bamboo is a Python module for building production-quality research pipelines that process video and still-frame images.


  • Visually compare the results of face-detection or object-classification algorithms.
  • Do the same comparison methodically over a million images, storing the results in a structured database.
  • Use the same pipeline to process images from a disk directory, a video file, an RTSP camera stream, or JPEGs on demand as they are uploaded to an Amazon S3 bucket.
  • Debug pipelines on your laptop in an easy-to-use single-threaded environment, then deploy them on a laptop or workstation with a multi-threaded server, on a cluster of high-performance servers, or in a serverless cloud environment using AWS Lambda.
  • Program against a consistent, object-oriented abstraction layer that wraps existing computer vision systems, including OpenCV, YOLOv8, and Amazon Rekognition.

Demo programs provided with Bamboo


scale - the amount by which to expand a face area. The default, scale=1.0, means no expansion.
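Expanding a face area by `scale` presumably means growing the detected box about its center. The following is a minimal sketch, assuming boxes are (x, y, w, h) in pixels and that the expanded box is clipped to the image when its size is known; `Box` and `expand_box` are hypothetical names, not part of Bamboo's API.

```python
from typing import NamedTuple, Optional


class Box(NamedTuple):
    """Axis-aligned box: top-left corner (x, y), width w, height h, in pixels."""
    x: float
    y: float
    w: float
    h: float


def expand_box(box: Box, scale: float = 1.0,
               img_w: Optional[int] = None, img_h: Optional[int] = None) -> Box:
    """Scale a box about its center (hypothetical helper, not Bamboo's API).

    scale=1.0 returns the box unchanged; scale=1.5 makes it 50% wider and taller.
    If the image size is given, the result is clipped to the image.
    """
    cx, cy = box.x + box.w / 2, box.y + box.h / 2
    half_w, half_h = box.w * scale / 2, box.h * scale / 2
    x0, y0, x1, y1 = cx - half_w, cy - half_h, cx + half_w, cy + half_h
    if img_w is not None:
        x0, x1 = max(0.0, x0), min(float(img_w), x1)
    if img_h is not None:
        y0, y1 = max(0.0, y0), min(float(img_h), y1)
    return Box(x0, y0, x1 - x0, y1 - y0)
```

Clipping matters for faces near the frame edge, where an expanded box would otherwise extend past the image.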


Process and archive surveillance video to answer useful questions such as:

  • Who was present on which days?
  • How many people did we see on a day?
  • Which vehicles entered our garage?
  • When were people in the office?

Take data from a variety of streams, including:

  • Google Nest cameras (captured from the Google Nest cloud using the API)
  • Any on-prem camera (captured using either an RTSP stream or a sequence of JPEGs)
  • Uploaded video
  • Cell phones repurposed as surveillance cameras
  • ESP32-CAM (https://google.com/search?q=ESP32-CAM)

Plug-in architecture:

  • It's clear that we will always want a plug-in interface that supports multiple plug-ins at each step of the pipeline.
    • The outputs of multiple plug-ins can be combined by union, intersection, or vote.
    • With two plug-ins, we can compare them against each other (for running experiments).
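Combining plug-in outputs by union, intersection, or vote, and comparing two plug-ins, might look like the sketch below. It assumes, as a simplification, that each plug-in reduces a frame to a set of string labels (object classes or identities); real detectors also return regions and scores, which would need to be matched before they could be combined. All function names are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Optional, Set

# Hypothetical helpers: each plug-in's output for one frame is a set of labels.


def combine_union(results: List[Set[str]]) -> Set[str]:
    """Labels reported by at least one plug-in."""
    return set().union(*results)


def combine_intersection(results: List[Set[str]]) -> Set[str]:
    """Labels reported by every plug-in."""
    if not results:
        return set()
    first, *rest = results
    return set(first).intersection(*rest)


def combine_vote(results: List[Set[str]], quorum: Optional[int] = None) -> Set[str]:
    """Labels reported by at least `quorum` plug-ins (default: a strict majority)."""
    if quorum is None:
        quorum = len(results) // 2 + 1
    counts = Counter(label for r in results for label in r)
    return {label for label, n in counts.items() if n >= quorum}


def compare(a: Set[str], b: Set[str]) -> Dict[str, Set[str]]:
    """Agreement between two plug-ins, as raw material for an experiment report."""
    return {"both": a & b, "only_a": a - b, "only_b": b - a}
```

Union favors recall, intersection favors precision, and voting sits between them; with two plug-ins, `compare` shows where they disagree.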

Processing options:

  • Single-threaded on local machine for debugging
  • Multi-threaded on local machine for performance
  • AWS Lambda, Google Cloud Functions (GCF), or Azure Functions
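One way to keep the single-threaded and multi-threaded options interchangeable is to write the per-frame work as a plain function and choose the executor at run time. This sketch uses the standard library's `concurrent.futures.ThreadPoolExecutor`; `process_all` is a hypothetical name, and the serverless option (calling the same function from a cloud-function handler) is not shown.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def process_all(process: Callable[[T], R], items: Iterable[T], threads: int = 1) -> List[R]:
    """Apply `process` to every item and return the results in input order.

    threads=1 runs everything in the calling thread, which keeps stack traces
    and debugger breakpoints simple. threads>1 uses a thread pool, which helps
    when `process` waits on I/O (S3, RTSP) or calls native code that releases
    the GIL.
    """
    if threads <= 1:
        return [process(item) for item in items]
    with ThreadPoolExecutor(max_workers=threads) as pool:
        # pool.map yields results in input order, matching the serial branch.
        return list(pool.map(process, items))
```

Because both branches return results in the same order, a pipeline debugged with `threads=1` should produce the same output when run with more threads, provided `process` has no shared mutable state.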

Enabling technologies we require (and what we are thinking of using)

  • Video change detection
  • Object detection in a video
  • Face recognition
  • Structured database
    • Stores the result of the tagged video
  • Video storage
    • Can store frames or compressed video. Frames are higher quality; compressed video takes less space, so more footage fits in the same storage. (Compressed video is stored as a series of I-frames and delta frames.)


Initially we will prototype a number of small scripts to get an idea of how this works.


  • Cisco Meraki MV Sense datasheet: https://meraki.cisco.com/lib/pdf/meraki_datasheet_mv_sense.pdf
  • Cisco Meraki Dashboard API: https://documentation.meraki.com/General_Administration/Other_Topics/Cisco_Meraki_Dashboard_API


  • Iterate through all of the captured JPEGs in chronological order.
  • When a JPEG differs significantly from the previous one, copy it to the image store (local or S3) and run it through image processing.
  • Store the results of the image processing as JSON objects in a scalable store.
    • Store results by recognizer, so that we can use several recognizers side by side.
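Storing results by recognizer can be sketched as a small JSON store. This minimal version assumes a local directory laid out as `<root>/<recognizer>/<image_key>.json`; the layout, the `JsonResultStore` class, and the recognizer names used in testing are hypothetical, and an S3-backed version would write the same keys as objects instead of files.

```python
import json
from pathlib import Path
from typing import Any, Dict, List


class JsonResultStore:
    """One JSON document per (recognizer, image) pair (hypothetical layout).

    Files live at <root>/<recognizer>/<image_key>.json, so results from
    several recognizers for the same image sit side by side.
    """

    def __init__(self, root: Path) -> None:
        self.root = Path(root)

    def _path(self, recognizer: str, image_key: str) -> Path:
        return self.root / recognizer / f"{image_key}.json"

    def put(self, recognizer: str, image_key: str, result: Dict[str, Any]) -> None:
        path = self._path(recognizer, image_key)
        path.parent.mkdir(parents=True, exist_ok=True)  # image keys may contain '/'
        path.write_text(json.dumps(result, sort_keys=True))

    def get(self, recognizer: str, image_key: str) -> Dict[str, Any]:
        return json.loads(self._path(recognizer, image_key).read_text())

    def recognizers_for(self, image_key: str) -> List[str]:
        """Names of all recognizers that have stored a result for this image."""
        if not self.root.is_dir():
            return []
        return sorted(d.name for d in self.root.iterdir()
                      if d.is_dir() and (d / f"{image_key}.json").is_file())
```

Keeping the recognizer as the top-level prefix makes it cheap to list or delete every result from one recognizer when rerunning an experiment.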


JPEGs: We're storing individual JPEGs in a directory hierarchy that is optimized to have 1,000 to 5,000 images per prefix (directory).

We anticipate ~10 to 500 images per camera per day (whether a day is counted in local time or GMT is a config variable).
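A key scheme that honors the local-time-versus-GMT config variable could look like this. The `<camera>/<YYYY-MM>/<DD>-<HHMMSS>.jpg` layout and `image_key` are assumptions for illustration: at the anticipated 10 to 500 images per camera per day, a monthly prefix would hold roughly 300 to 15,000 images (10 × 30 to 500 × 30), so the granularity would need tuning to stay within the 1,000 to 5,000 target.

```python
from datetime import datetime, timezone, tzinfo


def image_key(camera: str, taken: datetime, use_gmt: bool, local_tz: tzinfo) -> str:
    """Hypothetical key: <camera>/<YYYY-MM>/<DD>-<HHMMSS>.jpg.

    `use_gmt` plays the role of the config variable: it decides whether the
    day (and month) in the key is counted in GMT/UTC or in the camera's local
    time zone. Two frames in the same second would collide; a real scheme
    would need a tiebreaker such as a sequence number.
    """
    if taken.tzinfo is None:
        raise ValueError("timestamp must be timezone-aware")
    t = taken.astimezone(timezone.utc if use_gmt else local_tz)
    return f"{camera}/{t:%Y-%m}/{t:%d-%H%M%S}.jpg"
```

The setting changes which day an image belongs to: 02:30 GMT on 1 March 2024 falls on 29 February in a UTC-5 zone, so the same image lands under a different prefix.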


faces.py - show all faces on a given day



Identifying which frames to process:

  1. Videos are chopped into frames. (Pretty standard; ffmpeg can do this.)
  2. Each frame that is the first in a sequence or significantly different from the previous frame is tagged for processing.
  3. Optionally, we will also tag a window of frames around each changed frame for processing.
  4. Videos and frames are expunged after a retention time.
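Steps 2 and 3 can be sketched as follows, assuming frames have already been decoded (for example by ffmpeg or OpenCV) into equal-size 8-bit grayscale byte strings. Mean absolute pixel difference is swapped in here as a simple change metric; it is not necessarily what ingest.py uses, and `select_frames`, `threshold`, and `window` are hypothetical names.

```python
from typing import List, Sequence


def mean_abs_diff(a: bytes, b: bytes) -> float:
    """Mean absolute per-pixel difference of two equal-size 8-bit grayscale frames."""
    if len(a) != len(b):
        raise ValueError("frames must be the same size")
    if not a:
        return 0.0
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def select_frames(frames: Sequence[bytes], threshold: float = 10.0,
                  window: int = 0) -> List[int]:
    """Indices of frames to process, in order.

    Step 2: tag the first frame and every frame whose difference from the
    previous frame exceeds `threshold` (on a 0 to 255 scale).
    Step 3: optionally tag `window` frames on each side of every tagged frame.
    """
    changed = [i for i in range(len(frames))
               if i == 0 or mean_abs_diff(frames[i - 1], frames[i]) > threshold]
    selected = set()
    for i in changed:
        selected.update(range(max(0, i - window), min(len(frames), i + window + 1)))
    return sorted(selected)
```

Comparing each frame with its immediate predecessor can miss slow changes, such as gradual lighting drift spread over many frames; comparing against the last tagged frame instead is one alternative.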

Current implementation:

  • ingest.py (./ingest.py)


Processing frames:

  1. Each frame is represented by an object.
  2. Any number of processors can review a frame. We would likely have taggers for:
    • Faces (input: frame; output: face regions)
    • Objects (input: frame; output: objects)
    • Face vectors (input: face regions; output: face vectors)
    • Identities (input: face vectors; output: identities)
  3. The pipeline is constructed as a series of producer/consumer queues.
    • This makes it easy to support multi-threaded and multi-process environments.
    • Ensemble processors are processors that have a single input, run multiple sub-processors, and have a single output.
      • An ensemble processor automatically stores results that can be used to produce experimental reports.
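A minimal sketch of a queue-based pipeline with an ensemble stage, using only the standard library's `threading` and `queue` modules. Each stage runs in its own thread, reading from one queue and writing to the next; the ensemble runs several sub-processors on the same frame and keeps every member's output alongside the combined result for later reports. `Frame`, `ensemble`, and `run_queued_pipeline` are hypothetical names, not Bamboo's classes.

```python
import queue
import threading
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, Iterable, List, Tuple


@dataclass
class Frame:
    """One frame plus the annotations that processors attach to it."""
    frame_id: str
    pixels: Any = None
    annotations: Dict[str, Any] = field(default_factory=dict)


Processor = Callable[[Frame], Any]   # returns the annotation value for a frame
_DONE = object()                     # sentinel that shuts each stage down


def ensemble(subs: Dict[str, Processor],
             combine: Callable[[Dict[str, Any]], Any]) -> Processor:
    """Single input, several sub-processors, single output.

    Every member's output is kept next to the combined value so that
    experiment reports can compare the members later.
    """
    def run(frame: Frame) -> Dict[str, Any]:
        members = {name: sub(frame) for name, sub in subs.items()}
        return {"members": members, "combined": combine(members)}
    return run


def _stage(name: str, proc: Processor, inbox: queue.Queue, outbox: queue.Queue) -> None:
    """Consume from inbox, annotate, produce to outbox.

    No error handling: an exception in `proc` would stall the pipeline.
    """
    while True:
        item = inbox.get()
        if item is _DONE:
            outbox.put(_DONE)        # propagate shutdown downstream
            return
        item.annotations[name] = proc(item)
        outbox.put(item)


def run_queued_pipeline(frames: Iterable[Frame],
                        stages: List[Tuple[str, Processor]]) -> List[Frame]:
    """Run each stage in its own thread, linked by bounded FIFO queues."""
    qs = [queue.Queue(maxsize=8) for _ in range(len(stages) + 1)]
    for i, (name, proc) in enumerate(stages):
        threading.Thread(target=_stage, args=(name, proc, qs[i], qs[i + 1]),
                         daemon=True).start()

    def produce() -> None:
        for frame in frames:
            qs[0].put(frame)
        qs[0].put(_DONE)

    threading.Thread(target=produce, daemon=True).start()
    finished: List[Frame] = []
    while True:
        item = qs[-1].get()
        if item is _DONE:
            return finished
        finished.append(item)
```

The bounded queues provide backpressure: a slow stage makes upstream stages block instead of buffering an unbounded number of frames in memory. With one thread per stage and FIFO queues, frames come out in the order they went in.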

Technology Stack

Face clustering:

Do we want to have an abstract pipeline object?

  • Input and output
  • Annotation
  • Easily do experiments with an object that specifies multiple other objects.
  • Connect them together with YAML
  • Designed for running in a functions-as-a-service environment
  • Designed for storage in an object store such as S3
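Connecting objects together with YAML could work through a registry that maps a `kind` field to a factory function. In this sketch the config is written as the dict that PyYAML's `yaml.safe_load` would return for the YAML shown in the comment, so the example runs without PyYAML installed; the `threshold` and `ensemble` kinds, the node layout, and all function names are hypothetical.

```python
from typing import Any, Callable, Dict, List

# A processor maps a record (e.g. a frame's measurements) to a value.
Processor = Callable[[Dict[str, Any]], Any]
REGISTRY: Dict[str, Callable[..., Processor]] = {}


def register(kind: str) -> Callable:
    """Decorator that makes a processor factory available to config files."""
    def wrap(factory: Callable[..., Processor]) -> Callable[..., Processor]:
        REGISTRY[kind] = factory
        return factory
    return wrap


def build(spec: Dict[str, Any]) -> Processor:
    """Turn one config node ({"kind": ..., plus parameters}) into a processor."""
    params = {k: v for k, v in spec.items() if k != "kind"}
    return REGISTRY[spec["kind"]](**params)


@register("threshold")
def make_threshold(key: str, limit: float) -> Processor:
    # True when the named measurement exceeds the limit.
    return lambda record: record[key] > limit


@register("ensemble")
def make_ensemble(members: List[Dict[str, Any]], mode: str = "all") -> Processor:
    # Build sub-processors recursively, then combine their boolean outputs.
    subs = [build(m) for m in members]
    agg = {"all": all, "any": any}[mode]
    return lambda record: agg(p(record) for p in subs)


# Equivalent YAML (hypothetical schema):
#   kind: ensemble
#   mode: any
#   members:
#     - {kind: threshold, key: motion, limit: 10}
#     - {kind: threshold, key: faces, limit: 0}
CONFIG = {
    "kind": "ensemble",
    "mode": "any",
    "members": [
        {"kind": "threshold", "key": "motion", "limit": 10},
        {"kind": "threshold", "key": "faces", "limit": 0},
    ],
}

pipeline = build(CONFIG)
```

Because an ensemble node just contains other nodes, an experiment that swaps one member for another only requires editing the config, not the code.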

To check out

See Also

  • https://universe.roboflow.com/ - "The world's largest collection of open source computer vision datasets and APIs." (Unfortunately, there is no consistent API.)