
Cross the streams

Apache License 2.0




Serverless deployment of a video processing pipeline for capturing snapshots/clips from N concurrent video streams based on an event within a single stream.

Use Cases

  • Capture multiple angles when an intruder enters the doorway
  • Capture a person's emotion and the target of their desire (or ire)
  • Capture a scene at multiple focal lengths
  • Capture non-repeatable events from multiple angles, e.g. stunts


Architecture Diagram


Data sources

Two or more cameras that can be connected to the internet.


  • Camera: Reliably streaming high-quality H.264 from multiple cameras, e.g. GoPros
  • Coordination: Accurate frame capture across multiple video streams is paramount. If the snapshots are offset from each other by too much, even by a second, they won't line up.
  • Scaling: The combination of camera frame rate and pipeline sampling speed will drive up the number of parallel Lambda functions
  • Cost: Doing this "on the cheap". We aren't going to sample and index every frame.
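The coordination concern above comes down to picking, for each stream, the frame whose timestamp is closest to the event. Here is a minimal sketch of that matching step, assuming each stream's frame timestamps are available as a sorted list of seconds. The function names and the 0.1 second tolerance are hypothetical and not taken from this project.

```python
from bisect import bisect_left


def nearest_frame(timestamps, target, tolerance=0.1):
    """Return the timestamp in sorted `timestamps` closest to `target`,
    or None if no frame falls within `tolerance` seconds (assumed unit)."""
    if not timestamps:
        return None
    i = bisect_left(timestamps, target)
    # Only the neighbors on either side of the insertion point can be closest.
    candidates = timestamps[max(i - 1, 0):i + 1]
    best = min(candidates, key=lambda ts: abs(ts - target))
    return best if abs(best - target) <= tolerance else None


def match_across_streams(streams, event_ts, tolerance=0.1):
    """Map each stream name to its frame timestamp nearest the event."""
    return {
        name: nearest_frame(timestamps, event_ts, tolerance)
        for name, timestamps in streams.items()
    }
```

A stream that maps to `None` has no frame close enough to the event, which is the "won't line up" case; the caller can then decide whether to skip that stream or widen the tolerance.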


Before we break this down into MVP and beyond, let's look at what the end product looks like.

  • Streaming
    • Set up a local environment for streaming video from cameras into Kinesis Video Streams
  • Extraction
    • Extract frames from video streams as they come in
    • Using API + Lambda or Rekognition processor as input
    • Write frame to S3 and frame metadata to DynamoDB
  • Event Detection
    • Trigger off S3 PUT or DynamoDB stream
    • Detect event based on some configurable criteria (emotion, specific person, object in scene)
    • If no event, delete frame in S3 and frame metadata in DynamoDB
  • Frame Extraction
    • Trigger off result of Event Detection; run one for each video stream
    • Pull frame from S3 and frame metadata from DynamoDB
    • Pull corresponding frame from other video streams
    • Write frames to S3 and metadata to DynamoDB
  • Frame Joiner
    • Trigger once (a single invocation) off the result of Frame Extraction
    • Pull all frames from S3 and frame metadata from DynamoDB
    • Merge frames into a single image (PnP)
    • Write frame to S3 and metadata to DynamoDB
  • UX
    • Site that allows browsing of joined images
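The Event Detection step above decides, per frame, whether to continue to Frame Extraction or to delete the frame and its metadata. Below is a rough sketch of that decision with configurable criteria. It assumes detection results arrive as a list of `{"Name", "Confidence"}` dicts, loosely modeled on Rekognition label output. The `criteria` format and both function names are hypothetical.

```python
def is_event(labels, criteria):
    """Return True if any detected label satisfies the configured criteria.

    labels:   list of {"Name": str, "Confidence": float} dicts (assumed shape)
    criteria: dict mapping a label name to its minimum confidence (0-100)
    """
    for label in labels:
        threshold = criteria.get(label["Name"])
        if threshold is not None and label["Confidence"] >= threshold:
            return True
    return False


def next_action(labels, criteria):
    """Decide what the pipeline does with a frame after detection:
    "extract" hands off to Frame Extraction, "delete" removes the frame."""
    return "extract" if is_event(labels, criteria) else "delete"
```

For example, with `criteria = {"Person": 90.0}`, a frame labeled `Person` at 95.2 confidence would yield `"extract"`, while a frame containing only other labels would yield `"delete"`. Keeping the criteria as plain data means they could live in configuration rather than in the function code.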


MVP

  • Two web cameras
  • Event detection is a simple time interval, say every 5 seconds
  • No Frame Joiner
  • UX displays all raw frames
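The time-interval event detection described above could be as simple as the following sketch. It assumes frame timestamps arrive in ascending order as seconds. The function name is hypothetical, and the 5 second default mirrors the example interval in the list.

```python
def interval_events(frame_timestamps, interval=5.0):
    """Return the frame timestamps that count as events: the first frame,
    then the first frame at least `interval` seconds after each event."""
    events = []
    next_due = None
    for ts in frame_timestamps:
        if next_due is None or ts >= next_due:
            events.append(ts)
            # Schedule relative to the frame actually captured, so a gap
            # in the stream does not cause a burst of catch-up events.
            next_due = ts + interval
    return events
```

Each returned timestamp would then be used to pull the corresponding frame from every other stream.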


Beyond MVP

  • Additional cameras, e.g. GoPro
  • Event detection (emotion, object in scene, etc.)
  • Clip Extraction (defined duration)
  • Frame Joiner merges frames from all snapshots
  • UX is a browser for all images, e.g. a carousel
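The layout the Frame Joiner uses when merging frames is not specified here. As one possible approach, the sketch below computes paste offsets for tiling equal-sized frames into a near-square grid. The grid layout, the function name, and the equal-size assumption are all illustrative choices rather than the project's design.

```python
import math


def grid_layout(n_frames, frame_width, frame_height):
    """Return (canvas_size, offsets) for tiling equal-sized frames row by row.

    canvas_size: (width, height) of the merged image in pixels
    offsets:     top-left (x, y) paste position for each frame, in order
    """
    if n_frames < 1:
        raise ValueError("need at least one frame")
    cols = math.ceil(math.sqrt(n_frames))
    rows = math.ceil(n_frames / cols)
    offsets = [
        ((i % cols) * frame_width, (i // cols) * frame_height)
        for i in range(n_frames)
    ]
    return (cols * frame_width, rows * frame_height), offsets
```

The offsets can be handed to whatever imaging library does the actual compositing. Any unused cell in the last row is simply left blank.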