CamHD Motion Metadata

DOI for all versions of this dataset:

Please see our Zenodo record for citation information and for DOIs associated with specific releases of the data.

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

Introduction

See the Metadata Status for information on the current state of the metadata.

Please use the GitHub issue tracker to flag missing files, data quality issues, etc.

CamHD, an HD camera installed at 1,500 m water depth at Axial Seamount, records ~13-minute HD videos of an active hydrothermal vent ecosystem eight times a day. These files are stored in the Ocean Observatories Initiative raw data repository.

Under the NSF OTIC-sponsored program Cloud-Capable Tools for CamHD Data Analysis, we are investigating the use of video analytics and machine vision to generate ancillary metadata about each video: camera motion and position, and the identification of sections (sequences of frames, time bounds) within each video when the camera is still and looking at particular known "stations" on the vent.

This repo is the primary distribution point for that metadata. Git lets us version files as they are created, flag and track data quality issues, etc.

For additional information on this project, please see the project blog.

Data in JSON format

The "canonical" data format is a set of one or more JSON files for each video in the CI (the OOI Cyberinfrastructure raw data archive).

The directory structure within this repository mirrors that of the raw data archive. Since we only analyze one instrument, all of the metadata files are under the directory RS03ASHS/PN03B/06-CAMHDA301/. Each metadata file shares its root name with the corresponding video file, followed by a suffix that identifies the kind of metadata (detailed below). All metadata is stored in JSON-encoded text files with the .json extension.
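The naming convention above can be captured in a small helper that builds a metadata path from a video basename and suffix. This is a sketch under an assumption not stated here: that files are nested in YYYY/MM/DD subdirectories taken from the timestamp in the basename, as in the raw archive. The function name metadata_path is hypothetical.

```python
from pathlib import PurePosixPath

# Instrument directory named in this README.
INSTRUMENT_DIR = PurePosixPath("RS03ASHS/PN03B/06-CAMHDA301")

def metadata_path(mov_basename: str, suffix: str) -> PurePosixPath:
    """Return the repo-relative path of one metadata file (hypothetical helper).

    Assumes YYYY/MM/DD subdirectories derived from the basename timestamp,
    e.g. CAMHDA301-20160101T000000Z -> 2016/01/01.
    """
    # The timestamp follows the instrument prefix and the first hyphen.
    stamp = mov_basename.split("-", 1)[1]
    year, month, day = stamp[0:4], stamp[4:6], stamp[6:8]
    return INSTRUMENT_DIR / year / month / day / f"{mov_basename}{suffix}.json"

print(metadata_path("CAMHDA301-20160101T000000Z", "_optical_flow_regions"))
# RS03ASHS/PN03B/06-CAMHDA301/2016/01/01/CAMHDA301-20160101T000000Z_optical_flow_regions.json
```

If the repository turns out to use a flatter layout, only the last line of the function needs to change.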

All JSON files contain some common fields described here. At present, there are two kinds of data files in the repo:

*_optical_flow.json files contain the estimated camera motion for a subset of frames in each video. The format is described here.
The optical flow files are then processed to isolate sequences where the camera motion is consistent (e.g. tilting upward, zooming in, static). We are particularly interested in static segments and attempt to label them by comparison to a set of ground truth video sequences.

These "regions" of consistent behavior are described in a *_optical_flow_regions.json file described here.
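The step of isolating sequences with consistent camera motion can be sketched as a threshold-and-group pass over per-frame motion estimates. The FrameMotion fields (dx, dy, scale), the thresholds and the label names are hypothetical and chosen for illustration. They are not the schema of *_optical_flow.json or the logic used by the project's scripts.

```python
from dataclasses import dataclass
from itertools import groupby

@dataclass
class FrameMotion:
    # Hypothetical fields, not the real optical flow file schema.
    frame: int
    dx: float     # horizontal shift between samples, pixels
    dy: float     # vertical shift between samples, pixels
    scale: float  # relative zoom, 1.0 means no zoom

def classify(m: FrameMotion, shift_tol: float = 0.5, scale_tol: float = 0.01) -> str:
    """Assign a coarse motion label using illustrative thresholds."""
    if m.scale > 1 + scale_tol:
        return "zoom_in"
    if m.scale < 1 - scale_tol:
        return "zoom_out"
    if abs(m.dx) < shift_tol and abs(m.dy) < shift_tol:
        return "static"
    return "pan_tilt"

def segment(motions):
    """Merge consecutive samples with the same label into (label, first, last) regions."""
    regions = []
    for label, run in groupby(motions, key=classify):
        run = list(run)
        regions.append((label, run[0].frame, run[-1].frame))
    return regions

# Motion sampled every 10 frames, mirroring "a subset of frames".
samples = [
    FrameMotion(1, 0.1, 0.0, 1.0),
    FrameMotion(11, 0.0, 0.2, 1.0),
    FrameMotion(21, 0.0, 0.0, 1.05),
    FrameMotion(31, 0.0, 0.0, 1.04),
    FrameMotion(41, 0.0, 0.1, 1.0),
]
print(segment(samples))
# [('static', 1, 11), ('zoom_in', 21, 31), ('static', 41, 41)]
```

A real pipeline would likely also smooth the labels and drop very short runs before writing regions.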

Right now, the JSON file formats are unstable. The file format allows for semantic versioning of the file contents, and we describe format changes in the Change Log.
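Because the file format carries a semantic version, a reader can refuse files whose major version it does not understand. A minimal sketch follows; the key name "file_version" and the supported major version are assumptions for illustration, so check the Change Log for the actual field.

```python
import json

SUPPORTED_MAJOR = 1  # hypothetical: major version this reader understands

def check_version(doc: dict, supported_major: int = SUPPORTED_MAJOR) -> tuple:
    """Parse a MAJOR.MINOR.PATCH string and reject incompatible major versions.

    The "file_version" key is assumed, not taken from the real schema.
    """
    major, minor, patch = (int(p) for p in doc["file_version"].split("."))
    if major != supported_major:
        raise ValueError(f"unsupported file format major version {major}")
    return major, minor, patch

doc = json.loads('{"file_version": "1.2.0"}')
print(check_version(doc))  # (1, 2, 0)
```

Under semantic versioning, minor and patch bumps should stay readable, so only the major number is checked.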

Data in CSV format

The scripts/regions_to_csv.py script converts the JSON regions files to a tabular CSV format.
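Once in CSV form, the regions can be filtered with the standard library alone. This sketch assumes the CSV header contains mov_basename, start_frame, end_frame and scene_tag, the column names used in the BigQuery query later in this README; regions_for_scene is a hypothetical helper, and the second sample row is made up.

```python
import csv
import io

def regions_for_scene(csv_file, scene_tag):
    """Yield (mov_basename, start_frame, end_frame) for rows with the given scene_tag.

    Assumes a header row with mov_basename, start_frame, end_frame and
    scene_tag columns; adjust the names if the generated CSV differs.
    """
    for row in csv.DictReader(csv_file):
        if row["scene_tag"] == scene_tag:
            yield row["mov_basename"], int(row["start_frame"]), int(row["end_frame"])

# Illustrative input; "example_tag" is a placeholder, not a real station label.
sample = io.StringIO(
    "mov_basename,start_frame,end_frame,scene_tag\n"
    "CAMHDA301-20160101T000000Z,1711,2191,d2_p0_z0\n"
    "CAMHDA301-20160101T000000Z,2500,2900,example_tag\n"
)
print(list(regions_for_scene(sample, "d2_p0_z0")))
# [('CAMHDA301-20160101T000000Z', 1711, 2191)]
```

For a file on disk, pass a handle opened with open(path, newline="") instead of the StringIO.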

We provide a Frictionless Data datapackage.json file, so the Frictionless Data libraries and tools can be used to access the region data:

# Uses the datapackage-py 1.x API (older 0.x releases used datapackage.DataPackage).
import datapackage

# Load the package descriptor directly from GitHub.
url = "https://raw.githubusercontent.com/CamHD-Analysis/CamHD_motion_metadata/master/datapackage/datapackage.json"
dp = datapackage.Package(url)
print(dp.descriptor['title'])

The datapackage/scripts/ directory contains Python scripts specific to the datapackage format.

The datapackage/examples/ directory contains more examples written using the datapackage.

Data in Google BigQuery

As an experiment, the CSV version has been uploaded to Google BigQuery.
This dataset is publicly readable and is available here.

The dataset can be accessed using the tools and libraries provided by Google. For example, using the bq command-line tool:

 >  bq query --project_id camhd-motion-metadata "SELECT mov_basename,start_frame,end_frame FROM camhd.regions WHERE scene_tag='d2_p0_z0' ORDER BY date_time LIMIT 6"

 Waiting on bqjob_r542e368cb71317b8_0000015d806335c0_1 ... (0s) Current status: DONE
 +----------------------------+-------------+-----------+
 |        mov_basename        | start_frame | end_frame |
 +----------------------------+-------------+-----------+
 | CAMHDA301-20160101T000000Z |       23601 |     23931 |
 | CAMHDA301-20160101T000000Z |       13101 |     13381 |
 | CAMHDA301-20160101T000000Z |        1711 |      2191 |
 | CAMHDA301-20160101T000000Z |        4691 |      4961 |
 | CAMHDA301-20160101T000000Z |        7531 |      7811 |
 | CAMHDA301-20160101T000000Z |       21031 |     21351 |
 +----------------------------+-------------+-----------+
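The start_frame and end_frame values can be turned into wall-clock times if the basename timestamp marks the start of the video. This sketch assumes that, plus a frame rate of 29.97 fps and frame 0 at the start of the file; all three are assumptions to verify against the video files. region_times is a hypothetical helper.

```python
from datetime import datetime, timedelta

FPS = 29.97  # assumed CamHD frame rate; verify against the video metadata

def region_times(mov_basename, start_frame, end_frame, fps=FPS):
    """Convert frame bounds to absolute UTC times (hypothetical helper).

    Assumes the basename timestamp is the time of frame 0.
    """
    stamp = mov_basename.split("-", 1)[1]  # e.g. 20160101T000000Z
    video_start = datetime.strptime(stamp, "%Y%m%dT%H%M%SZ")
    return (video_start + timedelta(seconds=start_frame / fps),
            video_start + timedelta(seconds=end_frame / fps))

start, end = region_times("CAMHDA301-20160101T000000Z", 1711, 2191)
print(round((end - start).total_seconds(), 2))  # 16.02
```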

Preparation

The metadata files are generated using a couple of different software tools:

CamHD-Analysis/camhd_motion_analysis contains the C++ and Python code that performs the optical flow calculation.
CamHD-Analysis/camhd-motion-analysis-deploy contains scripts and documentation on running camhd_motion_analysis on a Docker swarm.

The Python tools in the scripts/ directory are used for further manipulation of the optical flow files:

make_regions_files.py takes the optical flow files as input and:
1. Breaks the video into time frames with consistent camera behavior (camera static, zooming in, panning left, etc.).
2. Uses a set of hand-labelled ground truth files to attempt to label each static section with its corresponding region.
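Step 2 can be illustrated with a nearest-neighbour match against ground-truth examples. This is a swapped-in technique for illustration, not necessarily what make_regions_files.py does (see docs/MakeRegionsFile.md for that). The descriptor vectors and the label_region helper are hypothetical.

```python
import math

def label_region(features, ground_truth, max_distance):
    """Return the scene tag of the nearest ground-truth descriptor, or None.

    Hypothetical inputs: `features` is a numeric descriptor of a static
    region, and `ground_truth` maps scene tags to descriptors computed the
    same way from hand-labelled sequences.
    """
    best_tag, best_dist = None, math.inf
    for tag, reference in ground_truth.items():
        d = math.dist(features, reference)  # Euclidean distance
        if d < best_dist:
            best_tag, best_dist = tag, d
    # Leave weak matches unlabelled rather than forcing a tag onto every region.
    return best_tag if best_dist <= max_distance else None

# Made-up descriptors; "example_tag" is a placeholder.
ground_truth = {"d2_p0_z0": [0.2, 0.8, 0.5], "example_tag": [0.9, 0.1, 0.3]}
print(label_region([0.25, 0.75, 0.5], ground_truth, 0.2))  # d2_p0_z0
print(label_region([0.5, 0.5, 0.9], ground_truth, 0.2))    # None
```

The distance cutoff is what allows a static region that matches no known station to stay unlabelled.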

See docs/MakeRegionsFile.md for more detail.

scripts/make_csv.py converts the JSON data to tabular format. Running make csv with the top-level Makefile regenerates the CSV from all existing region files.
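Flattening per-video regions into one CSV comes down to emitting one row per region. This sketch assumes each region is a dict with start_frame, end_frame and an optional scene_tag. That structure and the helper names are hypothetical, not the schema of the *_optical_flow_regions.json files.

```python
import csv
import io

FIELDS = ["mov_basename", "start_frame", "end_frame", "scene_tag"]

def regions_to_rows(mov_basename, regions):
    """Yield one flat CSV row per region (hypothetical region structure)."""
    for r in regions:
        yield {"mov_basename": mov_basename,
               "start_frame": r["start_frame"],
               "end_frame": r["end_frame"],
               "scene_tag": r.get("scene_tag", "")}  # unlabelled regions get ""

def write_csv(out, per_video):
    """Write a header plus all regions for every video to a text stream."""
    writer = csv.DictWriter(out, fieldnames=FIELDS)
    writer.writeheader()
    for mov_basename, regions in per_video.items():
        writer.writerows(regions_to_rows(mov_basename, regions))

buf = io.StringIO()
write_csv(buf, {"CAMHDA301-20160101T000000Z":
                [{"start_frame": 1711, "end_frame": 2191, "scene_tag": "d2_p0_z0"}]})
print(buf.getvalue().splitlines())
# ['mov_basename,start_frame,end_frame,scene_tag', 'CAMHDA301-20160101T000000Z,1711,2191,d2_p0_z0']
```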

scripts/ci_meta_scrape.py scrapes movie meta-information from the CI into JSON files. scripts/ci_scrape_to_csv.py converts these JSON files to CSV.

Todos

Add installation instructions and pointer to demos
