sigmf/SigMF

Proposal to Add "sample_length" Property to SigMF Core Specifications

Nepomuceno opened this issue · 2 comments

This proposal suggests adding a new property called "core:sample_length" to the SigMF core specification. The property would state the length of the recording referenced by a metadata file, so that users interested in a particular recording can quickly assess its duration. When combined with the existing "core:datatype" property, it also lets them estimate the size of the data file.

Motivation

The current SigMF specification lacks a standardized property to represent the length of a recording. This omission makes it harder for users to ascertain the duration of a particular recording and estimate its file size from the metadata alone. By introducing the "core:sample_length" property, we can enhance the usability and comprehensiveness of SigMF metadata files.

Proposed Changes

  1. Introduce a new property in the core specifications named "core:sample_length".
  2. The "core:sample_length" property should give the length of the recording as a number of IQ records.
  3. The property value should be expressed as an integer, allowing for large numbers to represent long recordings.
  4. The "core:sample_length" property should be defined as an optional field within the "global" section of the metadata file to provide a centralized location for this information.

Example

To illustrate the proposed changes, consider the following example of a SigMF metadata file:

{
    "global": {
        "core:author": "Kyle Logue, K6OF",
        "core:datatype": "ri16_le",
        "core:description": "The Official SigMF Logo",
        "core:license": "https://creativecommons.org/licenses/by-sa/4.0/",
        "core:num_channels": 2,
        "core:recorder": "OsciStudio & Audacity",
        "core:sample_rate": 48000,
        "core:sha512": "69893900f22de266485031b584c28fc3a0d4f361acd1d623698ed258e616e082d3d398af40d2ce805a804864cb0be631dba060f7410a27c0c2e497becdca53bf",
        "core:version": "1.0.0",
        "core:sample_length": 1000000000
    },
    "captures": [
        {
            "core:datetime": "2021-06-18T23:17:51.163959Z",
            "core:sample_start": 0
        }
    ],
    "annotations": [
        {
            "core:comment": "logo warmup",
            "core:freq_lower_edge": -22000.0,
            "core:freq_upper_edge": 22000.0,
            "core:sample_count": 42000,
            "core:sample_start": 6000
        }
    ]
}

In this example, the "core:sample_length" property within the "global" section indicates that the referenced recording has a length of 1,000,000,000 IQ records. By including this property, users can easily determine the length of the recording and make informed decisions based on that information.
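As a rough sketch of what this enables, the duration follows from dividing the proposed "core:sample_length" by the existing "core:sample_rate". The function name below is illustrative only and is not part of any SigMF tooling.

```python
def duration_seconds(sample_length: int, sample_rate: float) -> float:
    """Recording duration in seconds, given a sample count and a rate in Hz."""
    if sample_rate <= 0:
        raise ValueError("sample_rate must be positive")
    return sample_length / sample_rate


# Values taken from the example metadata above.
seconds = duration_seconds(1_000_000_000, 48000)
print(f"{seconds:.0f} s (about {seconds / 3600:.1f} hours)")
```

With the example values, this works out to roughly 5.8 hours of recording.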

Benefits

  1. Improved Evaluation: Users can quickly evaluate the duration of a recording.
  2. File Size Estimation: The "core:sample_length" property, when combined with the existing "core:datatype" property, enables users to calculate the size of the recording file.
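The file size estimate could be sketched as follows. This is a minimal illustration, not part of any SigMF library: the helper names are hypothetical, the parser only handles datatype strings of the form used in this issue (such as "ri16_le" or "cf32_le"), and it assumes that one sample spans all channels, so the count is multiplied by "core:num_channels".

```python
import re

# Assumed datatype layout: r/c prefix, component format, optional endianness.
_DATATYPE = re.compile(r"(?P<kind>[rc])(?P<fmt>f32|f64|i32|i16|u32|u16|i8|u8)(?:_le|_be)?")


def bytes_per_sample(datatype: str) -> int:
    """Bytes used by one sample of one channel for a datatype string."""
    match = _DATATYPE.fullmatch(datatype)
    if match is None:
        raise ValueError(f"unrecognized datatype: {datatype!r}")
    bits = int(match.group("fmt")[1:])                  # e.g. "i16" -> 16
    components = 2 if match.group("kind") == "c" else 1  # complex = I and Q
    return components * bits // 8


def estimate_file_size(sample_length: int, datatype: str, num_channels: int = 1) -> int:
    """Estimated data file size in bytes (assumes no header or trailing bytes)."""
    return sample_length * num_channels * bytes_per_sample(datatype)


# Values taken from the example metadata above.
print(estimate_file_size(1_000_000_000, "ri16_le", num_channels=2))
```

For the example metadata (real 16-bit samples, two channels, 1,000,000,000 samples), the estimate is 4,000,000,000 bytes.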

Conclusion

The addition of the "core:sample_length" property within the "global" section of SigMF metadata files will enhance the usability and completeness of the metadata. This proposal aims to streamline the evaluation of recordings, simplify file size estimation, and provide valuable insight into the characteristics of a recording.

Kind Regards

Hi @Nepomuceno, the decision to manage SigMF sample counts solely by data file length (and associated header offsets) has been fairly fundamental to the overall effort. Because data and meta files are required to exist together, and evaluating file length is trivial, this is likely to remain the one method of ascertaining the volume of data in a dataset. There will only ever be one such method; we have taken a lot of care to make the meta specification less error prone.
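To make the file-length approach concrete, here is a minimal sketch of deriving the sample count from the size of the data file. The function names and the single `header_bytes` total are assumptions made for illustration; real tooling would take the sample size from "core:datatype" and the offsets from the metadata.

```python
import os


def sample_count_from_size(size_bytes: int, bytes_per_sample: int,
                           num_channels: int = 1, header_bytes: int = 0) -> int:
    """Number of samples implied by a data file size.

    header_bytes is the total count of non-sample bytes in the file
    (an assumption of this sketch; zero for a plain data file).
    """
    payload = size_bytes - header_bytes
    frame = bytes_per_sample * num_channels  # bytes for one sample of every channel
    if payload < 0 or payload % frame != 0:
        raise ValueError("file size is inconsistent with the sample layout")
    return payload // frame


def sample_count_from_file(path: str, bytes_per_sample: int,
                           num_channels: int = 1, header_bytes: int = 0) -> int:
    """Same calculation, reading the size from the file system."""
    return sample_count_from_size(os.path.getsize(path), bytes_per_sample,
                                  num_channels, header_bytes)
```

Because the count is always computed from the file itself, it cannot disagree with the data the way a separately stored length field could.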

A field like this actually existed at one time, but was removed for many reasons. There is a bit of discussion on this in #79 and #97, though the bulk of the decision making occurred at an in-person working group.

It's unlikely that a data length field will ever be included in the core namespace, or any canonical namespace. That said, we definitely will not restrict a user's ability to add this to a custom extension namespace, which would be my suggestion to you here if dataset file length poses a problem.
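As a hedged illustration of the extension route, the sketch below builds a metadata file that carries the length in a custom namespace. The namespace "example" and the field "example:sample_length" are hypothetical names invented for this sketch, and the layout of the "core:extensions" entry (name, version, optional) should be checked against the current specification.

```python
import json

# Hypothetical extension namespace and field, for illustration only.
metadata = {
    "global": {
        "core:datatype": "ri16_le",
        "core:num_channels": 2,
        "core:sample_rate": 48000,
        "core:version": "1.0.0",
        # Declare the extension so readers know the namespace is in use.
        "core:extensions": [
            {"name": "example", "version": "0.0.1", "optional": True}
        ],
        "example:sample_length": 1000000000,
    },
    "captures": [{"core:sample_start": 0}],
    "annotations": [],
}

print(json.dumps(metadata, indent=4))
```

Marking the extension as optional signals that a reader which does not understand the namespace can still process the recording.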

No problem. I added the property to the traceability extension, which I have also put a proposal in for, since @777arc suggested that could be a good place for it.