sigmf/SigMF

[FIX] `global` property of SigMF Metadata is a Python keyword

gregparkes opened this issue · 4 comments

Problem

The schema states that (https://sigmf.org/index.html) under the "SigMF Metadata Format" heading:

Metadata field names in the top level global Object, captures segment Objects, or annotations Objects MUST be of this form. All fields other than those at the top level which contain a : delimiter SHALL only use letters, numbers, and the _ character; all other characters are forbidden. Field names MUST NOT start with a number and MUST NOT not be C++20 or Python 3.10 keywords.

The Global object itself of the SigMF metadata standard defies this rule, and prevents any non-string Python implementation of the Global object (e.g Pydantic) without using aliasing.

Example

from pydantic import BaseModel

class SigMFMetaFile(BaseModel):
    global: SigMFGlobal # cannot define, global is a Python keyword
    captures: list[SigMFCapture]
    annotations: list[SigMFAnnotation]

You can use aliasing to avoid this, but then you have to define the object as something like:

from typing import Annotated
from pydantic import Field

class SigMFMetaFile(BaseModel):
    global_: Annotated[SigMFGlobal, Field(alias="global")] # avoids problem, but is ugly.

# to use, reference the global_
sample_rate = handle.global_.sample_rate

Solution

I would propose to modify the schema to rename global -> global_info.

This follows existing naming practices (e.g within the sigmf-python package):

  1. SigMFFile.get_global_info().

Wouldn't that break literally every SigMF recording that exists? SigMF has been around since 2017, it's one thing to change the python library API but it's too late for such major changes to the standard itself. Even something relatively minor like moving geolocation from global to captures is a big deal and requires a decent amount of discussion and time to make it happen.

That's a very valid point, and I wouldn't be upset if it wasn't worth the hassle to change at this point. And feel free to close the issue if so.

However, if one did want to go about changing it; as the standard is properly versioned, it could be implemented in a phased deprecation cycle.

The SigMF standard (on paper) would change, but in practice implementations of the standard (e.g sigmf-python) would support "legacy" standard definitions until the next major release (v2.0), with deprecation warnings appearing until then.

This would mean in practice that:

  • "global" and "global_info" keywords both validate to a correct global metadata object from a JSON file. If "global" is used a deprecation warning appears informing users that from v2.0 "global" will no longer be valid.
  • The object serializes to "global_info" instead of "global" when writing.
  • Support for "global" validation drops at SigMF v2.0

One could even go a step further and facilitate automatic changes to SigMF recordings as part of a "migration code" file, or for example, writing changes to any SigMF file which undergoes validation between now and v2.0.

I appreciate the effort here but I don't see us changing the global keyword, even in version 2.x or beyond.

I think the rationale behind the schema you quoted was because we wanted to allow variable names in code to be equal to keywords generally.

Agreed, probably not worth the effort. Happy to close.