sigmf/SigMF

questioning the utility vs. hassle of the N/ directory thing

gmabey opened this issue · 5 comments

Hello,

I'm reading the spec, in the section "Rules for SigMF Archive files:" where it reads:

3. The Archive MUST contain the following files: for each contained Recording with some name
    given here meta-syntactically as N, files named N (a directory), N/N.sigmf-meta, and N/N.sigmf-data.
4. The Archive MAY contain a .sigmf-collection file in the top-level directory.

and I'm starting to question the N/ directory thing. It seems to me that the original intent may have been to "keep the family together" when untarring an archive at the command line, but the Collection feature isn't consistent with this goal.

Now, don't get me wrong, I like collections, and I like keeping families together. However, I'm just now looking at using archivemount to give me access to datasets without copying them, and the extra directory seems silly in that context (the context of having all of the contents of an Archive accessed through a new base directory).

So, here's my "contribution question":

How is the N/ directory better than recommending a simple bash function/script that looks something like this:

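unarchive_sigmf() {
    # Extract a SigMF Archive into a directory named after the archive file.
    # Variables are quoted so that paths containing spaces survive intact.
    local file_basename archive_name
    file_basename=$(basename -- "$1")
    archive_name=${file_basename%.sigmf}
    mkdir -p -- "$archive_name"
    tar --directory="$archive_name" -xf "$1"
    echo "SigMF Archive has been extracted to ./$archive_name/"
    echo "and have a _great_day_."
}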

Probably should add some error reporting to the above, but you understand the general approach, right??
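One way the error reporting mentioned above could look, as a sketch only. The function name unarchive_sigmf_checked, the messages, and the return codes are illustrative assumptions; none of them come from the SigMF spec or from the original proposal.

```shell
# Hypothetical variant of unarchive_sigmf with basic error reporting.
# Returns 2 on bad usage, 1 on any other failure, 0 on success.
unarchive_sigmf_checked() {
    local file_basename archive_name
    if [ "$#" -ne 1 ]; then
        echo "usage: unarchive_sigmf_checked ARCHIVE.sigmf" >&2
        return 2
    fi
    # Without the .sigmf suffix the target directory would have the same
    # name as the archive file itself, so refuse early.
    case "$1" in
        *.sigmf) ;;
        *) echo "error: '$1' does not end in .sigmf" >&2; return 1 ;;
    esac
    if [ ! -f "$1" ]; then
        echo "error: '$1' is not a regular file" >&2
        return 1
    fi
    file_basename=$(basename -- "$1")
    archive_name=${file_basename%.sigmf}
    if ! mkdir -p -- "$archive_name"; then
        echo "error: could not create directory '$archive_name'" >&2
        return 1
    fi
    if ! tar -C "$archive_name" -xf "$1"; then
        echo "error: tar failed to extract '$1'" >&2
        return 1
    fi
    echo "SigMF Archive has been extracted to ./$archive_name/"
}
```

Calling it on a hypothetical capture.sigmf would extract into ./capture/, and any failure is reported on stderr with a non-zero return code so that calling scripts can react to it.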

I'm also starting to feel that the N/ directory thing being a "MUST" conflicts with "Collections rule 3": "The sigmf-collection file MUST be either in the same directory as the Recordings that it references, or in the top-level directory of an Archive (described in later section)." The contradiction comes in that, if the .sigmf-data and .sigmf-meta files MUST be in an N/ directory, how could a .sigmf-collection file possibly be simultaneously "in the same directory as the Recordings that it references" since each recording would have to be in its own N/ directory? Unless of course there's only one recording in a collection, which I guess I don't see as a requirement anywhere, but I kinda thought that a plurality of recordings was intrinsic to the concept of a collection.
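To make the conflict concrete, here is a rough sketch that checks an archive listing against only the first clause of Collections rule 3 ("in the same directory as the Recordings that it references"). The function name collection_colocation_report is made up for illustration, and the check looks only at file paths; it does not parse the collection file to see which recordings it actually references.

```shell
# Hypothetical helper: reads archive member paths on stdin (for example the
# output of `tar -tf some-archive.sigmf`) and prints every .sigmf-meta file
# that is NOT in the same directory as the .sigmf-collection file.
# Returns 0 if all are co-located, 1 if any are not, 2 if no collection found.
collection_colocation_report() {
    local paths collection_path collection_dir status path
    paths=$(cat)
    collection_path=$(printf '%s\n' "$paths" | grep '\.sigmf-collection$' | head -n 1)
    if [ -z "$collection_path" ]; then
        echo "no .sigmf-collection file found" >&2
        return 2
    fi
    collection_dir=$(dirname -- "$collection_path")
    status=0
    # A here-document (not a pipe) keeps the loop in the current shell,
    # so the update to $status is still visible after the loop ends.
    while IFS= read -r path; do
        case "$path" in
            *.sigmf-meta)
                if [ "$(dirname -- "$path")" != "$collection_dir" ]; then
                    echo "not co-located: $path"
                    status=1
                fi
                ;;
        esac
    done <<EOF
$paths
EOF
    return "$status"
}
```

With a top-level collection file plus N/N.sigmf-meta entries, every recording is reported, because each one lives in its own N/ directory while the collection file sits one level up.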

Preemptive (and hypothetical) rebuttal responses:

  1. The N/N.sigmf-{meta,data} structure has no benefit to non-bash Windows users, so [I conclude] the original feature was geared only toward less-savvy Linux/Mac users who might accidentally tar -x an Archive in a directory where it clobbers/confuses/clutters more than we think they should when using tar. (I might just write up a Windoze-compatible command-line app that uses libarchive to allow a comparable operation on that platform, just to show 'em ... someday)
  2. For backwards compatibility, having a Recording with a "name" of N/N is fine, just obviously redundant. (why would you ...)

Here's that utility that I threatened to write: gmabey@b00567a

You have to download the win64 binary from https://www.libarchive.org and will surely have to tweak build.bat, but other than that it should be good to go (also, move the .libs into that new directory).

Thanks @gmabey I agree there is a problem here. The N/N.sigmf-* structure was very much intentional.

@bhilburn I do not have a strong opinion on this. The gist is this:

the N/ directory thing being a "MUST" conflicts with "Collections rule 3": "The sigmf-collection file MUST be either in the same directory as the Recordings that it references, or in the top-level directory of an Archive (described in later section)."

We can:

  • remove the requirement for the N/N.sigmf-[meta|data] tar format, which would cause some problems for existing applications
  • allow collections to specify recordings as N/N.sigmf-* as opposed to strictly within the same directory
  • allow/require collection archives to be of the structure
    • X/X.sigmf-collection
    • X/A.sigmf-meta (collection object)
    • X/A.sigmf-data
    • X/B.sigmf-meta (collection object)
    • X/B.sigmf-data

The first option is somewhat bad. I would lean toward the third I think.
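As a rough illustration of the third layout listed above, the following sketch builds such an archive from empty placeholder files. The function name make_flat_collection_archive is invented for this example, and the placeholder files are not valid SigMF metadata or data; only the directory structure is meant to match.

```shell
# Hypothetical helper: builds NAME.sigmf with the flat layout
#   NAME/NAME.sigmf-collection
#   NAME/REC.sigmf-meta and NAME/REC.sigmf-data for each recording REC
# All files are empty placeholders (an assumption made for illustration).
make_flat_collection_archive() {
    local name recording
    if [ "$#" -lt 2 ]; then
        echo "usage: make_flat_collection_archive NAME RECORDING..." >&2
        return 2
    fi
    name=$1
    shift
    mkdir -p -- "$name" || return 1
    : > "$name/$name.sigmf-collection"
    for recording in "$@"; do
        : > "$name/$recording.sigmf-meta"
        : > "$name/$recording.sigmf-data"
    done
    # Archive the single top-level directory so that extraction
    # still "keeps the family together".
    tar -cf "$name.sigmf" "$name"
}
```

Running `make_flat_collection_archive X A B` and then `tar -tf X.sigmf` should list the five files from the bullet list, all under X/, so the collection file and the recordings it would reference share one directory.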

Thanks @gmabey I agree there is a problem here. The N/N.sigmf-* structure was very much intentional.

I believe you when you say it was intentional. I'm just asking for the rationale, since [some] things have changed. Surely, if it was intentional and is still a good idea, someone can explain to me what benefit the N/ directory has?

If I were a voting member, I would vote for the first option -- what is bad about it?

The first option is a breaking change for applications that make assumptions based on the specification's guarantees. So while this is a possible resolution, it is "bad" in the sense that it will need to wait for the 2.0 release (we deliberately have a very slow release cadence). Specification contradictions are bugs and should not have to wait that long for a fix, so option 1 is more or less untenable.

Deferring to Collections I think is the right move 👍🏼