donmccurdy/glTF-Transform

VRM gets treated as glTF instead of GLB

Opened this issue · 11 comments

Describe the bug

Applying e.g. Draco compression to a .vrm file results in all resources being externalized. The VRM is then saved as glTF JSON (with the .vrm extension).

However, according to https://github.com/vrm-c/vrm-specification/blob/master/specification/0.0/README.md, VRM is a GLB-based format.

To Reproduce
Steps to reproduce the behavior:

  1. Download zip
  2. Run gltf-transform draco "AvatarSample_A.vrm" "AvatarSample_A.vrm"
  3. See unpacked png textures

Expected behavior
VRM gets saved as GLB

Versions:

  • Version: 3.10.1
  • Environment: node20

Additional context

AvatarSample_A.zip

Before: [screenshot]

After: [screenshot]

I'm going to retag this as a feature; the library currently knows nothing about .vrm or other third-party variants of .gltf/.glb. But I agree that this should be made to 'just work' for the .vrm file extension (aside from not understanding the VRM extensions embedded in the file).

I suspect it's mostly a change in the CLI; for script-based usage, io.writeBinary works, as sketched below.
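For example, a sketch of that script-based workaround (assuming glTF-Transform 3.x with a NodeIO instance; the output file name is made up, and the library still won't understand the VRM extensions inside the file):

import { readFile, writeFile } from 'node:fs/promises';
import { NodeIO } from '@gltf-transform/core';

const io = new NodeIO();

// Read the .vrm as raw GLB bytes, ignoring the file extension entirely.
const document = await io.readBinary(await readFile('AvatarSample_A.vrm'));

// ... apply transforms ...

// Write back as a GLB container, keeping the .vrm extension.
await writeFile('AvatarSample_A.out.vrm', await io.writeBinary(document));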

Related:

Also, the library should probably log a warning when it's asked to write a file extension whose format (JSON vs. binary) it doesn't know. Perhaps with some way to register new extensions and their formats, like:

import { Format } from '@gltf-transform/core';

// Hypothetical API: declare that the '.vrm' suffix uses the GLB container.
io.registerFileSuffix('vrm', Format.GLB);

I think another option would be to base the "is this a binary or a text file" check on the GLB magic header instead of the file extension (which might not exist at all, for entirely unrelated reasons).
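For reference, a minimal sketch of such a check (not part of glTF-Transform): a GLB container begins with the four ASCII bytes "glTF" (0x46546C67 as a little-endian uint32), so binary vs. text could be decided from the first bytes alone.

function isGLB(bytes: Uint8Array): boolean {
  return (
    bytes.length >= 4 &&
    bytes[0] === 0x67 && // 'g'
    bytes[1] === 0x6c && // 'l'
    bytes[2] === 0x54 && // 'T'
    bytes[3] === 0x46    // 'F'
  );
}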

True – that'd work for the CLI, though not in the script-based I/O case, which may not read from disk. Not sure if something like ...

gltf-transform cp blob_no_extension other_blob

... is worth trying to guarantee the same binary vs. JSON output; I think I'd prefer to log a warning that the library doesn't know what you mean.

which may not read from disk

Maybe I'm misunderstanding what you mean – the magic header for GLB ("glTF") is there no matter whether it's just a blob in memory or read from disk.

In this scenario...

const document = await io.read('path/to/input.unknown');

...

await io.write('path/to/output.unknown', document);

... there is nothing in document to tell the I/O class whether the original source on disk was JSON or binary, so no guarantees can be made that it will be written in the original format without user/application input.

Is that what @hybridherbst means: have the I/O class take note of the original file header and, when write is called and the format is unknown, use that information to decide whether to write JSON or binary?

Understood, but I don't think I agree with the suggestion. It creates open-ended issues:

  • what happens when two documents are combined?
  • what happens when a different I/O instance is used to read than to write?
  • what about .writeJSON / .readJSON, which support both .glb/.gltf but never see the magic header?
  • if buffers are added to the file, must they be merged, or do we convert from .glb to .gltf?

I'm OK with making changes to ensure .vrm is handled like .glb, and perhaps allowing registration of other extensions... but I do think that unknown extensions should trigger warnings rather than attempting auto-detection, which I don't think can be supported consistently.

Do I understand correctly that the complexity here comes from the fact that [the linked code] looks at the string path only, and that it's not easy to add "look at the first 4 bytes to check what it actually is", since the actual file reads currently happen later, after that decision?

From my perspective, the complexity comes from trying to keep persistent knowledge of what the original on-disk source of a particular Document might have been. It's 'easy' at the application level, if your pipeline looks exactly like this:

const document = await io.read('in.unknown');

// ... make some edits ...

await io.write('out.unknown', document);

But at the library level, I don't want to assume that the pipeline above is what's happening. Documents can be merged or cloned, and applications might use different I/O classes for reading and writing. I think it's too much magic, when the alternative is much clearer and not particularly complex:

import { writeFile } from 'node:fs/promises';

await writeFile('out.unknown', await io.writeBinary(document));

OK, understood! I think I looked in the wrong place; what you meant by "adding custom filetypes" is probably that the ".vrm is to be treated as a GLB file" check would have to happen somewhere around here (at the end of the pipeline, not at the beginning).
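For illustration, a hypothetical suffix-to-format mapping along the lines discussed above (the helper name and behavior are made up, not part of the library): '.vrm' maps to the GLB container, '.gltf' to JSON, and anything else fails loudly rather than guessing.

import { Format } from '@gltf-transform/core';

// Hypothetical helper: suffixes that should be written as a GLB container.
const BINARY_SUFFIXES = new Set(['glb', 'vrm']);

function detectFormat(uri: string): Format {
  const suffix = uri.split('.').pop()?.toLowerCase() ?? '';
  if (BINARY_SUFFIXES.has(suffix)) return Format.GLB;
  if (suffix === 'gltf') return Format.GLTF;
  throw new Error(`Unknown file suffix ".${suffix}"; use writeBinary()/writeJSON() explicitly.`);
}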