xoreos/xoreos-tools

FEATURE: "File picker" for RIM/ERF/MOD files

lachjames opened this issue · 9 comments

Hi :)

As mentioned in #59, it would be useful if xoreos-tools had a straightforward RIM/ERF/MOD packer/unpacker that had the following features (given a particular archive):

  • Add/replace/remove a file in the archive based on binary/xml input (only binary output is sufficient if #59 is implemented as then piping can be used, though an xml output would probably be straightforward and useful either way)
  • Get a file from the archive in binary/xml format (with same caveat as above)
  • Unpack the archive into files, and vice versa

Most of these features are already available in different xoreos-tools executables; the difficult part would probably be the first part (add/replace/remove a file). I imagine the command-line might look like:

command_name {--get/--add/--replace/--remove/--unpack/--pack} {--erf/--rim/...} {--kotor, --kotor2, ...} {--inplace/--newfile} --input infile --output outfile

If infile isn't given, read from stdin; if outfile isn't given, write to stdout (so -i and -o flags are necessary as it can't rely on positioning alone).
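The stdin/stdout fallback proposed above can be sketched as follows. This is a minimal illustration in Python (xoreos-tools itself is C++); `build_parser`, `open_streams`, and the `-i`/`-o` flags are hypothetical names taken from the proposal, not existing xoreos-tools options.

```python
import argparse
import sys


def build_parser():
    # Hypothetical flags mirroring the proposal; not real xoreos-tools options.
    parser = argparse.ArgumentParser(prog="command_name")
    parser.add_argument("-i", "--input", default=None)
    parser.add_argument("-o", "--output", default=None)
    return parser


def open_streams(args):
    # Fall back to the binary stdin/stdout streams when no path is given.
    infile = open(args.input, "rb") if args.input else sys.stdin.buffer
    outfile = open(args.output, "wb") if args.output else sys.stdout.buffer
    return infile, outfile
```

Using explicit flags rather than positional arguments means either side can be omitted independently, which is what makes piping between tools possible.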

I feel pretty bad that I keep asking for new features but don't contribute to the project, so I'm wondering whether I might take this opportunity (if you're open to it) to contribute this feature myself (once #59 is dealt with, as the binary reading will rely on that).

Modifying an existing archive is not really an easy task.

From a file operations perspective, unless the file size (of the modified file inside the archive) stays the same, you can't just write a part of the file, you need to essentially write the whole file again. Which means you need to either read the whole thing into memory first, or write into a new file and then do an atomic move.

From a xoreos code perspective, we have no concept at all of modifying archives. We can read archives and we can write (some) archives, and those are two completely different paths. (And especially if you're talking MOD files, we don't yet do all ERF features either.)

From a tools perspective, this should probably be a feature of the erf (and rim) command, not a separate command to modify archives. We might also want to make the erf/rim commands behave a bit like the usual rar/zip/7z commands from an invocation standpoint, like unerf mimics unrar/unzip a bit.

I'm not sure what you mean by "get a file from the archive in XML format". If you're talking about getting an XML representation of a GFF, that's not something that an archive command should concern itself with.

Sure I agree that modifying files in-place isn't feasible, so what I really mean is that it creates a new file in-place, i.e. not requiring a -o flag to be set (which is useful for batch editing if, say, one wanted to edit a script that's in multiple archives).

To me, it makes more sense to have a single "archive tool" that has --erf, --rim, --mod, ... flags, but you seem to disagree or be going for a different model and it's not super important.

As you say, the main things that would need to be implemented would be adding/removing files from an archive once it's loaded into memory - replacing could then be done by a combination of remove and add, and reading/writing is already implemented.

If there was no support for reading binary from stdin, allowing the output to be xml rather than binary would be useful (as otherwise converting to xml would require writing and reading a tmp file, which seems a shame to have to do when using two tools from the same library). But with support for reading binary from stdin (and therefore piping) this is a non-issue, and in that case I agree with you 100%.

what I really mean is that it creates a new file in-place, i.e. not requiring a -o flag to be set

I have no idea what you mean by that. Can you give me an example usage?

going for a different model

Yeah, I'm going more with a Unix "do one thing and do it well" type of model, which is why I also don't think an archive tool should concern itself with being able to convert GFF to XML. Though whether that "one thing" is manipulating archives or manipulating one archive format is debatable.

Oh, also:

As you say, the main things that would need to be implemented would be adding/removing files from an archive once it's loaded into memory

The thing is, we don't load an archive completely into memory at the moment. We load the resource index, but the individual files inside the archive are read from the file as needed.
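The "load the index, read files on demand" behaviour described above can be sketched like this. It is only an illustration in Python: the byte layout (a count followed by name/offset/size records) is made up for the example and is NOT the real ERF or RIM format, and `LazyArchive` and `build_toy_archive` are hypothetical names, not xoreos classes.

```python
import io
import struct


class LazyArchive:
    """Toy archive reader: loads only the index, reads payloads on demand."""

    def __init__(self, stream):
        self._stream = stream
        self._index = {}
        (count,) = struct.unpack("<I", stream.read(4))
        for _ in range(count):
            (name_len,) = struct.unpack("<H", stream.read(2))
            name = stream.read(name_len).decode("ascii")
            offset, size = struct.unpack("<II", stream.read(8))
            self._index[name] = (offset, size)

    def names(self):
        return list(self._index)

    def read(self, name):
        # Seek to the payload only when it is actually requested.
        offset, size = self._index[name]
        self._stream.seek(offset)
        return self._stream.read(size)


def build_toy_archive(files):
    """Serialize {name: bytes} into the toy layout (illustration only)."""
    index_size = 4 + sum(2 + len(n.encode("ascii")) + 8 for n in files)
    index, payload = io.BytesIO(), io.BytesIO()
    index.write(struct.pack("<I", len(files)))
    for name, data in files.items():
        encoded = name.encode("ascii")
        index.write(struct.pack("<H", len(encoded)) + encoded)
        # Offsets are absolute: payloads start right after the index.
        index.write(struct.pack("<II", index_size + payload.tell(), len(data)))
        payload.write(data)
    return index.getvalue() + payload.getvalue()
```

With a reader shaped like this, memory use scales with the number of entries rather than the archive size, which matters once archives approach the gigabyte range.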

Re the in-place operation, as an example, say I have a script which is included in three .rim files for the game I'm working on, and I want to update that script in all three modules at once. I could run "unrim --replace --gff file.gff --name "script.ncs" --inplace -i *" and it would automatically replace the script in all modules that have it (rather than having to use a loop to set -i and -o to the same filename). This isn't a big deal by any means; just an idea.

I imagine that it wouldn't be too difficult to modify the current code so it can load an entire archive into memory at once, rather than picking individual files out (I'd be willing to have a go implementing this if you want).

unrim --replace --gff file.gff --name "script.ncs" --inplace -i *

What would the --gff parameter do there?

Also, "-i *" doesn't really work like that. On Unixoid systems, the shell expands the glob, so this would expand into

unrim --replace --gff file.gff --name "script.ncs" --inplace -i foo.rim bar.rim foobar.rim

or potentially even

unrim --replace --gff file.gff --name "script.ncs" --inplace -i foo.rim bar.rim foobar.rim foo.wav 1.erf

I.e. the file names don't logically "bind" to the -i parameter. But that's just a minor thing, need to think about how to best map that use case onto parameters.
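One possible way to map that use case onto parameters is to take the archives as trailing positional arguments, so that whatever the shell expands from the glob is collected in one place. The sketch below is in Python for brevity and is purely hypothetical: none of these flags are the actual unrim command line, and `select_rims` is an invented helper.

```python
import argparse


def build_parser():
    # Hypothetical interface, not the actual unrim command line.
    parser = argparse.ArgumentParser(prog="unrim")
    parser.add_argument("--replace", action="store_true")
    parser.add_argument("--gff")
    parser.add_argument("--name")
    parser.add_argument("--inplace", action="store_true")
    # Everything the shell expanded from the glob lands here.
    parser.add_argument("archives", nargs="+")
    return parser


def select_rims(paths):
    # The glob may also match unrelated files; keep only .rim archives.
    return [p for p in paths if p.lower().endswith(".rim")]
```

Filtering by extension is only a heuristic; a real tool would more likely check the file signature before touching anything.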

But that is an in-place replace. I don't understand what you mean by "in-place but not in-place".

I imagine that it wouldn't be too difficult to modify the current code so it can load an entire archive into memory at once

Well, you can just read the whole input stream into another MemoryReadStream that's held in memory, yeah. But doing that always isn't all that great. We need to think about how exactly we're going there.

Doing it with an atomic move is probably better anyway. That way, you're not destroying the input when something goes wrong, because you're holding off the swap until everything is done.

The --gff parameter would indicate the location of the GFF file we want to insert into the archive. If neither the --gff nor the -i flag is given (meaning that both the GFF file and the archive are meant to be read from stdin)... I'm not sure how it would be best to handle that. It would be easiest to just require that one of these two inputs be a file, but this would clash with #59.

Perhaps in-place isn't the best word for it - I just mean that whatever the input filename is, that's also the output filename (so we're not creating a new file, just overwriting an existing one).

As you suggest, it would be best to write the stream to memory and then atomically write it to disk, rather than editing the existing file as it goes along and potentially breaking it if an error is reached. This could still be done in chunks, but realistically the files for Aurora-based games are small enough that I think loading the whole file into memory shouldn't be an issue (keeping in mind that the original games targeted e.g. the original Xbox, with 64MB of RAM).

Err, no, you can't atomically write a file from memory to disk. You write into a new file, flush the caches and then atomically move the new file over the old file, replacing it.
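The write-flush-move sequence described above can be sketched as follows, in Python for brevity (xoreos-tools itself is C++). `atomic_rewrite` and its `write_contents` callback are hypothetical names; the callback streams the new archive into the temporary file, so nothing requires the whole archive to be held in memory.

```python
import os
import tempfile


def atomic_rewrite(path, write_contents):
    # Create the temp file next to the target so the final rename stays
    # on one filesystem (a cross-filesystem rename is not atomic).
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as tmp:
            write_contents(tmp)       # stream the new archive out
            tmp.flush()
            os.fsync(tmp.fileno())    # push the data to disk before the swap
        os.replace(tmp_path, path)    # the old file survives until this point
    except BaseException:
        # On any failure, drop the temp file; the original is untouched.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

Note that `tempfile.mkstemp` creates the file with restrictive permissions, so a real tool would probably want to copy the original file's mode onto the replacement before the swap.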

And the archives are big-ish; the games also read only the files within as they are needed. And especially in the later games, like The Witcher and the Dragon Ages, the archives are close to the 2GB limit.

Oh yeah of course, I was going to suggest that you write to a tmp file and change the name, but figured maybe you'd know a more elegant way to do it.

Interesting that the archives can get that big (my experience is mainly with KOTOR/KOTOR2 so I'm not so familiar with the later games). In any case, I don't know that there's really an alternative to just reading the whole file, editing it, and then writing the result.