galaxyhaxz/devilution

You may find Kaitai Struct useful for your task

Closed this issue · 12 comments

Hi. We are the Kaitai Struct project. We have a declarative DSL for specifying binary grammars and a transpiler for them (it is also very convenient for black-box reverse engineering, since we have a WebIDE in which you can verify ideas almost in real time). So if you defined the specs for the file formats used in Diablo in KS, they would be useful not only for you and people writing in C++, but for other people too, such as @doggan (1) and @sanctuary (4).

also (3)

MPQv1.ksy:

meta:
  id: mpqv1
  file-extension: mpq
  endian: le
seq:
  - id: header
    type: header
types:
  header:
    seq:
      - id: magic
        contents: [0x4d, 0x50, 0x51, 0x1a, 0x20, 0x00, 0x00, 0x00]
      - id: archive_size
        type: u4
      - id: version
        contents: [0x00, 0x00]
      - id: sector_size
        type: u2
      - id: hash_table_pos
        type: u4
      - id: block_table_pos
        type: u4
      - id: hash_table_entries
        type: u4
      - id: block_table_entries
        type: u4
    instances:
      hash_table:
        pos: hash_table_pos
        repeat: expr
        repeat-expr: hash_table_entries
        type: hash_entry
      block_table:
        pos: block_table_pos
        repeat: expr
        repeat-expr: block_table_entries
        type: block_entry
  hash_entry:
    seq:
      - id: name_part1
        type: u4
      - id: name_part2
        type: u4
      - id: lang
        type: u2
      - id: platform
        type: u2
      - id: block_index
        type: u4
  block_entry:
    seq:
      - id: file_pos
        type: u4
      - id: compressed_size
        type: u4
      - id: file_size
        type: u4
      - id: flags
        type: u4

(note that the hash and block tables are encrypted, so they just look like junk)
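For readers who want to cross-check the spec outside the WebIDE, here is a minimal Python sketch of the same 32-byte header layout. The names `MpqHeader` and `parse_mpq_header` are made up for illustration, the field names are my spelling of the spec's fields, and like the spec it only accepts the hardcoded magic, header size and version.

```python
import struct
from typing import NamedTuple

# Field order mirrors the header type in the spec; everything is little-endian.
HEADER_FORMAT = "<8sI2sHIIII"
HEADER_MAGIC = bytes([0x4D, 0x50, 0x51, 0x1A, 0x20, 0x00, 0x00, 0x00])


class MpqHeader(NamedTuple):
    archive_size: int
    sector_size: int
    hash_table_pos: int
    block_table_pos: int
    hash_table_entries: int
    block_table_entries: int


def parse_mpq_header(data: bytes) -> MpqHeader:
    """Parse the fixed header and reject what the spec's contents checks reject."""
    (magic, archive_size, version, sector_size,
     hash_pos, block_pos, hash_n, block_n) = struct.unpack_from(HEADER_FORMAT, data, 0)
    if magic != HEADER_MAGIC:
        raise ValueError("unexpected magic / header size")
    if version != b"\x00\x00":
        raise ValueError("unexpected format version")
    return MpqHeader(archive_size, sector_size, hash_pos, block_pos, hash_n, block_n)
```

The two tables located by `hash_table_pos` and `block_table_pos` are still encrypted at this point, so this only gets you as far as the spec does.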

Here is the metadata:

meta:
  id: blizzard_mpq
  title: Blizzard Mike O'brien PaCK
  application: Blizzard games
  file-extension: mpq
doc: |
  File format used by Blizzard games to store resources upon which other formats are built.
doc-ref:
  - http://www.zezula.net/en/mpq/main.html
  - https://github.com/ladislav-zezula/StormLib
  - https://github.com/icza/mpq
  - https://github.com/ge0rg/libmpq
  - https://github.com/doggan/diablo-file-formats
  - https://github.com/sanctuary/notes
  - https://github.com/galaxyhaxz/devilution
  - https://user.xmission.com/~trevin/DiabloIIv1.09_File_Format.shtml

are encrypted

Will process: help?
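As far as I know, the built-in process routines do not cover the MPQ table cipher, so a custom processing routine would probably be needed. Below is a rough Python sketch of that cipher as I recall it from StormLib-style documentation; treat every constant and the function names (`build_crypt_table`, `hash_string`, `encrypt_block`, `decrypt_block`) as assumptions to be checked against StormLib, not as a reference implementation.

```python
import struct

MASK32 = 0xFFFFFFFF


def build_crypt_table():
    """Build the 0x500-entry lookup table used for hashing and encryption."""
    table = [0] * 0x500
    seed = 0x00100001
    for index1 in range(0x100):
        index2 = index1
        for _ in range(5):
            seed = (seed * 125 + 3) % 0x2AAAAB
            high = (seed & 0xFFFF) << 16
            seed = (seed * 125 + 3) % 0x2AAAAB
            low = seed & 0xFFFF
            table[index2] = high | low
            index2 += 0x100
    return table


CRYPT_TABLE = build_crypt_table()


def hash_string(text, hash_type):
    """Hash an upper-cased ASCII string; hash_type 3 is the one used for keys."""
    seed1, seed2 = 0x7FED7FED, 0xEEEEEEEE
    for ch in text.upper().encode("ascii"):
        seed1 = CRYPT_TABLE[(hash_type << 8) + ch] ^ ((seed1 + seed2) & MASK32)
        seed2 = (ch + seed1 + seed2 + (seed2 << 5) + 3) & MASK32
    return seed1


def _crypt(data, key, decrypt):
    # Works on whole little-endian 32-bit words; trailing bytes pass through.
    count = len(data) // 4
    seed = 0xEEEEEEEE
    out = []
    for word in struct.unpack_from("<%dI" % count, data):
        seed = (seed + CRYPT_TABLE[0x400 + (key & 0xFF)]) & MASK32
        result = word ^ ((key + seed) & MASK32)
        plain = result if decrypt else word
        key = ((((key ^ MASK32) << 21) + 0x11111111) | (key >> 11)) & MASK32
        seed = (plain + seed + (seed << 5) + 3) & MASK32
        out.append(result)
    return struct.pack("<%dI" % count, *out) + data[count * 4:]


def encrypt_block(data, key):
    return _crypt(data, key, decrypt=False)


def decrypt_block(data, key):
    return _crypt(data, key, decrypt=True)
```

Under these assumptions the hash table would be decrypted with `decrypt_block(raw, hash_string("(hash table)", 3))` and the block table with the key derived from `"(block table)"`.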

- id: version
  contents: [0x00, 0x00]

Maybe

seq:
  ...
  - id: version_minus_1
    type: u2
...
instances:
  version:
    value: version_minus_1 + 1
    valid: 1 # for now

- id: flags
  type: u4

We have bit-sized types (b1 means 1 bit; bit-endian: le may be helpful, but please note that it is a very controversial feature that not only packs bits into little-endian integers, but also requires them to be declared in reversed order in seq).
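For comparison, the same information can be pulled out of the u4 value with plain bit masks, which sidesteps the declaration-order question entirely. This is only a sketch: the flag names and values below are my recollection of StormLib's MPQ_FILE_* constants and are not taken from this thread, so please verify them before relying on them.

```python
from enum import IntFlag


class BlockFlags(IntFlag):
    # Assumed values, recalled from StormLib's MPQ_FILE_* constants.
    IMPLODE = 0x00000100
    COMPRESS = 0x00000200
    ENCRYPTED = 0x00010000
    FIX_KEY = 0x00020000
    EXISTS = 0x80000000


def decode_flags(raw: int) -> BlockFlags:
    """Wrap the raw little-endian u4 from a block entry for readable tests."""
    return BlockFlags(raw)


flags = decode_flags(0x80000200)
print(BlockFlags.EXISTS in flags, BlockFlags.ENCRYPTED in flags)
```

A bit-sized layout in the spec would expose the same booleans as separate fields; the mask form is just easier to compare against C headers.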

contents: [0x4d, 0x50, 0x51, 0x1a, 0x20, 0x00, 0x00, 0x00]

1a seems to be a variable field composed of 2 b4.

20 00 00 00 is a header size according to StormLib.

The "MPQ" part can be written as text.

BTW, it'd be much more convenient to discuss it within a PR into https://github.com/kaitai-io/kaitai_struct_formats/

Hey @KOLANICH! The Diablo 1 file formats have been documented for a long time. I don't think devilution really uncovered any more information than we already had. I think the Kaitai Struct project is interesting, as it could allow easier access to reversing the assets of lesser-known projects. Currently I am busy with life and not coding much, but once in a blue moon I push a commit to https://github.com/diabpsx. Specifically, I have been reversing the file formats of Diablo for the PSX, to help us translate the game to Japanese and more! Perhaps I can assist your project with my findings there.

Devilution is complete. There is a jackpot of code including MPQ (maybe even from Blizzard themselves ;P) out there if you know where to look ;)

Eventually I'd like to replace these formats with common open formats to allow ease of modding the game and less unique tools required.

Cheers, and have a great New Year! Andi

contents: [0x4d, 0x50, 0x51, 0x1a, 0x20, 0x00, 0x00, 0x00]

1a seems to be a variable field composed of 2 b4.

20 00 00 00 is a header size according to StormLib.

The "MPQ" part can be written as text.

I was lazy and only wanted to implement non-user MPQ version 1, which is why I hardcoded 1a, the header size and the version. I also don't have any other MPQ variants handy to test with.

I'll have a look at process; the compression is trivial and doesn't make use of any key, so it might be doable.

P.S. Your third link is for D2 v1.09, not D1 v1.09.

Here is a definition for CEL files. One thing I do find handy about this tool is that it helps to visualize whether you have fully mapped all parts of the data.

meta:
  id: cel
  title: CEL (Diablo I graphics) raster image file format
  file-extension:
    - cel
  endian: le
seq:
  - id: file_hdr
    type: file_header
  - id: frames
    repeat: expr
    repeat-expr: file_hdr.nframes
    size: file_hdr.frame_offsets[_index + 1] - file_hdr.frame_offsets[_index]
types:
  file_header:
    seq:
      - id: nframes
        type: u4
      - id: frame_offsets
        type: u4
        repeat: expr
        repeat-expr: nframes + 1

@KOLANICH one thing I couldn't figure out how to do here was to specify the position as file_hdr.frame_offsets[_index]. This definition still worked with the .cel files I tested as they all had the data packed sequentially, but technically this isn't the correct way to handle it. The WebIDE kept complaining that pos wasn't a keyword and so I gave up 🤷
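To make the difference concrete, here is a small Python sketch of the "technically correct" behaviour: each frame is sliced at its absolute offset from the table instead of being read sequentially. The function name `read_cel_frames` is made up for this example, and it assumes the plain layout from the definition above (a u4 frame count followed by nframes + 1 u4 offsets, all little-endian).

```python
import struct


def read_cel_frames(data: bytes) -> list:
    """Return the raw bytes of every frame, located by absolute offset."""
    (nframes,) = struct.unpack_from("<I", data, 0)
    offsets = struct.unpack_from("<%dI" % (nframes + 1), data, 4)
    frames = []
    for i in range(nframes):
        start, end = offsets[i], offsets[i + 1]
        if not (start <= end <= len(data)):
            raise ValueError("frame %d has an invalid offset range" % i)
        # Slicing by offset works even if there are gaps between frames.
        frames.append(data[start:end])
    return frames
```

For files whose frames are packed back to back, this yields the same byte ranges as the sequential definition.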

This definition still worked with the .cel files I tested as they all had the data packed sequentially

I have read some free-form text specs and some source code; all of them assume that the frames are laid out sequentially (because they calculate the frame byte size as a difference of offsets).

The WebIDE kept complaining that pos wasn't a keyword and so I gave up 🤷

pos for instances applies not to individual items, but to the whole instance. You may need a type with params:

meta:
  id: cel
  title: CEL (Diablo I graphics) raster image file format
  file-extension: cel
  endian: le

seq:
  - id: file_hdr
    type: file_header
  - id: frames
    type: frame(file_hdr.frame_offsets[_index], file_hdr.frame_offsets[_index + 1] - file_hdr.frame_offsets[_index])
    repeat: expr
    repeat-expr: file_hdr.nframes

types:
  frame:
    params:
      - id: pos
        type: u4
      - id: size
        type: u4
    instances:
      frame:
        pos: pos
        size: size
  file_header:
    seq:
      - id: nframes
        type: u4
      - id: frame_offsets
        type: u4
        repeat: expr
        repeat-expr: nframes + 1

But this spec makes no sense if the frames are always laid out sequentially.

Some CELs are also "slab CELs", which contain a five-entry offset table that breaks a frame into separate parts. This feature appears to be an unused micro-optimization. After the frame table, there will be the following data:

0x000A 0x???? 0x???? 0x???? 0x????

You would have to have two separate Kaitai structs, or define them by detecting this pattern; idk if Kaitai supports that as a feature.
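Outside of Kaitai, detecting the pattern could look roughly like the Python sketch below. It assumes the five values are little-endian u2 and that a leading 0x000A is a reliable marker; both are guesses based on the byte pattern shown above, and a headerless frame that happens to start with 0A 00 would be misdetected. The name `slab_offsets` is hypothetical.

```python
import struct

SLAB_MARKER = 0x000A  # first value of the pattern; also 5 entries * 2 bytes


def slab_offsets(data_after_table: bytes):
    """Return the five u2 values if the data starts with the slab pattern, else None."""
    if len(data_after_table) < 10:
        return None
    values = struct.unpack_from("<5H", data_after_table, 0)
    if values[0] != SLAB_MARKER:
        return None
    return values
```

A caller would pass the bytes that follow the frame offset table and fall back to the plain layout when the result is None.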

@galaxyhaxz is this not only for CL2? Do you have an example? The definition @KOLANICH posted should handle them correctly. As far as I can tell, Kaitai Struct is flexible enough to deal with variants like this. But it sounds like it needs to be detected by checking whether there is a gap between the end of frame_offsets and the first offset.

The slab table is really helpful, as it can be used to calculate the width and total height of the sprite (each slab is 8 pixels in height afaik).

All CL2 files are slab CELs, whereas only some CEL files are. The cursors have them. Take a look at the official code and you can find many references to DrawSlabCel.

EDIT: and once graphics are converted to a different format we will have the dimension information without the need for hacks to calculate them =P