You may find Kaitai Struct useful for your task
Hi. We are the Kaitai Struct project. We have a declarative DSL for specifying binary grammars and a transpiler for them (it is also very convenient for black-box reverse engineering, since we have a WebIDE in which ideas can be verified almost in real time). So if you defined the specs for the file formats used in Diablo in KS, they would be useful not only for you and people writing in C++, but for other people too, such as @doggan (1) and @sanctuary (4).
also
MPQv1.ksy:
meta:
  id: mpqv1
  file-extension: mpq
  endian: le
seq:
  - id: header
    type: header
types:
  header:
    seq:
      - id: magic
        contents: [0x4d, 0x50, 0x51, 0x1a, 0x20, 0x00, 0x00, 0x00]
      - id: archive_size
        type: u4
      - id: version
        contents: [0x00, 0x00]
      - id: sector_size
        type: u2
      - id: hash_table_pos
        type: u4
      - id: block_table_pos
        type: u4
      - id: hash_table_entries
        type: u4
      - id: block_table_entries
        type: u4
    instances:
      hash_table:
        pos: hash_table_pos
        repeat: expr
        repeat-expr: hash_table_entries
        type: hash_entry
      block_table:
        pos: block_table_pos
        repeat: expr
        repeat-expr: block_table_entries
        type: block_entry
  hash_entry:
    seq:
      - id: name_part1
        type: u4
      - id: name_part2
        type: u4
      - id: lang
        type: u2
      - id: platform
        type: u2
      - id: block_index
        type: u4
  block_entry:
    seq:
      - id: file_pos
        type: u4
      - id: compressed_size
        type: u4
      - id: file_size
        type: u4
      - id: flags
        type: u4
(note that the hash and block tables are encrypted, so they just look like junk)
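To make the "encrypted tables" remark concrete, here is a minimal Python sketch of the table encryption as I understand it from the zezula.net and StormLib references linked in this thread. The function names are hypothetical, and the constants and the key derivation (hashing the string "(hash table)" or "(block table)" with hash type 3) are assumptions that should be checked against StormLib before relying on them.

```python
import struct

MASK32 = 0xFFFFFFFF


def build_crypt_table():
    """Build the 0x500-entry lookup table shared by hashing and encryption."""
    table = [0] * 0x500
    seed = 0x00100001
    for i in range(0x100):
        index = i
        for _ in range(5):
            seed = (seed * 125 + 3) % 0x2AAAAB
            high = (seed & 0xFFFF) << 16
            seed = (seed * 125 + 3) % 0x2AAAAB
            low = seed & 0xFFFF
            table[index] = high | low
            index += 0x100
    return table


CRYPT_TABLE = build_crypt_table()


def hash_string(text, hash_type):
    """Hash an ASCII string; hash_type 3 is assumed to be the 'file key' hash."""
    seed1, seed2 = 0x7FED7FED, 0xEEEEEEEE
    for ch in text.upper().encode("ascii"):
        seed1 = CRYPT_TABLE[(hash_type << 8) + ch] ^ ((seed1 + seed2) & MASK32)
        seed2 = (ch + seed1 + seed2 + (seed2 << 5) + 3) & MASK32
    return seed1


def _crypt(data, key, decrypt):
    """Shared loop: XOR each little-endian u32 with a rolling key stream."""
    if len(data) % 4:
        raise ValueError("table data must be a multiple of 4 bytes")
    words = struct.unpack("<%dI" % (len(data) // 4), data)
    seed = 0xEEEEEEEE
    out = []
    for word in words:
        seed = (seed + CRYPT_TABLE[0x400 + (key & 0xFF)]) & MASK32
        result = word ^ ((key + seed) & MASK32)
        # The seed update always uses the plaintext word.
        plain = result if decrypt else word
        key = ((((~key & MASK32) << 21) + 0x11111111) & MASK32) | (key >> 11)
        seed = (plain + seed + (seed << 5) + 3) & MASK32
        out.append(result)
    return struct.pack("<%dI" % len(out), *out)


def decrypt_block(data, key):
    return _crypt(data, key, decrypt=True)


def encrypt_block(data, key):
    return _crypt(data, key, decrypt=False)
```

Under these assumptions, a raw hash table read from the archive would be passed through `decrypt_block(raw, hash_string("(hash table)", 3))` before being interpreted as `hash_entry` records.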
Here is the metadata:
meta:
  id: blizzard_mpq
  title: Blizzard Mike O'brien PaCK
  application: Blizzard games
  file-extension: mpq
  doc: |
    File format used by Blizzard games to store resources upon which other formats are built.
  doc-ref:
    - http://www.zezula.net/en/mpq/main.html
    - https://github.com/ladislav-zezula/StormLib
    - https://github.com/icza/mpq
    - https://github.com/ge0rg/libmpq
    - https://github.com/doggan/diablo-file-formats
    - https://github.com/sanctuary/notes
    - https://github.com/galaxyhaxz/devilution
    - https://user.xmission.com/~trevin/DiabloIIv1.09_File_Format.shtml
"are encrypted"
Will process: help?
- id: version
  contents: [0x00, 0x00]
Maybe
seq:
  ...
  - id: version_minus_1
    type: u2
  ...
instances:
  version:
    value: version_minus_1 + 1
    valid: 1 # for now
- id: flags
  type: u4
We have bit-sized types (b1 means 1 bit; bit-endian: le may be helpful, but please note that it is a very controversial feature that not only places bits into le integers, but also requires declaring them in reversed order in seq).
contents: [0x4d, 0x50, 0x51, 0x1a, 0x20, 0x00, 0x00, 0x00]
1a seems to be a variable field composed of 2 b4.
20 00 00 00 is a header size according to StormLib.
MPQ can be inserted as text.
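As an illustration of splitting the 8-byte magic into its parts, here is a minimal Python sketch that reads the text "MPQ", the 1a byte and the header size as separate fields instead of hardcoding them. The function and tuple names are hypothetical; the field layout follows the MPQv1.ksy draft above.

```python
import struct
from collections import namedtuple

# Little-endian: 3-byte text, 1 marker byte, then the fields from MPQv1.ksy.
HEADER_STRUCT = struct.Struct("<3sBIIHHIIII")

MpqHeader = namedtuple(
    "MpqHeader",
    "magic marker header_size archive_size version sector_size "
    "hash_table_pos block_table_pos hash_table_entries block_table_entries",
)


def parse_mpq_header(data):
    """Parse the first 32 bytes of an archive into an MpqHeader tuple."""
    if len(data) < HEADER_STRUCT.size:
        raise ValueError("need at least %d bytes" % HEADER_STRUCT.size)
    header = MpqHeader._make(HEADER_STRUCT.unpack_from(data, 0))
    if header.magic != b"MPQ":
        raise ValueError("not an MPQ archive")
    return header
```

Reading `header_size` and `version` as values rather than as fixed `contents` leaves room for other archive variants, at the cost of weaker validation.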
BTW, it'd be much more convenient to discuss it within a PR into https://github.com/kaitai-io/kaitai_struct_formats/
Hey @KOLANICH! The Diablo 1 file formats have been documented for a long time. I don't think devilution really uncovered any more information than we already have. I think the Kaitai Struct project is interesting, as it could allow easier access to reversing assets of lesser known projects. Currently I am busy with life and not coding much, but once in a blue moon I push a commit to https://github.com/diabpsx. Specifically I have been reversing the file formats of Diablo for the PSX, to help us translate the game to Japanese and more! Perhaps I can assist your project with my findings there.
Devilution is complete. There is a jackpot of code including MPQ (maybe even from Blizzard themselves ;P) out there if you know where to look ;)
Eventually I'd like to replace these formats with common open formats to allow ease of modding the game and less unique tools required.
Cheers and have a great New Year! Andi
contents: [0x4d, 0x50, 0x51, 0x1a, 0x20, 0x00, 0x00, 0x00]
1a seems to be a variable field composed of 2 b4.
20 00 00 00 is a header size according to StormLib.
MPQ can be inserted as text.
I was lazy and only wanted to implement non-user MPQ version 1, which is why I hardcoded 1a, the header size and the version. I also don't have any other MPQ variants handy to test with.
I'll have a look at process; the compression is trivial and doesn't make use of any key, so it might be doable.
P.S. your third link is for D2 v 1.09, not D1 v 1.09
Here is a definition for CEL files. One thing I do find handy about this tool is that it helps to visualize whether you have fully mapped all parts of the data.
meta:
  id: cel
  title: CEL (Diablo I graphics) raster image file format
  file-extension:
    - cel
  endian: le
seq:
  - id: file_hdr
    type: file_header
  - id: frames
    repeat: expr
    repeat-expr: file_hdr.nframes
    size: file_hdr.frame_offsets[_index + 1] - file_hdr.frame_offsets[_index]
types:
  file_header:
    seq:
      - id: nframes
        type: u4
      - id: frame_offsets
        type: u4
        repeat: expr
        repeat-expr: nframes + 1
@KOLANICH one thing I couldn't figure out how to do here was to specify the position as file_hdr.frame_offsets[_index]. This definition still worked with the .cel files I tested as they all had the data packed sequentially, but technically this isn't the correct way to handle it. The WebIDE kept complaining that pos wasn't a keyword and so I gave up 🤷
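For comparison, here is a minimal Python sketch of reading frames by absolute offset rather than sequentially, which is the behaviour the definition above could not express. The function name is hypothetical, and it assumes the layout from the spec: a u4 frame count followed by nframes + 1 little-endian u4 offsets.

```python
import struct


def read_cel_frames(data):
    """Return (offsets, frames), slicing each frame at its absolute offset."""
    (nframes,) = struct.unpack_from("<I", data, 0)
    offsets = struct.unpack_from("<%dI" % (nframes + 1), data, 4)
    frames = []
    for start, end in zip(offsets, offsets[1:]):
        if not (start <= end <= len(data)):
            raise ValueError("frame offsets out of range: %d..%d" % (start, end))
        frames.append(data[start:end])
    return offsets, frames
```

Because every slice starts at `frame_offsets[i]`, this also works for files in which the frames are not packed back to back.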
This definition still worked with the .cel files I tested as they all had the data packed sequentially
I have read some free-form text specs and some source code; all of them assume that the frames are laid out sequentially (because they calculate the frame byte size as a difference of offsets).
The WebIDE kept complaining that pos wasn't a keyword and so I gave up 🤷
pos for instances applies not to individual items but to the whole instance. You may need a type with params:
meta:
  id: cel
  title: CEL (Diablo I graphics) raster image file format
  file-extension: cel
  endian: le
seq:
  - id: file_hdr
    type: file_header
  - id: frames
    type: frame(file_hdr.frame_offsets[_index], file_hdr.frame_offsets[_index + 1] - file_hdr.frame_offsets[_index])
    repeat: expr
    repeat-expr: file_hdr.nframes
types:
  frame:
    params:
      - id: pos
        type: u4
      - id: size
        type: u4
    instances:
      frame:
        pos: pos
        size: size
  file_header:
    seq:
      - id: nframes
        type: u4
      - id: frame_offsets
        type: u4
        repeat: expr
        repeat-expr: nframes + 1
But this spec makes no sense if the frames are always laid out sequentially.
Some CELs are also of type "slab cels", in which they contain a five-frame offset table that breaks a frame into separate parts. This feature appears to be an unused micro-optimization. After the frame table, there will be the following data:
0x000A 0x???? 0x???? 0x???? 0x????
You would need either two separate Kaitai Struct specs or a way to detect this pattern; I don't know whether it supports that as a feature.
@galaxyhaxz is this not only for CL2? Do you have an example? The definition @KOLANICH posted should handle them correctly. As far as I can tell Kaitai Struct is flexible enough to deal with variants like this. But it sounds like it needs to be detected by seeing if there is a gap between the end of frame_offsets and the first offset.
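The two detection ideas raised here can be sketched in Python as follows. Both helper names are hypothetical, and the thread does not settle whether the five-word table sits in a gap after the offset table or at the start of the frame data, so the sketch checks each possibility separately. It assumes the table is five little-endian u2 values whose first value is 0x000A.

```python
import struct


def gap_after_offset_table(data):
    """Bytes between the end of frame_offsets and the first frame offset."""
    (nframes,) = struct.unpack_from("<I", data, 0)
    offsets = struct.unpack_from("<%dI" % (nframes + 1), data, 4)
    table_end = 4 + 4 * (nframes + 1)
    return offsets[0] - table_end


def starts_with_slab_table(frame):
    """True if the bytes begin with five u2 words, the first being 0x000A."""
    if len(frame) < 10:
        return False
    words = struct.unpack_from("<5H", frame, 0)
    return words[0] == 0x000A
```

A non-zero gap, or a frame whose first word is 0x000A, would only be a hint that a file is a slab cel; ordinary pixel data could start with the same bytes by coincidence.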
The slab table is really helpful as it can be used to calculate the width and total height of the sprite (each slab is 8 pixels in height afaik).
All CL2 are slab whereas only some CEL have them. The cursors have them. Take a look at the official code and you can find many references to DrawSlabCel.
EDIT: and once graphics are converted to a different format we will have the dimension information without the need for hacks to calculate them =P