Hakuoki script file format(uncompressed) ======================================== General weirdness: ----------------- This is not a typical script file as in most NDS games. This is a fully interpreted bytecode language which I suspect, is the compiled version of an internal scripting language. It contains not only the spoken text and the speakers name but commands for all that is going on. Completely reverse engineering the virtual machine on which these scripts are run would enable us to change virtually anything in the game. From different effects to adding totally new scenes. But reversing a bytecode language isn't all that easy and it's going to take a while till I get somewhere with that ;) Anyway, as interesting as the whole custom bytecode VM thing is. It's a lot of trouble if you want to edit them. You have to take care of the same annoying things as when you edit other executable formats. Global references are used throughout the file, these have to be adjusted if you change the size of instructions or add instructions. Additional to the global references there is an export table (usually at the end of the file, more on that later), which may also need to be adjusted. The header of the STCM2L format as it seems to be called is pretty useless. As far as I can tell there is no table that contains information on the start of the different sections of the script. So you'll have to look for 'CODE_START_' manually. All Strings are 4byte aligned. This is achieved by adding 0x00 bytes to the end of the string until the 4byte boundary is reached. Oh yea, all of this is little endian, obviously. CODE section: ------------- So this is the code body. It's the same thing as the .text section in an PE file for example. The sections starts with the string 'CODE_START_', right after that the code starts. Instructions are in the following format: 32bit: is this instruction a call? 32bit: opcode 32bit: amount of parameters 32bit: length of instruction block (in bytes) Following that are the specified amount of parameter blocks, these blocks look like this: 32bit: p0 32bit: p1 32bit: p2 Sorry for the wierd naming scheme but there isn't much logic in all of this, at least none that I have found. Parameters need some more explaining as they are totally terrible. Normal paramters are usually all numbers with 0xff as their most significant byte of p0(little-endian that is, yes I know that it's weird). Now the really interesting values are pointers, which come in 2 flavours in this case. I call one local pointers and the others global pointers. Global pointers are a bit easier so I'll start with those. These pointers point to some other place in the file. Usually to another instruction. They are easily noticable as their p0 equals 0xffffff41, always. The global pointer is then located in p1. Local pointers... I feel stupid explaining it because finding them is a bit hackish. These are the pointers which point to a location inside the instruction. Usually they point to a string which is passed to the instruction (which is normally the case with text that somebody speaks in a game). I believe that it's usually distinguished only by the VM who knows the types in the parameter list for every instruction. However, we don't know the types so we have to do something a little different. First of all, local pointers are values without 0xff as their most significant byte. Sadly not all values are marked by the 0xff. So we can't assume that every parameter that is not led by a 0xff is a local pointer. To check if a value is a local pointer we check if the value that could be a local pointer points inside the instruction block. This test may ofc fail, due to obvious reasons. But in practice it works fine as values without the 0xff marking tend to be quite low. Note that local pointers only occur in p0 and they are absolute addresses. Because this is bloody complicated and I'm not a great writer here's a snippet of C code showing how to parse parameters for now. for(int i=0; i<paramcount; ++i){ parameter* p = new parameter; fread(&p->val0, 4, 1, f); fread(&p->val1, 4, 1, f); fread(&p->val2, 4, 1, f); if( ((p->val0>>24) & 0xff)!=0xff && p->val0 > old_addr && p->val0 < old_addr+length){ p->type = PTYPE_LOCALP; p->relative_addr = p->val0 - old_addr; }else if(p->val1 == 0xffffff41){ global_val = p->val1; p->type = PTYPE_GLOBALP; }else{ p->type = PTYPE_VALUE; } } * Calls: Remember the first value in the instruction? Yes, it specifies if something is a global call. If this is set to 1 instead of the usual 0 this means that this instruction is in fact only a call to another piece of code. In this case the opcode is the absolute global address to the instruction that will be called. * Strings: Local pointers usually point to larger blocks of data that you want to pass. This most likely will be a string. A string type in STCM2L looks like this: 32 bit: zero (0x00000000) 32 bit: amount of 4 byte blocks used (strlen/4) 32 bit: one (0x00000001) 32 bit: string length followed by the string Export table: ----------- At least this one is quite straightforward. Comes after the code segment and is marked by the string 'EXPORT_DATA' Format for entries: 32 bit: zero 32 byte: null terminated string. Rest of space is padded with zeros 32 bit: address of instruction that is exported