xoreos/xoreos-tools

Write an NWScript assembler and compiler

DrMcCoy opened this issue · 12 comments

Currently, we have a disassembler for NWScript bytecode, ncsdis. For an explanation of what NWScript is and how it works internally, please see here: https://xoreos.org/blog/2016/01/12/disassembling-nwscript-bytecode/

However, we could also use an NWScript assembler. For that, the existing NWScript IR structures could be leveraged.

As a first step, a roundtrip from NWScript bytecode -> IR -> NWScript bytecode would be possible. Afterwards, NWScript disassembly as produced by the disassembler could be read, parsed, and stuffed into the IR structures to be turned into bytecode.
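A rough sketch of what such a roundtrip check could look like. Note that this is only an illustration under heavy assumptions: `ToyInstruction`, `decode`, `encode` and `roundTrips` are hypothetical names, and the byte layout (opcode, type, explicit argument length) is a made-up toy encoding. Real NCS bytecode has per-opcode argument sizes, and the real IR in xoreos-tools is richer than this.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

typedef std::vector<uint8_t> Bytes;

// Hypothetical, heavily simplified IR node (NOT the xoreos-tools IR).
struct ToyInstruction {
	uint8_t opcode;
	uint8_t type;
	Bytes args;
};

// Toy encoding (an assumption, not NCS): [opcode][type][argLength][args...]
std::vector<ToyInstruction> decode(const Bytes &in) {
	std::vector<ToyInstruction> ir;

	size_t pos = 0;
	while (pos < in.size()) {
		if (in.size() - pos < 3)
			throw std::runtime_error("Truncated instruction header");

		ToyInstruction instr;
		instr.opcode = in[pos];
		instr.type   = in[pos + 1];

		const size_t length = in[pos + 2];
		pos += 3;

		if (in.size() - pos < length)
			throw std::runtime_error("Truncated instruction arguments");

		instr.args.assign(in.begin() + pos, in.begin() + pos + length);
		pos += length;

		ir.push_back(instr);
	}

	return ir;
}

Bytes encode(const std::vector<ToyInstruction> &ir) {
	Bytes out;

	for (const ToyInstruction &instr : ir) {
		if (instr.args.size() > 255)
			throw std::runtime_error("Arguments too long for the toy encoding");

		out.push_back(instr.opcode);
		out.push_back(instr.type);
		out.push_back(static_cast<uint8_t>(instr.args.size()));
		out.insert(out.end(), instr.args.begin(), instr.args.end());
	}

	return out;
}

// The actual roundtrip test: bytecode -> IR -> bytecode must be byte-identical.
bool roundTrips(const Bytes &in) {
	return encode(decode(in)) == in;
}
```

Running a check like `roundTrips()` over every available compiled script would catch any instruction the IR silently loses information about.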

The IR itself might also need to be modified, to bring it more in line with usual compiler IR.

The next step after that would be to expand the assembler into a compiler. Read and parse the C-like NWScript source, compile it into IR, and then write it to disk as bytecode.

Owning the different BioWare games would be useful here, since more functionality was added to the script over the years. And testing the assembler/compiler over a wide variety of scripts is paramount. As a starter, though, a few (Neverwinter-Nights-era) script sources and their bytecode are here: scripts.zip

Like the rest of xoreos-tools, the assembler and compiler should be written in C++. xoreos-tools is currently fully C++03, but I'm opening it up to C++11 now. I don't necessarily want to see a PR with thousand-line diffs changing everything in xoreos-tools the C++11 way, but feel free to use C++11 features in new code.

I'm interested. Is there a spec somewhere describing the high level language and the bytecode ops?

There are no official specs, no.

However, Torlack figured out the instructions back in the day. His write-up is mirrored here: https://htmlpreview.github.io/?https://github.com/xoreos/xoreos-docs/blob/master/specs/torlack/ncs.html . It is missing the 4 opcodes that were added with Dragon Age: Origins and Dragon Age II, though, concerning arrays and references.

You can also see the Opcode enum here: https://github.com/xoreos/xoreos-tools/blob/master/src/nwscript/instruction.h#L48 , as well as our script interpreter here: https://github.com/xoreos/xoreos/blob/master/src/aurora/nwscript/ncsfile.cpp#L429 . Skywing also wrote a JIT thing for NWN/NWN2: https://github.com/SkywingvL/nwn2dev-public/blob/master/NWNScriptLib/NWScriptVM.cpp#L1495 , however also without the Dragon Age opcodes. (The xoreos interpreter handles those new opcodes. I figured them out pretty easily by looking at the stack before and what the script did with the stack afterwards.)

As you can see, it's all based on RE and spelunking. No official specs were ever released.

Similarly, there are no specs for the source language. From what I heard, BioWare itself didn't have a formal language description, either, but just made it up as they went along. It's best to just analyze existing scripts and see what they do and how, I guess.

The syntax is very C-like. void main() declares a general script that does an action, int StartingConditional() a script that enables/disables a branch option in a conversation.

The types are "int", "float", "string", "object" (game entities like doors, NPCs, etc.) and "vector" (three floats, like a position; internally, that's three single floats on the stack). Additionally, there are six game-specific types (engine types), like "location" or "event". These differ from game to game. The script can only pass them to engine functions (syscalls, basically). "action" is an offset into a script, to be used for asynchronous functions (delay a script by x seconds, or put a script into an object's action queue). Later games added a "resource" type, but that's internally just a string. Structs exist, but are internally just the different members on the stack, one after the other. Dragon Age: Origins added arrays; those are dynamic and only take up one stack entry. Dragon Age II added references; they are mostly used for engine types (so that engine functions operating on engine types can modify them without returning a modified copy). The "any" pseudo-type is used to denote that an engine function can take a variable of any type.
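To make the stack layout a bit more concrete, here is a minimal sketch of how a compiler might count the stack entries a value occupies. `Kind`, `TypeDesc`, `simple`, `makeStruct` and `stackSlots` are hypothetical names made up for this example. Engine types, "action" and references are left out on purpose, since their stack footprint isn't described above.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical type descriptor, for illustration only.
enum class Kind { Int, Float, String, Object, Vector, Struct, Array };

struct TypeDesc {
	Kind kind;
	std::vector<TypeDesc> members; // Only used for Kind::Struct
};

TypeDesc simple(Kind kind) {
	return TypeDesc{kind, {}};
}

TypeDesc makeStruct(const std::vector<TypeDesc> &members) {
	return TypeDesc{Kind::Struct, members};
}

// Number of stack entries a value of this type occupies.
size_t stackSlots(const TypeDesc &type) {
	switch (type.kind) {
		case Kind::Vector:
			// A vector is three single floats on the stack
			return 3;

		case Kind::Struct: {
			// A struct is just its members, one after the other
			size_t slots = 0;
			for (const TypeDesc &member : type.members)
				slots += stackSlots(member);
			return slots;
		}

		case Kind::Array:
			// Dragon Age arrays are dynamic, but take up one entry
			return 1;

		case Kind::Int:
		case Kind::Float:
		case Kind::String:
		case Kind::Object:
			return 1;
	}

	return 0; // Not reached for valid kinds
}
```

For example, a struct holding an int, a vector and a string would take up 1 + 3 + 1 = 5 stack entries under this model.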

Engine functions are basically syscalls, in a way. Every game has a huge array of functions the scripts can call into. Those do the heavy lifting, like creating an NPC, making them walk around, or similar things. In the script, they are only referenced by a number, and the number of parameters they take. As long as ncsdis knows which game the script belongs to, it can already resolve them to user-readable names. Each game usually comes with a text file (a header, if you will) describing those, often called nwscript.nss, which is where we have that information from.
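As an illustration of that number-to-name resolution, a per-game lookup table could be sketched like this. Everything here is an assumption for the sake of the example: `EngineFunction`, `EngineTable`, `makeExampleTable` and `resolveName` are hypothetical names, the integer widths are guesses, and the table entries are placeholders, not the real contents of any game's nwscript.nss.

```cpp
#include <cstdint>
#include <map>
#include <string>

// Hypothetical description of one engine function.
struct EngineFunction {
	std::string name;
	uint8_t paramCount;
};

// One such table would exist per game (assumption: IDs fit in 16 bits).
typedef std::map<uint16_t, EngineFunction> EngineTable;

// Placeholder entries only; a real table would be generated from
// the game's nwscript.nss.
EngineTable makeExampleTable() {
	EngineTable table;

	table[0] = EngineFunction{"ExampleRandom", 1};
	table[1] = EngineFunction{"ExamplePrint", 1};
	table[2] = EngineFunction{"ExampleSpawn", 3};

	return table;
}

// Resolve a function number to a readable name, falling back to a
// generic name when the function is unknown for this game.
std::string resolveName(const EngineTable &table, uint16_t id) {
	EngineTable::const_iterator function = table.find(id);
	if (function == table.end())
		return "EngineFunction_" + std::to_string(id);

	return function->second.name;
}
```

An assembler would need the reverse mapping as well (name to number), which could be built from the same table.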

Despite being C-like, the language isn't actual ANSI/ISO C. For example, no digraphs/trigraphs, and also no escape sequences in string literals. The latter is something that Beamdog Studios recently added to their Enhanced Edition of Neverwinter Nights. Or at least, they added support for \", but then didn't add support for \\, making it impossible to have a \ at the end of a string. Also, several existing strings reportedly don't compile now, because of \ at the end of the string. Ideally, an NWScript compiler in xoreos-tools should have a switch to disable/enable escapes in string literals. But that's details for later. :P
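A minimal sketch of how such a switch might look in a lexer. `parseStringLiteral` and its parameters are hypothetical, and supporting both \" and \\ when escapes are enabled is a design assumption of this example, not what any existing compiler does.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Parse a string literal starting at src[0] == '"'.
// Returns the literal's contents and stores the number of consumed
// characters (including both quotes) in `consumed`.
std::string parseStringLiteral(const std::string &src, bool enableEscapes,
                               size_t &consumed) {
	if (src.empty() || src[0] != '"')
		throw std::runtime_error("Not a string literal");

	std::string contents;

	size_t i = 1;
	while (i < src.size()) {
		const char c = src[i];

		if (c == '"') {
			consumed = i + 1;
			return contents;
		}

		if (enableEscapes && c == '\\') {
			if (i + 1 >= src.size())
				throw std::runtime_error("Unterminated string literal");

			const char next = src[i + 1];
			if (next == '"' || next == '\\') {
				contents += next;
				i += 2;
				continue;
			}

			// Unknown escape: keep the backslash verbatim (assumption)
		}

		contents += c;
		++i;
	}

	throw std::runtime_error("Unterminated string literal");
}
```

With escapes disabled, a literal ending in a backslash parses as it did in the original games. With escapes enabled, the same literal is unterminated, which matches the breakage described above; writing the trailing backslash as \\ works in this sketch.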

Apart from analyzing the scripts, Neverwinter Nights (the original versions as well as the Enhanced Edition) comes with a toolset on Windows, including a built-in script compiler. That's useful for checking specific cases. Also, there is the official beta version of the toolset, still available here: https://www.fileplanet.com/88066/80000/fileinfo/Neverwinter-Aurora-Toolset---BETA-Version , which has a script compiler with certain...quirks/bugs. Some scripts the game still runs have been compiled with such a previous version, so these quirks are still relevant.

Also, there's of course the OpenKnights nwnnsscomp. I mirrored the old repository here, with some compilation fixes: https://github.com/DrMcCoy/NWNTools . You can find several dozen different versions of this compiler flying around, sometimes modified for later games, sometimes binary-only. It's a bit of a mess, unfortunately. AFAIK, none of them supports the array and reference opcodes added by the Dragon Age games, though (even my mirror just adds them to the rudimentary disassembler).

I'm also interested in this and #16. @isovector, if you're going to work on this I'll leave you to it (until it's developed enough that I'm not stepping on your toes), and do some work on the decompiler instead. If you feel like it'd be helpful to split this into smaller sub tasks at some point I'm happy to pick them up.

@AbigailBuccaneer feel free to pick this up! I'd love to contribute (especially if it's written in Haskell :) ), but am not sure I want to own it.

Sorry, I'd really want C++ only. No Haskell.

(I probably should have stated that in the issue text above, I guess?)

Okay, I added a paragraph to the bottom of the issue that it should be written in C++. And I'm opening up xoreos-tools to C++11 now (currently, it's fully C++03). Feel free to use C++11 features in new code.

This will be a bit of a hassle for me for a future release, since I'll have to upgrade all my cross-compilers, but that's for future-DrMcCoy to deal with. From what I can see, it should all be possible.

Any luck finding some time looking at this, @isovector and/or @AbigailBuccaneer? If you have any questions, please feel free to ask. :)

Me, I did actually update all my cross-compilers. Was also necessary for https://github.com/xoreos/phaethon/ , truth be told, if I want to distribute packages for that next release (which I should do relatively soon).

I'd love to help, but I'm out if you insist on C++! It's nothing personal, just that it would take 1/20th of the work in another language, and that it would be more fun to do. Best of luck though!

Yeah, sorry, I do really insist on C++. I'd like this to be built around/with the already existing codebase. My fault for forgetting to mention that from the start; I apologize.

In either case, thanks for looking, and good luck to you as well.

I'm actively working on this [in C++], but I don't have nearly as much time as I'd like. So I'll keep working away at it. Currently the parser is nearly complete (and I keep finding surprising ways in which the grammar parsed by the official compiler diverges subtly from C's grammar!)

Neat, thanks for the update, @AbigailBuccaneer. Looking forward to that, then :)

And yes, the original compiler being weird in places doesn't really surprise me. It's BioWare, after all :P

Hej @AbigailBuccaneer, just following up on this again. Any luck? If you need any help or anything, just poke me :)