Dark-Tower

Reverse-assembly of Milton Bradley's Dark Tower game


Milton Bradley's Dark Tower disassembly and documentation

Background

As a teenager in the '80s, I had a friend who owned Milton Bradley's Dark Tower game. We loved playing it, and I recently picked up a copy on eBay. While playing it, I started really wondering about its internal workings, and after doing a bit of research, I stumbled across Sean Riddle's TMS1400 page.

Seeing the work he did there and the ROM dumps that were available, I immediately set out in search of a disassembler for the TMS1400. After considerable digging, I discovered @paulscottrobson's Simon2Simon project, where he had written a TMS1000 disassembler and assembler. I took both (yeah, I should have done some git magic and created a branch, but this is the first time I've ever used GitHub) and modified them heavily to support the TMS1400 and some of the things I wanted to accomplish. This did require taking out some features he had put in, and I'll apologize in advance for my lack of Python skill. In spite of my ineptitude at Python, I was able to take his code and build a disassembler and its sister assembler that generate a binary matching the original input. I can't emphasize enough that he did the vast majority of the heavy lifting.

The real work

Once I had that, it was just a matter of digging into the assembly code alongside the TMS1400 documentation, the various bits of information on Sean's page, and the Dark Tower instructions themselves. This took me reasonably far, until I started trying to figure out the physical interactions. It was then that I discovered that MAME has a built-in debugger and already supported the Dark Tower game. Visibility into the internal workings of the processor, plus a single-step debugger, basically made the entire process easy.

Pieces and parts

The various (important) files that make up this repository are:

  • darktower.asm - The file originally generated by the disassembler (bin/dasm.py); I have since commented it heavily
  • darktower.lst - The listing file generated by the assembler (bin/tasm.py)
  • darktower.sym - Symbol file generated by the assembler and taken as an input
  • darktower.bin - Binary dump from Sean Riddle's site
  • Dark Tower Info.txt - Various notes I've made while digging through the code
  • Dark Tower RAM usage.xlsx - Excel spreadsheet representing the TMS1400's RAM and how Dark Tower uses it
  • Dark Tower Documentation/Dark Tower controller Pinout.txt - Sean's original file with some additional information I've discovered
  • bin/dasm.py - My version of @paulscottrobson's TMS1100 disassembler heavily modified to disassemble TMS1400 binary data
  • bin/tasm.py - My version of @paulscottrobson's TMS1100 assembler heavily modified to assemble TMS1400 code (specifically, dasm's output)

A couple of explanations

My representation of RAM addressing

The TMS1400 has eight files of 16 4-bit words that are addressed by instructions using the X and Y registers as pointers. While MAME simply treats the RAM as sequentially-addressed bytes of data, this didn't seem appropriate to me. In all of my comments as well as the Excel spreadsheet, I refer to RAM addresses in the format of X/Y, where X is the file identifier and Y is the word in that file.

For example, in the AD2B10 routine, I have the following comment:

;********************************************************************************
; AD2B10
;
; Perform Base 10 arithmetic on values in scratchpad RAM 4/1-2 and 5/1-2
;
; Add the two-digit base 10 number encoded in 4/1 and 4/2 to the two-digit 
; base 10 number encoded in 5/1 and 5/2 giving a three-digit base-10 number
; encoded in 5/0-2.
;
; For example:   0 1 2     (x means doesn't matter--overwritten)
;              4 x 1 9
;              5 x 2 2
;
; Returns:       0 1 2
;              4 0 1 9
;              5 0 4 1
;********************************************************************************

The reference to RAM 4/1-2 means the words addressed when the X register is loaded with 4 and the Y register is loaded with 1 and then 2. The same applies to the comment about 5/1, where X=5 and Y=1. When viewing RAM in MAME, this would be address 0x51.
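As a quick illustration, here's a minimal Python sketch of the conversion between this X/Y notation and the linear address MAME shows, assuming the layout implied by the 5/1 = 0x51 example (address = X * 16 + Y). The function names are hypothetical helpers, not part of bin/dasm.py or bin/tasm.py.

```python
def xy_to_mame(x: int, y: int) -> int:
    """Convert X/Y RAM notation (file X, word Y) to MAME's linear RAM address."""
    if not (0 <= x < 8 and 0 <= y < 16):
        raise ValueError(f"{x}/{y} is outside the TMS1400's 8 files of 16 words")
    return (x << 4) | y  # high nibble = file (X), low nibble = word (Y)


def mame_to_xy(addr: int) -> str:
    """Convert a MAME linear RAM address back to X/Y notation."""
    if not 0 <= addr < 0x80:
        raise ValueError(f"0x{addr:02X} is outside the 128-word RAM")
    return f"{addr >> 4}/{addr & 0xF}"


print(hex(xy_to_mame(5, 1)))   # 0x51
print(mame_to_xy(0x42))        # 4/2
```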

In the examples, the numbers in the first row and the first column are the Y and X values, respectively, forming a grid. I suppose representing it this way would make more sense:

	    0 1 2
	   -------
	4 | x 1 9
	5 | x 2 2
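To make the AD2B10 example concrete, here's a Python model of what the routine computes, using a dict keyed by (X, Y) to stand in for RAM. It models only the documented input/output behavior (4/1-2 plus 5/1-2 into 5/0-2, most significant digit at the lowest Y) and is not a translation of the TMS1400 instructions the routine actually uses; the ad2b10 function and dict layout are hypothetical.

```python
def ad2b10(ram: dict) -> None:
    """Model of AD2B10: add the BCD number in 4/1-2 to 5/1-2, result in 5/0-2.

    ram maps (x, y) -> 4-bit word; digits are stored most significant first
    (lowest Y = highest digit), matching the comment block above.
    """
    carry = 0
    for y in (2, 1):                      # ones digit first, then tens
        total = ram[(4, y)] + ram[(5, y)] + carry
        carry = 1 if total > 9 else 0     # decimal (base 10) carry
        ram[(5, y)] = total - 10 if carry else total
    ram[(5, 0)] = carry                   # hundreds digit
    ram[(4, 0)] = 0                       # the "Returns" example shows 4/0 as 0 too


ram = {(4, 0): 0xA, (4, 1): 1, (4, 2): 9,    # 0xA stands in for "x" (don't care)
       (5, 0): 0xA, (5, 1): 2, (5, 2): 2}
ad2b10(ram)
print([ram[(5, y)] for y in range(3)])    # [0, 4, 1]
```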

The assembler file format

I didn't put a ton of effort into the assembler itself. It basically treats any characters in the first eight positions of a line as a label. In the initial pass of the disassembler, labels are generated with either an "L" or an "S" as their first character (long or short, respectively), followed by the address they represent. Note that the L/S prefix is fixed the first time the label is generated: a label first created for a long branch may later be targeted by short branches, but the label is not renamed.
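As a rough illustration of those two conventions, here's a Python sketch that splits a source line on a fixed eight-character label field and builds an L/S label name from an absolute address. The function names, the tab expansion, and the assumption that mnemonic, operand, and a ";" comment follow the label field as whitespace-separated fields are mine for illustration; bin/tasm.py and bin/dasm.py may parse differently.

```python
def split_line(line: str):
    """Split an assembler source line into (label, mnemonic, operand, comment)."""
    line = line.expandtabs(8).rstrip()        # a leading tab fills the label field
    code, _, comment = line.partition(";")     # everything after ';' is a comment
    label = code[:8].strip() or None           # columns 1-8 hold the label, if any
    fields = code[8:].split()
    mnemonic = fields[0] if fields else None
    operand = fields[1] if len(fields) > 1 else None
    return label, mnemonic, operand, comment.strip() or None


def make_label(address: int, is_long: bool) -> str:
    """Name a branch/call target: 'L' (long) or 'S' (short) plus the 3-digit hex address."""
    return f"{'L' if is_long else 'S'}{address:03X}"


print(split_line("S1A2    tcy   3         ; Switch drum to Gold Key/Silver Key/Brass Key"))
# ('S1A2', 'tcy', '3', 'Switch drum to Gold Key/Silver Key/Brass Key')
print(make_label(0x1DF, is_long=True))   # L1DF
```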

As I was writing this document, I stumbled across the original TI assembler file format on page 4-4 of the Programmer's Reference Manual. I wish I had seen it earlier, since I would have followed their formatting standard; at this point, however, it would be difficult to run the code back through the disassembler and reintegrate the comments. Hindsight's 20/20.

The listing file

Code

Here are the first few lines from the lst file in my initial commit:

                 1 	
                 2 ; *** Chapter 0 page 0	
                 3 	
0:0:00 000:28    4 S000    ldx   0         	
0:0:01 001:21    5         tma             	
0:0:03 003:71    6         a9aac           	
0:0:07 007:BE    7         br    S03E      	
0:0:0F 00F:61    8 S00F    tcmiy 8         	
0:0:1F 01F:40    9         tcy   0         	
0:0:3F 03F:9B   10         br    S01B      	
0:0:3E 03E:60   11 S03E    tcmiy 0         	

The first three colon-delimited values correspond to MAME's representation of addresses in its debugger, in the format Chapter:Page:Offset. Note that the TMS1xxx processors do not use a sequential address scheme: within a page, the program counter is a feedback shift register rather than a binary counter, which is why the offsets run in the goofy order they do.

The next colon-delimited pair is simply the absolute address followed by the opcode. After that come the listing line number and the original assembly source.
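Here's a small Python sketch of that ordering and of the Chapter:Page:Offset split. The feedback rule was worked out to reproduce the offset sequences in this document's listings (00, 01, 03, 07, 0F, 1F, 3F, 3E on page 0 and 22, 04, 09, 13, 26, ... on page 6), so treat it as a model under that assumption rather than a reference. It also assumes absolute address = chapter * 1024 + page * 64 + offset (16 pages of 64 words per chapter), which matches the 0:6:22 = 1A2 line in the listing; next_pc and cpo are hypothetical names.

```python
def next_pc(pc: int) -> int:
    """Return the in-page offset the 6-bit program counter steps to after pc."""
    if pc == 0x1F:
        feedback = 1                      # special case so 0x1F -> 0x3F
    elif pc == 0x3F:
        feedback = 0                      # special case so 0x3F -> 0x3E
    else:
        # Feedback bit is 1 when the two highest bits (5 and 4) are equal.
        feedback = 1 if ((pc >> 5) & 1) == ((pc >> 4) & 1) else 0
    return ((pc << 1) | feedback) & 0x3F  # shift left, keep 6 bits


def cpo(address: int) -> str:
    """Format an absolute ROM address as MAME-style Chapter:Page:Offset."""
    chapter, page, offset = address >> 10, (address >> 6) & 0xF, address & 0x3F
    return f"{chapter:X}:{page:X}:{offset:02X}"


pc, order = 0x00, []
for _ in range(8):
    order.append(f"{pc:02X}")
    pc = next_pc(pc)
print(" ".join(order))   # 00 01 03 07 0F 1F 3F 3E
print(cpo(0x1A2))        # 0:6:22
```

Stepping all 64 offsets from 00 with this rule visits every value exactly once before returning to 00, so a page still holds a full 64 instructions despite the odd ordering.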

Cross-reference

Following the listing is the cross-reference section:

Cross-reference
---------------

LABEL    VALUE	- (DEF) REF(b = BR, c=CALL)
S000     0x0000	- (4) 33(b) 41(b)
S00A     0x000a	- (67) 61(b) 63(b)
S00F     0x000f	- (8) 45(b)
S011     0x0011	- (57) 51(b)
S018     0x0018	- (34) 32(b)
S01B     0x001b	- (48) 10(b)
S01C     0x001c	- (42) 35(b) 40(b)
S026     0x0026	- (62) 56(b)
S033     0x0033	- (25) 22(b)
S03D     0x003d	- (17) 66(b) 72(b) 806(b)
S03E     0x003e	- (11) 7(b)
L040     0x0040	- (77) 26(c) 1369(c)

As the headings indicate, the first column is the name of the label and the second is its value (absolute address). The values after the hyphen describe how the label is used. The first value, in parentheses, is the line number where the label is defined. The remaining values are the lines where the label is referenced, each followed by "b" if it was used in a branch (br) statement or "c" if it was used in a call (call).
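If you want to post-process the cross-reference (for example, to find every caller of a routine), a minimal Python sketch of parsing one entry might look like this. The regular expressions assume the exact layout shown above (label, whitespace, 0x-prefixed hex value, hyphen, then "(def)" and "line(kind)" pairs); parse_xref is a hypothetical helper, not something the assembler provides.

```python
import re

# One entry: label, hex value, then "- (def) ref(kind) ref(kind) ..."
XREF_RE = re.compile(r"^(\S+)\s+0x([0-9a-fA-F]+)\s*-\s*\((\d+)\)(.*)$")
REF_RE = re.compile(r"(\d+)\(([bc])\)")


def parse_xref(line: str) -> dict:
    """Parse one cross-reference line into a dict."""
    m = XREF_RE.match(line.strip())
    if m is None:
        raise ValueError(f"not a cross-reference entry: {line!r}")
    label, value, defined, rest = m.groups()
    kinds = {"b": "br", "c": "call"}
    refs = [(int(n), kinds[k]) for n, k in REF_RE.findall(rest)]
    return {"label": label, "value": int(value, 16), "defined": int(defined), "refs": refs}


entry = parse_xref("S03D     0x003d\t- (17) 66(b) 72(b) 806(b)")
print(entry["value"], entry["defined"], entry["refs"])
# 61 17 [(66, 'br'), (72, 'br'), (806, 'br')]
```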

A disassembler caveat

Overview

One of the major challenges in disassembling TMS1xxx code stems from its use of the chapter buffer and page buffer registers on branches and calls. Judging from the way the Dark Tower code was written and some of the examples I saw in the documentation, the TI TMS1xxx assembler supported both short and long branches/calls. For a long branch/call, the assembler inserted the appropriate ldp or ldp/tpc/ldp combination into the code, and the fact that a long branch/call was intended is lost in the object code. From a disassembly perspective, an assumption has to be made about the contents of the page buffer and chapter buffer registers at a given br or call to decide whether it's long or short.

Now, the assumption really becomes an educated guess based on evidence in the source code. Generally speaking, a long br or call (the TI assembler used bl, according to page 2-6, and calll, according to the sample code on page 14-8 of the TMS1000 Programmer's Reference Manual) will be immediately preceded by the appropriate page/chapter buffer register loads, thanks to the assembler processing the bl/calll. Additionally, upon returning from a call, the page and chapter buffer registers are reset to the current page. Unfortunately, that's not the case with a br: if the branch isn't taken, the buffers keep whatever was loaded into them. Because the page and chapter buffer registers directly determine the absolute target address, the disassembler has to track them, so I worked under the assumption that its internal page and chapter buffer registers should be reset after any call or br. This worked in 99% of the cases I ran into.
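Here's a minimal Python sketch of that dependency: how the opcodes that matter decode, and how a br/call's absolute target is built from the buffer registers. The opcode ranges and the bit-reversed ldp operand are inferred from the listing excerpts in this document (ldp 7 assembles to 0x1E, br L1DF to 0x9F), and placing the chapter at bit 10 assumes 16 pages of 64 words per chapter; decode and branch_target are hypothetical names, not functions in bin/dasm.py.

```python
def reverse4(n: int) -> int:
    """Reverse the bit order of a 4-bit value (ldp stores its operand bit-reversed)."""
    return int(f"{n:04b}"[::-1], 2)


def decode(opcode: int):
    """Classify the opcodes that matter for control flow; everything else is 'other'."""
    if 0x10 <= opcode <= 0x1F:
        return ("ldp", reverse4(opcode & 0x0F))   # page buffer <- constant
    if 0x80 <= opcode <= 0xBF:
        return ("br", opcode & 0x3F)              # 6-bit in-page target offset
    if 0xC0 <= opcode <= 0xFF:
        return ("call", opcode & 0x3F)
    return ("other", None)


def branch_target(chapter_buffer: int, page_buffer: int, offset: int) -> int:
    """Absolute address a br/call lands on, given the buffer contents at that moment."""
    return (chapter_buffer << 10) | (page_buffer << 6) | offset


print(decode(0x1E), decode(0x9F))        # ('ldp', 7) ('br', 31)
print(hex(branch_target(0, 7, 0x1F)))    # 0x1df
print(hex(branch_target(0, 6, 0x1F)))    # 0x19f
```

The same br opcode (0x9F) lands on 0x1DF or 0x19F depending only on the page buffer, which is exactly the information the disassembler has to guess.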

Arguably, the bl/calll pseudo-ops could have been reintegrated into the source wherever their use was implied; Paul's disassembler even had this functionality already. At the time, though, I wanted the generated disassembly to correspond line-for-line with the object code, so I stripped that code out of the disassembler to ensure I had good output.

The edge cases

ROM space is at a premium, and I discovered that the above rule is broken in a few instances near the end of some pages. My guess is that the programmer(s) needed the space, so they went in and massaged the code to recover the words that redundant ldp instructions would otherwise consume. Here's an example from the listing at the end of Chapter 0, page 6:

0:6:22 1A2:4C  595 S1A2    tcy   3         ; Switch drum to Gold Key/Silver Key/Brass Key	
0:6:04 184:11  596         ldp   8         	
0:6:09 189:D5  597         call  ROTDRUM   	
0:6:13 193:19  598         ldp   9         	
0:6:26 1A6:CF  599         call  LTMIDCL   ; Light Silver Key and clear display	
               600 	
0:6:0C 18C:29  601 S18C    ldx   4         	
0:6:19 199:47  602         tcy   14        	
0:6:32 1B2:1E  603         ldp   7         	
0:6:25 1A5:39  604         tbit1 2         ; Brass Key found during encounter?	
0:6:0A 18A:9F  605         br    L1DF      	
               606 ;	
               607 ; Display Brass Key	
               608 ;	
0:6:15 195:2A  609         ldx   2         	
0:6:2A 1AA:4C  610         tcy   3         	
0:6:14 194:3A  611         tbit1 1         ; Check inventory for Brass Key	
0:6:28 1A8:80  612         br    L1C0      	
0:6:10 190:9F  613         br    L1DF      	
0:6:20 1A0:00  614         mnea            	

Note that the calls on lines 597 and 599 are both preceded by ldp instructions, as expected. Where I did run into trouble, however, was with the br statements on lines 612 and 613. Note the ldp 7 on line 603. The disassembler assumed it was paired with the br on line 605 and reset its page buffer register after that branch, making the br statements on lines 612 and 613 short branches, which was incorrect. This was only revealed as I stepped through the program while working to document it. Making matters worse, the error doesn't affect the machine code put out by the assembler, because a br always assembles the same way (it's the contents of the page and chapter buffer registers that determine whether it's a long jump), so reassembling to a matching binary couldn't catch it.
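The failure is easy to reproduce with a toy model. The Python sketch below walks the opcodes from lines 595-613 of the listing above in execution order, tracking only the page buffer (the chapter buffer is ignored), and resolves each br/call target two ways: resetting the buffer after every br/call (the disassembler's assumption) or resetting it only after a call, on the reasoning that a br that isn't taken falls through without touching the page buffer. The second rule and the resolve function are hypothetical illustrations, not something bin/dasm.py implements.

```python
# Opcodes from the page-6 listing excerpt above, in execution order (lines 595-613).
PAGE6 = [0x4C, 0x11, 0xD5, 0x19, 0xCF,         # tcy 3, ldp 8, call, ldp 9, call
         0x29, 0x47, 0x1E, 0x39, 0x9F,         # ldx 4, tcy 14, ldp 7, tbit1 2, br
         0x2A, 0x4C, 0x3A, 0x80, 0x9F]         # ldx 2, tcy 3, tbit1 1, br, br


def reverse4(n: int) -> int:
    """ldp stores its 4-bit page constant bit-reversed in the opcode."""
    return int(f"{n:04b}"[::-1], 2)


def resolve(opcodes, page: int, reset_after_br: bool):
    """Return (mnemonic, absolute target) for each br/call, tracking the page buffer."""
    page_buffer, targets = page, []
    for op in opcodes:
        if 0x10 <= op <= 0x1F:                      # ldp n
            page_buffer = reverse4(op & 0x0F)
        elif op >= 0x80:                            # br (0x80-0xBF) or call (0xC0-0xFF)
            is_call = op >= 0xC0
            targets.append(("call" if is_call else "br", (page_buffer << 6) | (op & 0x3F)))
            if is_call or reset_after_br:
                page_buffer = page                  # back to the current page
    return targets


for rule in (True, False):
    resolved = [f"{m} {t:03X}" for m, t in resolve(PAGE6, page=6, reset_after_br=rule)]
    print(f"reset_after_br={rule}:", resolved)
# reset_after_br=True: ['call 215', 'call 24F', 'br 1DF', 'br 180', 'br 19F']
# reset_after_br=False: ['call 215', 'call 24F', 'br 1DF', 'br 1C0', 'br 1DF']
```

With the reset-after-every-br rule, lines 612 and 613 come out as short branches to 0x180 and 0x19F; keeping the page buffer across the not-taken br on line 605 gives the L1C0 and L1DF targets shown in the listing.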

Here's what I think happened. This is likely the code that was generated by the assembler when the long branch pseudo-op (bl) was used:

S18C    ldx   4         
	tcy   14        
	tbit1 2         ; Brass Key found during encounter?
	ldp   7         
	br    L1DF      
;
; Display Brass Key
;
	ldx   2         
	tcy   3         
	tbit1 1         ; Check inventory for Brass Key
	ldp   7         
	br    L1C0      
	ldp   7         
	br    L1DF      

Note the two extra ldp 7 instructions, which waste two words of ROM space. It appears the programmer went back and patched the code slightly to take out the unnecessary instructions. This may also explain the spurious mnea at the end of the page. As soon as I discovered this situation, I went back, reviewed the code at the end of each page, and discovered about a half-dozen more instances that have been manually repaired.

Possible future work

Both the disassembler and assembler are purpose-built for use in this project. If there is interest, the following are improvements I'm thinking about:

  • Support parameters for input and output files on both the assembler and disassembler
  • Support bl and calll pseudo-ops in the assembler
  • Support generating bl and calll pseudo-ops in the disassembler
  • Change the formatting of the disassembler's output and the assembler's input to more closely match the Programmer's Reference Manual
  • Update both programs to support the TMS1000/TMS1200 instruction set in addition to the TMS1100/TMS1400, as was originally the case