Universal Assembler

Primary language: C. License: GNU General Public License v2.0 (GPL-2.0).

This is a universal relocatable macro assembler and linker. It does not target any particular processor; instead, you define the instruction set grammar using pseudo-instructions. This makes it a convenient assembler for custom micro-engines that you might design for an FPGA or ASIC.

The definition for a particular processor could be contained in an "include" file. For example, this is how you could define the Motorola MC6800 conditional branch instructions:

; Define a syntax rule: "name" will match "pattern".  If there
; is a match, "expr" is returned.
.rule	name	expr	pattern

; Define condition codes, so that "<cc>" will refer to them.
; For example, "cs" means carry set and has the value $25.

.rule	cc	$20	ra
.rule	cc	$22	hi
.rule	cc	$23	ls
.rule	cc	$24	cc
.rule	cc	$24	hs
.rule	cc	$25	cs
.rule	cc	$25	lo
.rule	cc	$26	ne
.rule	cc	$27	eq
.rule	cc	$28	vc
.rule	cc	$29	vs
.rule	cc	$2a	pl
.rule	cc	$2b	mi
.rule	cc	$2c	ge
.rule	cc	$2d	lt
.rule	cc	$2e	gt
.rule	cc	$2f	le
.rule	cc	$8d	sr

; Define the branch instructions

.insn	b<cc>	<expr>
 .emit	arg1
 .emit	arg2-.+1
.end

; b<cc> defines all branch instructions, including bra, bhi, bls,
; bcc, bcs, etc.

; <expr> is the branch target, an arbitrary expression

; There is whitespace between <cc> and <expr>, so the input is
; allowed to have whitespace there.

; Within the body of the instruction, we emit two bytes:
; First byte is the op-code for the branch instruction.

; Second byte is relative branch offset

; arg1, arg2, etc. are replaced with the value of the rule or the
; value of the expression.  They are assigned in the order that values
; appear in the pattern after the ".insn".

; Note that rules can return a comma-separated list of values.  Also
; note that the "pattern" part of the rule can include references to
; other named rules or expressions:

.rule	mode	$F8,arg1	<expr>	; Direct addressing
.rule	mode	$FA,arg1	#<expr>	; Immediate addressing

; In the above, "arg1" is replaced with the <expr> from the pattern.
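
As a rough illustration of what the `b<cc>` definition above is meant to produce, here is a minimal Python sketch that builds the two bytes of a branch instruction. The `CC` table is copied from the `.rule cc` lines. The function `encode_branch` and its parameters are hypothetical names. The offset follows the usual MC6800 convention (relative to the address after the two-byte instruction), which is an assumption about the intended result and not a trace of how uasm evaluates `.`.

```python
# Condition-code values copied from the ".rule cc" lines above.
CC = {
    "ra": 0x20, "hi": 0x22, "ls": 0x23, "cc": 0x24, "hs": 0x24,
    "cs": 0x25, "lo": 0x25, "ne": 0x26, "eq": 0x27, "vc": 0x28,
    "vs": 0x29, "pl": 0x2A, "mi": 0x2B, "ge": 0x2C, "lt": 0x2D,
    "gt": 0x2E, "le": 0x2F, "sr": 0x8D,
}


def encode_branch(mnemonic, target, address):
    """Return the two bytes of a b<cc> instruction placed at `address`.

    Hypothetical helper: it mirrors the idea of the .insn body
    (op-code byte, then relative offset byte), not uasm's internals.
    """
    if not mnemonic.startswith("b") or mnemonic[1:] not in CC:
        raise ValueError("not a b<cc> mnemonic: " + mnemonic)
    opcode = CC[mnemonic[1:]]
    # Assumption: standard MC6800 relative addressing, where the offset
    # is counted from the address following the two-byte instruction.
    offset = target - (address + 2)
    if not -128 <= offset <= 127:
        raise ValueError("branch target out of range")
    return bytes([opcode, offset & 0xFF])
```

For example, `encode_branch("bcs", 0x1010, 0x1000)` yields the op-code `$25` followed by the offset `$0E`.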

Assembler command line

uasm [-l file] [-o file] [-I path]... file

'file'  Name of assembly language source file to assemble.

'-I'    Add path to include file path list.

'-l'    Generate assembly listing.

'-o'    Generate object file.

Linker command line

ulink [-q] [-l file] [-o file] [-loc link[,load]:sect+sect+...]... files

'-q'     Suppress messages

'files'  Names of object files, libraries and text files containing
     additional file names.  First module found in this list is the
     root module of the program.  Each other module found here is only
     included if there is a reference path from the root module to it.

     A library is one or more object modules simply concatenated together.

'-o'     Gives name of binary output file.  If no name is given, no output file
     is generated.  Note that the first byte of the output file is always
     the first generated byte, regardless of load address.

'-l'     Gives name of map/cross-reference listing file.  If no name is given,
     no map file is generated.

'-loc link[,load]:sect+sect+sect...'
     Locate sections:
       'link' is the starting address of the given sections in hex.  This
              is the address the sections are expected to be at when the
              program is executed and is used to link symbols together.
       'load' is the load address of the sections in hex.  It is the
              address in the binary output file where the sections will
              be placed by the linker.  If this is left out, it will
              default to the same as the link address.
       sect+sect...
              list of sections to be located.  The first section in the
              list is located at the given address.  Additional sections
              are located directly after the first.

     If not all sections are located on the command line, the linker will
     prompt with '>' for the locations of the remaining sections.

     Note that more than one '-loc' may be specified; there should be one
     for each fixed address in the desired memory map.
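
To make the shape of a '-loc' argument concrete, here is a minimal Python sketch that splits one into its parts. It is not taken from ulink; `parse_loc` is a hypothetical name, the section names in the usage note are made up, and it assumes the hex addresses are written without any prefix.

```python
def parse_loc(spec):
    """Split 'link[,load]:sect+sect+...' into (link, load, sections).

    Hypothetical helper illustrating the argument format described
    above; ulink's own parsing may differ in details.
    """
    addresses, sep, sections = spec.partition(":")
    if not sep or not sections:
        raise ValueError("expected link[,load]:sect+sect+...")
    link_text, _, load_text = addresses.partition(",")
    link = int(link_text, 16)  # addresses are given in hex
    # The load address defaults to the link address when left out.
    load = int(load_text, 16) if load_text else link
    return link, load, sections.split("+")
```

With this sketch, `parse_loc("8000:code+data")` gives a link and load address of 0x8000 for the sections `code` and `data`, while `parse_loc("8000,0:code")` links at 0x8000 but loads at offset 0.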

Assembler Pseudo-instructions

.set	label,expr		; Temporarily set label to value of expression

.equ	label,expr		; Permanently set label to value of expression

label:				; Same as '.equ label,.'

.space	expr			; Reserve space

.emit	expr			; Emit a single byte

.align	expr			; Align to multiple of expr

.sect	"name"			; Switch to named section

.public	label, label, ....	; Make labels public

.macro	name,arg,arg,...	; Define normal macro
.end

.foreach name,arg		; Define foreach macro: body is
				; expanded for every character
.end				; of the argument.

name	arg,arg,arg,...		; Expand macro

.errif	expr,string		; Print string if expr is true

.if	expr			; Conditional assembly
.elseif	expr
.else
.end

.include "filename"		; Include a file

.rule	name expr pattern	; Define a syntax rule

.insn	pattern			; Define an instruction
.end

In a syntax rule:

; comments are allowed (they are ignored).

<...> refers to another rule.

whitespace means require whitespace here.

<> means whitespace is optional here.

<expr> means expect an expression here, possibly surrounded by whitespace
(so it is not necessary to surround <expr> with <>s).

other characters are literal matches.

Expressions

label				; Returns its 32-bit value
.				; Current location value

$FF80				; Hex constant
@770				; Octal constant
%1011				; Binary constant
123				; Decimal constant

( expr )

- expr
~ expr
! expr

expr << expr
expr >> expr

expr * expr
expr / expr
expr % expr
expr & expr

expr + expr
expr - expr
expr | expr
expr ^ expr

expr == expr
expr != expr
expr < expr
expr > expr
expr >= expr
expr <= expr

expr && expr

expr || expr

expr ? expr : expr
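
The four constant notations listed above can be illustrated with a small Python sketch. `parse_constant` is a hypothetical name, and the sketch only covers the prefixes shown here; it is more permissive than a real assembler would be, since Python's `int` also accepts signs and underscores.

```python
def parse_constant(text):
    """Convert a constant in one of the notations above to an int.

    Hypothetical helper: '$' is hex, '@' is octal, '%' is binary,
    and anything else is treated as decimal.
    """
    prefixes = {"$": 16, "@": 8, "%": 2}
    if text[:1] in prefixes:
        return int(text[1:], prefixes[text[0]])
    return int(text, 10)
```

For the constants in the list above, `parse_constant("$FF80")` is 65408, `parse_constant("@770")` is 504, `parse_constant("%1011")` is 11 and `parse_constant("123")` is 123.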

Object module format

In the object module format description we use the following notation for object module components:

<type:name>

where: 'name' is replaced with a descriptive name of the component

       'type' indicates how the component is stored in an object file

Object module component types:

'byte'		component is a single byte

'num'		a variable length unsigned number:

		If the number is between 0 and 125 inclusive,
		128 is added to the number and it is emitted as a
		single byte.

		If the number is between 126 and 32767 inclusive,
		the number is emitted as two bytes.  The most
		significant byte is emitted first.

		If the number is between 32768 and 2^32-1 inclusive,
		a byte equalling 255 is emitted, and then the four
		byte number is emitted, most significant byte first.

		Note that a flag value of 254 is reserved for future
		expansion.

'zstring'	a variable length string.  The string is emitted
		as-is, and includes a terminating NUL.

'string'	a variable length string with size prefix.  These
		strings have the following format:

			<num:string-length> <zstring:the string>

		The string-length includes the terminating NUL of
		the zstring.

'expr'		is an expression emitted in reverse-polish notation. 
		See interm.c and interm.h for how expressions are
		emitted.
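
The 'num', 'zstring' and 'string' encodings described above can be sketched in Python as follows. The function names are hypothetical, and encoding the text as ASCII is an assumption, since the description does not name a character set.

```python
def encode_num(n):
    """Encode an unsigned number in the variable-length 'num' format."""
    if 0 <= n <= 125:
        return bytes([n + 128])             # single byte, 128..253
    if 126 <= n <= 32767:
        return n.to_bytes(2, "big")         # two bytes, MSB first
    if 32768 <= n <= 2**32 - 1:
        return b"\xff" + n.to_bytes(4, "big")  # flag 255, then four bytes
    raise ValueError("number out of range for 'num'")


def encode_zstring(text):
    """Emit the string as-is followed by a terminating NUL."""
    return text.encode("ascii") + b"\x00"   # ASCII is an assumption


def encode_string(text):
    """Emit <num:string-length> <zstring>; the length counts the NUL."""
    z = encode_zstring(text)
    return encode_num(len(z)) + z
```

For instance, `encode_num(126)` gives the two bytes `00 7E`, and `encode_string("code")` gives `85` (length 5 plus 128) followed by the four letters and a NUL.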

An object module is composed of records. The general format of a record is as follows:

<byte:type-code> <num:body-size> <bytes:body>

where: <type-code> is a single byte record type code. Record type codes are defined in interm.h.

<body-size> gives the size of just the body in bytes.

<body>      depends on record type and may have zero length.
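
The record framing above can be walked with a short Python sketch. The decoder is inferred from the 'num' encoding rules and is not taken from the uasm sources; `decode_num` and `split_records` are hypothetical names, and the type codes used in the usage note are placeholders, since the real values live in interm.h.

```python
def decode_num(data, pos):
    """Decode a 'num' at data[pos]; return (value, next position)."""
    first = data[pos]
    if 128 <= first <= 253:
        return first - 128, pos + 1         # single-byte form
    if first < 128:
        return int.from_bytes(data[pos:pos + 2], "big"), pos + 2
    if first == 255:
        return int.from_bytes(data[pos + 1:pos + 5], "big"), pos + 5
    raise ValueError("flag value 254 is reserved")


def split_records(data):
    """Split an object module into (type_code, body) pairs."""
    records = []
    pos = 0
    while pos < len(data):
        type_code = data[pos]               # <byte:type-code>
        size, pos = decode_num(data, pos + 1)  # <num:body-size>
        body = data[pos:pos + size]         # <bytes:body>, may be empty
        if len(body) != size:
            raise ValueError("truncated record")
        records.append((type_code, body))
        pos += size
    return records
```

Given the bytes `01 84 'a' 'b' 'c' 00 02 80`, with 1 and 2 standing in for real type codes, `split_records` returns one record with a four-byte body and one record with an empty body.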

Module name record. Always first record in module.

iMODULE <num:bodysize> <zstring:module-name>

Section list.

iSECTS <num:bodysize> <num:no.sections> { <num:align> <num:size> <string:section-name> } ...

Symbols. The first <no.pubs> symbols are publics. The remaining symbols are external references.

iSYMS <num:bodysize> <num:no.symbols> <num:no.pubs> { <string:symbol-name> <string:source-reference> } ...

Public symbol values. The number of values is the same as <no.pubs> in the iSYMS record, and the values appear in the same order.

iXDEFS <num:bodysize> { <expr:value> } ...

Data fragment to be placed at given offset of given section.

iFRAG <num:bodysize> <num:section no.> <num:offset> <bytes:data>

Fixups for the data fragment emitted immediately before this record.

iFIXUPS <num:bodysize> <num:num-fixups> { <num:data-offset> <num:type> <expr:value> <expr:msg> } ...

A type code of 1 indicates that this is a byte fixup and the value of the byte is determined by the value expression.

A type code of 2 indicates that this is a range check instruction: the message expression is printed if the value expression evaluates to a non-zero (true) value.

End of module.

iEND <num:bodysize>