/p32

P32, 32 bit pascal compiler by Johan Prins

Primary LanguagePascalGNU General Public License v2.0GPL-2.0

                                                   P32, 32 bit pascal compiler

                                                                by Johan Prins

                                                            http://surf.to/p32

1. Introduction

1.1. Requirements
To use to compiler you need to following equipment:
. PC with at least a 386 processor
. Some free harddisk space

The compiler produces files containing assembly code. This code can be 
processed by an assembler. Good examples are TASM or NASM. See section 1.4 for 
more information on the tools. The next step is 'linking' the generated 
object files. You need a linker for this, several linkers are described in 
section 1.4.

Short summary of the process:

pascal file -> P32 -> assembly file -> ASM -> object file -> LINK -> executable
   
Because there are a lot of good assemblers and linkers I didn't bother about 
writing my own ones. Maybe I will add them in the very far future.

1.1. History
P32 is a free 32 bit Pascal compiler. The history of P32 started in 1996 when 
I downloaded PowerPascal. This little 'compiler', which produced code for 
OS/2, was the first compiler that had understandable source code. The first 
version of P32 was a simple conversion to the DOS protected mode (DOS32) 
platform. I realized the potential of the compiler and started adding a lot of 
new functions, like: support for procedures/functions, more data types, 
support for floating point numbers, etc. When I wanted to add more advanced 
functions like strong type checking, support for units, type conversios I 
discovered that the structure of the compiler wasn't strong enough. To come 
around this I completely rewrote the compiler so that it used symboltables and 
a parse tree. All identifiers were stored in symboltables and the parse tree 
stored the parsed pascal program. From the parse tree the compiler generated 
the assembly code. Up to now this structure is still sufficient and will 
remain the same for a long time, I hope.

1.2. Freeware
P32 is being released as freeware/sourceware. This means that you don't need 
to pay anything for the program, besides that also all the sources to create 
the executables are included. Some people

2. User manual

2.1.

2.2.

2.3. Procedures and functions
One of the main aspects of the Pascal programming language is the use of 
procedures and functions. These can greatly enhance the structure of your 
program. P32 adds several powerful features to procedures which make them 
more versatile. This is accomplished by adding some, optional,  keywords. 
These keywords will be described later. First the syntax of the procedure 
(and function) is described:

procedure <name> ( <parameters> ); <keyword> ; <string constant> ;

<block statement>

function <name> ( <parameters> ): <type> ; <keyword> ; <string constant> ;

<block statement>

As you might now everything behind the <name> is optional and can be combined
in different ways. 

2.3.1. String constant
The <string constant> _needs_ to be the last element on the declaration line. 
The string constant can contain the actual name that P32 uses for the 
procedure/function when creating the assembly file. This can become very handy 
when linking with external object files.

e.g:     procedure modplay(var s: string); external; 'mxmplay';

         This will create a procedure called 'modplay' but that procedure 
         is actually a function in an third party library called 'mxmplay'.

2.3.2. Keywords.

far, near, interrupt
Added for compatibility with other compilers. The're ignored by P32, for 
clearity a warning message is outputted.

assembler
This forces you to use an asm-statement (asm..end) for the whole procedure or
function. This helps you to write fast procedures in assembly language. The
function result is returned on various places depending on the data type.

. Integers, pointers, booleans and enumerated types return the result in the
  CPU register: EAX.
. Floating point numbers need to be returned on the first FPU stack position, 
  called: ST(0).
. Other structures, such as string are not fully supported yet, so don't use 
  the assembler keyword for these.

register

inline
external
win32api

2.4. Inline assembler

The inline assembler can be invoked by using the ASM..END keywords. All 
statements between these two keywords are seen as assembler statements. The
assembler parser isn't very sophisticated. This means that a lot of syntax
errors aren't catched by the parser, the external assembler will than
complain about them.
There are a few rules you should keep in mind when using the inline assembler. 

When accessing variables always prefix the variable with one of the strings
in the table, also you're forced to use brackets ('[',']') around the
variables.

Examples:

   mov eax, dword ptr [myvar]    {correct}
   mov eax, myvar                {incorrect}
   mov eax, dword ptr fs:[myvar] {correct}
   mov eax, dword ptr fs:eax     {incorrect}
   mov eax, dword ptr fs:[eax]   {correct}

| Prefix    | Size of variable |
|-----------|------------------|
| BYTE PTR  |       1          |
| WORD PTR  |       2          |
| DWORD PTR |       4          |
| QWORD PTR |       8          |
| TBYTE PTR |       10         |

It's possible to let the compiler calculate certain values. This means that
you can include a formula (_only_ containing constants of course) which will
be optimized to a normal integer number. You can use numbers in decimal and
hexadecimal format.

Example:

   mov eax, (256*100)+080h

Known incompatibilities
- @RESULT isn't recognized




3. P32 Internals

3.1. Structure of P32

This chapter will explain the structure of P32. The structure of P32 is not 
derived from any book or whatever. Most of the theories and code are designed
by myself or inspired by other compilers.

The compiler consist of several parts that can be characterized by the 
following names:

Program-parts:

scanner
parser
optimizer
code generator

Data-structures:

symbol-table
constant-table
parse-tree

The heart of the compiler can be found in the combination of the scanner, 
parser, code-generator, symboltable and parse-tree. The following paragraphs 
will describe the functions of these parts.

3.2. Scanner

3.2.1. General
The scanner, some may call it a lexical analyser, takes care of the reading of
the source. The source is a normal textfile containing regular Pascal code. 
The scanner will recognize keywords, identifiers, operators and numbers. The 
operators (e.g. +, -, /, *) and keywords (e.g. BEGIN, END) are converted to 
so-called tokens. These tokens are easier (and faster) to handle for the 
compiler as strings. The identifiers are returned as strings and numbers are 
returned as either integers or floating points, depending on their type. The 
scanner is called by the parser. P32 uses a parser-driven scanner techniques. 
This means that when the parser needs more information the scanner is called. 
The scanner returns the requested information and the parser will process it.

3.2.2. Tokens
The compiler use token to simplify the source code analyzing. The following 
structure is a simplified structure of the one used in P32. It's printed here 
as an example.

Example token structure:

   Token      = 
   (_unknown, _string_constant, _integer_constant,
                _char_constant,    _real_constant,
       _name,         _program,             _var,            
      _const,           _type_,           _begin,            
      _while,          _repeat,     	  _until,
     _lparen,          _rparen,       _separator,
     _assign,           _equal,         _greater,
       _less,         _less_eq,      _greater_eq,
     _not_eq,           _colon);

The token strings are stored in typed constant array. The actual reading and 
analysing is done by the GetToken procedure. This procedure looks like this:

procedure GetToken;

begin
  case Look of {skip comments}
  '{'  :    begin
              getchar;
              if look='$' then DoDirectives;
              repeat
                getchar;
              until Look = '}';
              getchar;
            end;
  '0'..'9': begin {number, integer/floating point}
              while Look in ['0'..'9'] do
              begin
                current_string:=current_string+look;
                GetChar;
              end;
              if (Look='.') or (upcase(Look)='E') then
                if (Ahead='.') then 
                  begin {subrange, like: 1..100}
                    val(current_string, current_Number, code);
                    current_token := _integer_constant;
                  end
                else                
                  begin
                    if (upcase(Look) in ['.','0'..'9','E','-']) then
                      begin {real constant: 3. or 2.0}
	                  current_string:=current_string+look;
                        GetChar;
                        while (upcase(Look) in ['0'..'9','E','-']) do
                        begin
                          current_string:=current_string+look;
                          GetChar;
                        end;
                        val(current_string, current_float, code);
                        current_token := _real_constant;
                      end;
                  end
              else begin
                     val(current_string,Current_Number,code);
                     current_token := _integer_constant;
                   end;
            end;

  '_',
  'A'..'Z',
  'a'..'z'  : begin {identifier}
                while Look in ['_', '0'..'9','A'..'Z','a'..'z' ] do
                  begin
                    Current_String := Current_String + upcase(Look);
                    GetChar;
                  end;
                for i := 0 to MaxToken do
                   if Current_String = TokenName[token(i)] then
                     begin
                       Current_Token := Token(i);
                     end;
                if Current_Token=_unknown then Current_Token:=_name;
              end;
  else        Current_String := upcase(Look);
              GetChar;
              repeat
                J := 0;
                for i := 0 to MaxToken do
                   if (Current_string + upcase(Look)) =
	                 TokenName[token(i)] then J := i;
                    if J <> 0 then 
                      begin
                        Current_String := Current_String + upcase(Look);
                        GetChar;
                      end;
              until J = 0;
              for i := 0 to MaxToken do
                 if Current_String = TokenName[token(i)] then J := i;
              Current_Token := Token(j);
    end;
end;

This is a simplified example but it should give you an idea of the working of 
the scanner. It's the easiest part of the compiler, but very important. 
Because without it the compiler won't work...

Reference: P32_SCAN.PAS

3.2.3. Pre-processor
The compiler is equipped with a simple pre-processor. Most compiler have a 
pre-processor that analyse the source code before it's scanned. P32 has 
combines the scanner and pre-processor. The pre-processor is in this compiler 
uses for analysing compiler directives. This can be a compiler specific 
option, like $A+ to turn on data alignment or a directive for conditional 
compiling.
For conditional compiling the following directives are supported:
$ifdef, $else, $endif. In the following example you can see that the strings 
that control the compilation of source code are pushed on a virtual stack.

case current_directive of
  _ifdef: begin
            GetToken;
            b:=FindStringName(directive_names, current_string);
            IfPush(b);
            if b=FALSE then EatCode;
          end;
  _else : if IfInverted then LineError(LineCount, 'Error with $ELSE')
          else IfInvert;
  _endif: if IfEmpty then LineError(LineCount, 'Error with $ENDIF')
          else IfPop;
end;

This small routine controls the conditional compiling. See the source code for 
more information.

Reference: P32_PREP.PAS

3.3. Parser

3.4. Optimizer

3.5. Code generator

<more to come>


Appendix A. P32 unit format

The unit header is 32 bytes long, it uses the following structure:

unitstart : record
              id       : array[1..3] of char;	{ P32 }
              brk      : char;			{ #26 }
              version  : word;			{ version number }
              code     : longint;
              symbols  : word;                  { # of symbols }
              types    : word;                  { # of types }
              comp     : boolean;               { is compression enabled? }
              comptype : byte;                  { cur: 0 = none, 1 = rle }
              reserved : array[1..16] of char;
            end;

The unit saving routines use RLE compression to minimize diskspace usage. The
RLE compression is built up like this:

? = Value set as needed.

 Part         Size       Offset       Value
----------------------------------------------
 Header        32          0            ?
  * id          3          0          'P32'
  * brk         1          3           #26
  * version     2          4            6
  * code        4          6            ?
  * symbols     2         10            ?
  * types       2         12            ?
  * comp        1         14          TRUE (or FALSE)
  * comptype    1         15      Current: 0 = none, 1 = rle
  * reserved   16         16      Current: #0#0#0#0...#0#0
 RLEHeader      8         32            ?
  * packs       2         32            ?
  * packsize    2         34            ?
  * reserved    4         36      Current: #0#0#0#0
 RLEDatas       8         40            ?
  * datatype    2         40            ?
  * datacode    1         42            ?
  * datasize    2         43            ?
  * reserved    3         45      Current: #0
    * Data      ?         48            ?
 RLEData        8          ?            ?
  ... x packs


With this RLE compression method, maximum size of an unit file, is 2^32 (4 GB).
Though version 6 uses a packsize of 8192, which limits the size of an unitfile 
to 536870912 bytes. (512 MB)

RLEData.datatype is a word describing the RLE compression method
used. Current implemented:
  0 - Store (no compression at all, just read next RLEPackSize bytes)
  1 - RLE   (usual RLE 8 bit compression)

RLEData.datacode is the escape sequence to signal that something
                 special is coming

DATA Decompression examples:

Magic byte is $ff

Byte in:
 $55 $66 $77 $88
Byte out:
 $55 $66 $77 $88

Byte in:
 $55 $ff $06 $88 $99   (magic,count,value)
Byte out:
 $55 $88 $88 $88 $88 $88 $88 $99

Byte in:
 $ff $00 $66    (magic, count = 0, means store magic)
Byte out:
 $ff $66

Byte in:
 ... (pos RLEPAckSize-1) $ff
Byte out:
 $ff


Appendix B. Assemblers and DOS extenders

The following table shows that assembler and
DOS-extender combinations that are supported
by P32 v0.4.

              -----------------------
             | DOS32 | WDOSX | PRO32 |
    |--------------------------------|
    | TASM   |   x   |   x   |   -   |
    |--------------------------------|
    | NASM   |   ?   |   ?   |   -   |
    |--------------------------------|
    | PASS32 |   -   |   x   |   ?   |
    |--------------------------------|

x supported
- not supported
o under development
? not sure or partially supported

   Note: NASM is not fully supported because it contains some bugs that 
         prevends P32 from using it.

Target: TASM/DOS32
------------------
Required files:
TASM    .EXE  (v3.1 or better)
DLINK   .EXE  (v1.3 or better)
DOS32   .EXE  (v3.3 or better)

TASM.EXE /m3 /t /uT310 <name>

   Note: Repeat this for all units.

DLINK.EXE -t -p <name> <units>

   Note: The executable requires DOS32.EXE to run.

Where to get it:
TASM.EXE from Borland (e.g. BP7 package)
DLINK.EXE and DOS32.EXE from the DOS32V33.ZIP

Debugging:
Use DEBUG.LIB from DOS32V33.ZIP to debug DOS32 executables, you need to add
a line 'call debug' to the main source and you need to add ',,,debug.lib' to
the linker commandline.

Target: TASM/WDOSX
------------------
Required files:
TASM    .EXE  (v3.1 or better)
TLINK32 .EXE
STUBIT  .EXE  (v0.94 or better)

TASM.EXE /m3 /t /uT310 <name>

   Note: Repeat this for all units.

TLINK32.EXE <name> <units>

STUBIT.EXE <result.exe>

Where to get it:
TASM.EXE from Borland (e.g. BP7 package)
TLINK32.EXE from Borland (e.g. TASM 4.0 package)
STUBIT.EXE from WDOSX094.ZIP

Debugging:
Use WUDEBUG.EXE from WDOSX094.ZIP to debug WDOSX executables. you can start
it with WUDEBUG <result.exe>

Target: PASS32/WDOSX
--------------------
Required files:
PASS32  .EXE (v2.1 or better)
WDOSX   .DX  (v0.94 or better)

PASS32.EXE <name> -o -im:<unit>

   Note: PASS32 supports 'smart-linking'!

Where to get it:
PASS32.EXE from PASS32V2.ZIP
WDOSX.DX from STUBIT.EXE (WDOSX094.ZIP), use -extract options to get it.
(you can also download PASSWDX.ZIP from the P32 homepage)

Debugging:
Use WUDEBUG.EXE from WDOSX094.ZIP to debug WDOSX executables. you can start
it with WUDEBUG <result.exe>