metthal/IFJ-Projekt

Syntax/semantic analysis (parser)

Closed this issue · 11 comments

Implementation of a parser that simulates the derivation tree and generates TAC (three-address code).

  1. Use the scanner interface to get tokens
  2. Using recursive descent, simulate the LL grammar rules based on the LL table
  3. Generate TAC instructions and store them in some kind of dynamic data structure (will be added later)
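
The three steps above can be sketched roughly as follows. This is only a minimal illustration, not the project's parser: the token kinds, `scanToken`, `emit`, `InstrVector`, and the toy grammar (`stmt_list -> stmt stmt_list | ε`, `stmt -> ID '=' (ID | NUM) ';'`) are all assumptions made up for the example, and tokens are single characters to keep the scanner trivial. Operands are deliberately limited to one identifier or digit, since full expressions belong to the operator-precedence part.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>

/* Hypothetical token kinds; the real scanner interface is not shown in this thread. */
typedef enum { TK_Id, TK_Num, TK_Assign, TK_Semicolon, TK_Eof, TK_Error } TokenKind;
typedef struct { TokenKind kind; char text; } Token;

/* One three-address instruction; this sketch only ever emits "dst = src". */
typedef struct { char op; char dst; char src; } Instruction;

/* Growable buffer standing in for the "dynamic data structure" of step 3. */
typedef struct { Instruction *items; size_t size, capacity; } InstrVector;

typedef struct { const char *input; Token current; InstrVector code; } Parser;

/* Step 1: toy scanner with single-character tokens; whitespace is skipped. */
static Token scanToken(const char **p) {
    while (isspace((unsigned char)**p)) (*p)++;
    char c = **p;
    Token t = { TK_Error, c };
    if (c == '\0') { t.kind = TK_Eof; return t; }
    (*p)++;
    if (islower((unsigned char)c)) t.kind = TK_Id;
    else if (isdigit((unsigned char)c)) t.kind = TK_Num;
    else if (c == '=') t.kind = TK_Assign;
    else if (c == ';') t.kind = TK_Semicolon;
    return t;
}

/* Step 3: append one instruction, doubling the buffer when it is full. */
static void emit(InstrVector *v, char op, char dst, char src) {
    if (v->size == v->capacity) {
        size_t cap = v->capacity ? v->capacity * 2 : 8;
        Instruction *grown = realloc(v->items, cap * sizeof *grown);
        if (grown == NULL) abort();
        v->items = grown;
        v->capacity = cap;
    }
    v->items[v->size++] = (Instruction){ op, dst, src };
}

static void advance(Parser *ps) { ps->current = scanToken(&ps->input); }

/* Consumes the lookahead if it has the expected kind; optionally returns its text. */
static int expect(Parser *ps, TokenKind kind, char *text) {
    if (ps->current.kind != kind) return 0;
    if (text) *text = ps->current.text;
    advance(ps);
    return 1;
}

/* Step 2: stmt -> ID '=' (ID | NUM) ';' */
static int parseStmt(Parser *ps) {
    char dst, src;
    if (!expect(ps, TK_Id, &dst)) return 0;
    if (!expect(ps, TK_Assign, NULL)) return 0;
    if (!expect(ps, TK_Id, &src) && !expect(ps, TK_Num, &src)) return 0;
    if (!expect(ps, TK_Semicolon, NULL)) return 0;
    emit(&ps->code, '=', dst, src);
    return 1;
}

/* Step 2: stmt_list -> stmt stmt_list | ε (ε is chosen when the lookahead is EOF). */
static int parseStmtList(Parser *ps) {
    if (ps->current.kind == TK_Eof) return 1;
    return parseStmt(ps) && parseStmtList(ps);
}

/* Parses source; returns 1 on success. The caller frees out->items either way. */
int parseProgram(const char *source, InstrVector *out) {
    assert(source != NULL && out != NULL);
    Parser ps = { source, { TK_Eof, 0 }, { NULL, 0, 0 } };
    advance(&ps);
    int ok = parseStmtList(&ps);
    *out = ps.code;
    return ok;
}
```

Each grammar rule maps to one function, and the choice between alternatives is made from the single lookahead token, which is what the LL table encodes. Calling `parseProgram("a = 1; b = a;", &code)` under these assumptions leaves two instructions in `code`.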

Operator precedence will have its own PR; it is not the subject of this one.

Uh, I need some verification if this looks good or it's completely wrong, because I fear it's completely wrong :D

Stack related:

typedef enum { 
    VT_Integer,
    VT_Double,
    VT_String,
    VT_Bool,
    VT_Null,
} VariableType;


typedef union {
    int32_t i;
    double d;
    int8_t b; // Couldn't we just put boolean into i?
    String s;
} VariableValue;

// These ones are in the virtual stack
typedef struct {
    VariableValue value;
    VariableType type;
} Variable;
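
As a rough illustration of how such a tagged `Variable` might be used, the sketch below writes the `type` tag and the matching union member together and reads the member back only after checking the tag. The `String` layout, the `make*` helpers, `formatVariable`, and the output format are all assumptions for the example; the project's real `String` type is not shown here.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdio.h>
#include <string.h>

/* Stand-in for the project's String type (assumed layout, not the real one). */
typedef struct { const char *data; uint32_t length; } String;

typedef enum { VT_Integer, VT_Double, VT_String, VT_Bool, VT_Null } VariableType;

typedef union { int32_t i; double d; int8_t b; String s; } VariableValue;

typedef struct { VariableValue value; VariableType type; } Variable;

/* Hypothetical constructors: tag and active union member are always set together. */
Variable makeInteger(int32_t i) {
    return (Variable){ .value = { .i = i }, .type = VT_Integer };
}
Variable makeDouble(double d) {
    return (Variable){ .value = { .d = d }, .type = VT_Double };
}
Variable makeBool(int b) {
    return (Variable){ .value = { .b = (int8_t)(b != 0) }, .type = VT_Bool };
}
Variable makeString(const char *text) {
    return (Variable){ .value = { .s = { text, (uint32_t)strlen(text) } }, .type = VT_String };
}
Variable makeNull(void) {
    return (Variable){ .value = { .i = 0 }, .type = VT_Null };
}

/* Formats v into buf; the tag decides which union member is read.
   Returns what snprintf returns. The textual format is an assumption. */
int formatVariable(const Variable *v, char *buf, size_t size) {
    switch (v->type) {
    case VT_Integer: return snprintf(buf, size, "%" PRId32, v->value.i);
    case VT_Double:  return snprintf(buf, size, "%g", v->value.d);
    case VT_String:  return snprintf(buf, size, "%.*s", (int)v->value.s.length, v->value.s.data);
    case VT_Bool:    return snprintf(buf, size, "%s", v->value.b ? "true" : "false");
    case VT_Null:    return snprintf(buf, size, "null");
    }
    assert(!"unknown VariableType");
    return -1;
}
```

Reading a member other than the one selected by `type` would reinterpret the stored bytes, which is why every access goes through the `switch`.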

Symbol Table related:

typedef struct {
    uint32_t defaultValueIndex; // Index to Constant Table
    uint32_t relativeIndex; // Index to Stack
} TableVariable; // Change name to something better

// These should be stored in separate vector
typedef struct {
    uint32_t relativeIndex; // Index to Address Table
    uint16_t argc; // Number of arguments
    uint16_t localc; // Number of local variables
    SymbolTable *localSymbolTable; 
} Function;

typedef enum {
    ST_Function,
    ST_Variable,
} SymbolType;

typedef union {
    TableVariable var;
    Function func;
} SymbolData;

typedef struct {
    SymbolData *data;
    SymbolType type;
    String key;
} Symbol;

It's quite good, but first: SymbolData should not store a Variable, and there's currently no useful data for variables that should be stored in local symbol tables. Each symbol should have an "index" variable of integer type, which will be used as a relative offset from the stack pointer for variables or as an index into the address table for functions. Function therefore doesn't need a pointer to an instruction, as that will be stored in the address table. On the other hand, Function should have a pointer to the local symbol table of that function.
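
To make the two meanings of "index" concrete, here is a minimal sketch. Everything beyond `Symbol`, `SymbolType`, and the idea of an index is an assumption for illustration: the `Runtime` struct, the fixed sizes, the use of plain `int32_t` stack slots instead of `Variable`, and the choice to add the offset to the stack pointer rather than subtract it.

```c
#include <assert.h>
#include <stdint.h>

enum { STACK_SIZE = 64, ADDRESS_TABLE_SIZE = 16 }; /* arbitrary sizes for the sketch */

typedef enum { ST_Function, ST_Variable } SymbolType;

/* Symbol reduced to what this sketch needs; key and SymbolData are omitted. */
typedef struct { SymbolType type; uint32_t index; } Symbol;

/* Hypothetical runtime state: a value stack and a table of instruction positions. */
typedef struct {
    int32_t stack[STACK_SIZE];
    uint32_t stackPointer;                      /* base of the current frame */
    uint32_t addressTable[ADDRESS_TABLE_SIZE];  /* function index -> first instruction */
} Runtime;

/* Variable symbol: index is an offset relative to the stack pointer. */
int32_t *resolveVariable(Runtime *rt, const Symbol *sym) {
    assert(sym->type == ST_Variable);
    assert(rt->stackPointer + sym->index < STACK_SIZE);
    return &rt->stack[rt->stackPointer + sym->index];
}

/* Function symbol: index selects an entry of the address table. */
uint32_t resolveFunction(const Runtime *rt, const Symbol *sym) {
    assert(sym->type == ST_Function);
    assert(sym->index < ADDRESS_TABLE_SIZE);
    return rt->addressTable[sym->index];
}
```

With the stack pointer at 10, a variable symbol with index 3 resolves to stack slot 13, while a function symbol with index 2 resolves to whatever instruction position is stored in `addressTable[2]`. Because only the index lives in the symbol, the same local symbol table stays valid for every call of the function.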

@BetaRavener I thought that a SymbolData pointer (check again, it's a union) would be a bit more powerful, since it can be used as both a relative and an absolute pointer, but if you want uint32_t, why not.

SymbolData* is alright in Symbol. What I was telling you is that:

  • Function should not contain Instruction*
  • SymbolData should not contain Variable
  • Symbol should contain an "index" variable (uint32_t).

@BetaRavener Please, can you give us some insight (here or on the wiki) into how you imagine the parser and instruction generation working? I am lost right now, because I don't know what is needed or how it is going to work. Thanks.

After today's discussion, there are some corrections:

  • SymbolData should not contain Variable
  • Function should contain an "index" variable (uint32_t) instead of an Instruction pointer.

Don't change anything yet, though. Variable might still end up being used, but that depends on the final parser setup.

@ptrllama Updated your code to match the parser description above. The default argument value is now stored in the Constant Table, where each default value has its own record. A reference (by index) to this Constant Table item is stored in Variable's defaultValueIndex. Each argument will have a Variable record in the LST, so this is OK.
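
A minimal sketch of that lookup, under stated assumptions: constants are reduced to `int32_t` instead of full values, and `ConstantTable`, `constantTableAdd`, and `defaultValueOf` are hypothetical names invented for the example. Only `TableVariable` and its two fields come from the code above.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical Constant Table: a growable array; real entries would be typed values. */
typedef struct { int32_t *items; uint32_t size, capacity; } ConstantTable;

typedef struct {
    uint32_t defaultValueIndex; /* Index to Constant Table */
    uint32_t relativeIndex;     /* Index to Stack */
} TableVariable;

/* Appends a constant and returns its index, which the symbol then stores. */
uint32_t constantTableAdd(ConstantTable *t, int32_t value) {
    if (t->size == t->capacity) {
        uint32_t cap = t->capacity ? t->capacity * 2 : 8;
        int32_t *grown = realloc(t->items, cap * sizeof *grown);
        if (grown == NULL) abort();
        t->items = grown;
        t->capacity = cap;
    }
    t->items[t->size] = value;
    return t->size++;
}

/* Looks up the default value of an argument through its stored index. */
int32_t defaultValueOf(const ConstantTable *t, const TableVariable *var) {
    assert(var->defaultValueIndex < t->size);
    return t->items[var->defaultValueIndex];
}
```

Storing an index rather than a pointer keeps the reference valid even when the table is reallocated as it grows.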

This issue is still open for the Freeze milestone. Does it need to be, even though the parser needs some optimization? If you want an issue for parser optimization, raise a new one, close this one, and mark the new one's milestone as Deadline. Thanks.

Closed, as the parsers are finished and any further changes will be considered optimizations or bugfixes.