Parser Generator

This is a simple parser generator written in C#. The program runs in three phases: lexical analysis, syntactic analysis, and generation of the scanner.

The generated scanner checks whether the input text conforms to the grammar that was supplied.

Grammar Explanation

The grammar file consists of four sections:

SETS

Contains abbreviated definitions of groups of terminal symbols. This section is optional, but if it appears it must define at least one SET. The section must meet the following requirements:

The word SETS must be written in uppercase.
Sets can be concatenated with the "+" sign.
The CHR function can be used to refer to a character by its numeric code, as in CHR(32).
Any number of blank spaces may appear between the identifier, the "=" symbol, and the definition.
Any number of line breaks may appear between one SET and the next.
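
As an illustration of the rules above, the following Python sketch expands one SET definition into the characters it covers. It is not the project's C# code: the names read_atom, expand_set, and parse_set_line are hypothetical, and the ".." range notation is taken from the example file rather than from the rules listed here.

```python
import re

# One atom is either a quoted character ('A') or a CHR(n) call.
ATOM = re.compile(r"\s*(?:'(.)'|CHR\((\d+)\))\s*")


def read_atom(text, pos):
    """Read one atom starting at pos; return (character, next position)."""
    m = ATOM.match(text, pos)
    if not m:
        raise ValueError(f"expected a character or CHR(n) at position {pos}")
    char = m.group(1) if m.group(1) is not None else chr(int(m.group(2)))
    return char, m.end()


def expand_set(definition):
    """Expand a definition such as 'A'..'Z'+'_' into a set of characters."""
    chars, pos = set(), 0
    while True:
        low, pos = read_atom(definition, pos)
        high = low
        if definition.startswith("..", pos):  # assumed range syntax: low..high
            high, pos = read_atom(definition, pos + 2)
        if ord(low) > ord(high):
            raise ValueError("range bounds are reversed")
        chars.update(chr(c) for c in range(ord(low), ord(high) + 1))
        if pos == len(definition):
            return chars
        if definition[pos] != "+":  # pieces are joined with "+"
            raise ValueError(f"expected '+' at position {pos}")
        pos += 1


def parse_set_line(line):
    """Split 'NAME = definition' and expand the definition."""
    name, sep, definition = line.partition("=")
    name = name.strip()
    if not sep or not name.isidentifier():
        raise ValueError("expected IDENTIFIER = definition")
    return name, expand_set(definition.strip())


name, chars = parse_set_line("LETRA   = 'A'..'Z'+'a'..'z'+'_'")
```

Reading atoms one at a time, instead of splitting the text on "+", keeps a quoted '+' character from being mistaken for the concatenation sign.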

TOKENS

The tokens represent the terminal and nonterminal symbols of the grammar. In this phase it does not matter whether an identifier has been declared in SETS. The section must meet the following requirements:

This section is mandatory, and the word TOKENS must be written in uppercase.
Each token must start with the word TOKEN and a number, followed by the equal sign "=".
After the equal sign comes a regular expression, which can be one or more characters (each enclosed in apostrophes).
Regular expression operators are the only symbols that are written without apostrophes; to use one of them as a terminal symbol, enclose it in apostrophes.
The regular expression operators are: + * ? ( ) |
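
The rules above can be sketched as a small tokenizer for a single TOKEN line. This Python sketch is illustrative only and is not the project's C# code: the names TOKEN_HEADER, PIECE, and tokenize_token_line are hypothetical, and support for set identifiers and a trailing { NAME() } action is inferred from the example file.

```python
import re

# "TOKEN <number> =" with flexible spacing around each part.
TOKEN_HEADER = re.compile(r"\s*TOKEN\s+(\d+)\s*=")

# One piece of the regular expression. Order matters: a quoted character is
# tried first, so '(' or '{' in apostrophes is always read as a terminal.
PIECE = re.compile(
    r"\s*(?:'(?P<literal>.)'"
    r"|(?P<operator>[+*?()|])"
    r"|(?P<name>[A-Za-z_]\w*)"
    r"|\{\s*(?P<action>[A-Za-z_]\w*)\(\)\s*\})"
)


def tokenize_token_line(line):
    """Return (token number, list of (kind, text) pieces) for one TOKEN line."""
    text = line.rstrip()
    header = TOKEN_HEADER.match(text)
    if not header:
        raise ValueError("expected: TOKEN <number> = <regular expression>")
    pieces, pos = [], header.end()
    while pos < len(text):
        m = PIECE.match(text, pos)
        if not m:
            raise ValueError(f"unexpected text at position {pos}")
        pieces.append((m.lastgroup, m.group(m.lastgroup)))
        pos = m.end()
    if not pieces:
        raise ValueError("the regular expression is empty")
    return int(header.group(1)), pieces


number, pieces = tokenize_token_line("TOKEN 3= LETRA ( LETRA | DIGITO )*  { RESERVADAS() }")
```

Each piece is tagged as literal, operator, name, or action, so a later phase can tell an operator such as * apart from the terminal '*'.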

ACTIONS

The ACTIONS section contains function definitions, which in this case hold the reserved words of the language. The function RESERVADAS() must always exist, and other functions may be defined as well. The section must meet the following requirements:

The word ACTIONS must always be accompanied by the function RESERVADAS().
Every function must have an identifier followed by an opening and a closing parenthesis.
The body of each function in ACTIONS must start and end with curly braces {}.
Each entry inside a function consists of a number, an equal sign, and then the identifier enclosed in apostrophes.
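
A minimal Python sketch of these rules reads every function in ACTIONS into a table that maps each number to its word. It is not the project's C# code: parse_actions and the two patterns are hypothetical names, and the sketch assumes one entry per line and no "}" character inside a quoted word.

```python
import re

# NAME() followed by a body in curly braces (body captured lazily).
FUNCTION = re.compile(r"([A-Za-z_]\w*)\(\)\s*\{(.*?)\}", re.DOTALL)

# One entry: number, equal sign, word enclosed in apostrophes.
ENTRY = re.compile(r"\s*(\d+)\s*=\s*'([^']*)'\s*")


def parse_actions(text):
    """Return {function name: {number: word}} for the ACTIONS section."""
    _, found, body = text.partition("ACTIONS")
    if not found:
        raise ValueError("the ACTIONS section is missing")
    functions = {}
    for name, block in FUNCTION.findall(body):
        table = {}
        for line in block.splitlines():
            if not line.strip():
                continue  # blank lines inside the braces are ignored
            m = ENTRY.fullmatch(line)
            if not m:
                raise ValueError(f"bad entry in {name}(): {line.strip()!r}")
            table[int(m.group(1))] = m.group(2)
        functions[name] = table
    if "RESERVADAS" not in functions:
        raise ValueError("ACTIONS must define RESERVADAS()")
    return functions


sample = "ACTIONS\nRESERVADAS()\n{\n\t18 = 'PROGRAM'\n\t19 = 'INCLUDE'\n}\n"
functions = parse_actions(sample)
```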

ERRORS

At least one error definition must be present. Each error must be assigned a number, and its identifier must end with the word ERROR in uppercase. Identifiers may contain only letters, and only numbers may appear on the right side of the equal sign.
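
The checks for one error definition can be sketched with a single pattern. This Python sketch is illustrative and is not the project's C# code; ERROR_LINE and parse_error_line are hypothetical names.

```python
import re

# Letters only, ending in uppercase ERROR, then "=" and a number.
ERROR_LINE = re.compile(r"\s*([A-Za-z]*ERROR)\s*=\s*(\d+)\s*")


def parse_error_line(line):
    """Return (identifier, number) for a valid error definition."""
    m = ERROR_LINE.fullmatch(line)
    if not m:
        raise ValueError(f"invalid error definition: {line.strip()!r}")
    return m.group(1), int(m.group(2))


identifier, number = parse_error_line("ERROR = 54")
```

A line such as ERROR_1 = 54 is rejected because the identifier contains characters other than letters.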

Example File

SETS
	LETRA   = 'A'..'Z'+'a'..'z'+'_'
	DIGITO  = '0'..'9'
	CHARSET = CHR(32)..CHR(254)
TOKENS
	TOKEN 1 = DIGITO
	TOKEN 2 ='"' CHARSET '"'|''' CHARSET '''
	TOKEN 4 = '='
	TOKEN 3= LETRA ( LETRA | DIGITO )*  { RESERVADAS() }
	TOKEN 5 = DIGITO DIGITO * { VALUES() }

ACTIONS
RESERVADAS()
{
	18 = 'PROGRAM'
	19 = 'INCLUDE'
	20 = 'CONST'
	21 = 'TYPE'
	22 = 'VAR'
	23 = 'RECORD'
	24 = 'ARRAY'
	25 = 'OF'
	26 = 'PROCEDURE'
	27 = 'FUNCTION'
	28 = 'IF'
	29 = 'THEN'
	30 = 'ELSE'
	31 = 'FOR'
	32 = 'TO'
	33 = 'WHILE'
	34 = 'DO'
	35 = 'EXIT'
	36 = 'END'
	37 = 'CASE'
	38 = 'BREAK'
	39 = 'DOWNTO'
}

VALUES()
{
	6 = '1234'
}

ERROR = 54
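
One plausible reading of { RESERVADAS() } on TOKEN 3 is that a matched identifier is looked up in the reserved-word table, and the reserved word's number replaces the token number when it is found. The Python sketch below shows that reading only; it is an assumption about the generated scanner, not the project's C# code. The name classify_identifier is hypothetical, the table is an excerpt of the example, and exact-case comparison is assumed.

```python
from string import ascii_letters, digits

# Sets from the example file.
LETRA = set(ascii_letters + "_")
DIGITO = set(digits)

# Excerpt of RESERVADAS() from the example, keyed by word.
RESERVADAS = {"PROGRAM": 18, "IF": 28, "THEN": 29, "ELSE": 30}

IDENTIFIER_TOKEN = 3  # TOKEN 3 = LETRA ( LETRA | DIGITO )*


def classify_identifier(lexeme):
    """Return the token number for a lexeme, or None if TOKEN 3 does not match."""
    if not lexeme or lexeme[0] not in LETRA:
        return None
    if any(c not in LETRA and c not in DIGITO for c in lexeme[1:]):
        return None
    # Assumed behavior: a reserved word overrides the plain identifier token.
    return RESERVADAS.get(lexeme, IDENTIFIER_TOKEN)


kinds = [classify_identifier(word) for word in ("IF", "total_1", "1abc")]
```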