
bison / YACC / LEX in JavaScript (LALR(1), SLR(1), etc. lexer/parser generator)


Jison

Join the chat at https://gitter.im/jison-parsers-lexers/Lobby

Notice

This repository contains a fork maintained by GerHobbelt. The original Jison work was done by Zachary Carter and is available in zaach/jison.

For an overview of all changes (fixes and features), see the section What's New or Different? further below. See also pull request #338.

An API for creating parsers in JavaScript

Jison generates bottom-up parsers in JavaScript. Its API is similar to Bison's, hence the name. It supports many of Bison's major features, plus some of its own. If you are new to parser generators such as Bison, and to context-free grammars in general, a good introduction is found in the Bison manual. If you already know Bison, Jison should be easy to pick up.

Briefly, Jison takes a JSON-encoded grammar or a Bison-style grammar and outputs a JavaScript file capable of parsing the language described by that grammar. You can then use the generated script to parse inputs and accept, reject, or perform actions based on the input.

Installation

Jison can be installed for Node using npm:

npm install jison-gho -g

Usage from the command line

Clone the GitHub repository for examples:

git clone https://github.com/GerHobbelt/jison.git
cd jison/examples

Now you're ready to generate some parsers:

jison calculator.jison

This will generate calculator.js in your current working directory. This file can be used to parse an input file, like so:

echo "2^32 / 1024" > testcalc
node calculator.js testcalc

This will print out 4194304.
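
As a quick sanity check of that number, the same arithmetic can be reproduced in plain JavaScript. This only checks the arithmetic and does not use the generated parser; it assumes that `^` in the calculator grammar means exponentiation.

```javascript
// Plain JavaScript check of the expected result; no Jison involved.
// Assumption: `^` in the calculator grammar means "to the power of".
const expected = Math.pow(2, 32) / 1024;
console.log(expected); // 4194304
```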

Full CLI option list:

Usage: jison [file] [lexfile] [options]

file        file containing a grammar
lexfile     file containing a lexical grammar

Where the available options are:

: -j, --json force jison to expect a grammar in JSON format [false]

: -o FILE, --outfile FILE Filepath and base module name of the generated parser; when terminated with a / (dir separator) it is treated as the destination directory where the generated output will be stored

: -t, --debug Debug mode [false]

: -I, --info Report some statistics about the generated parser [false]

: -m TYPE, --module-type TYPE The type of module to generate (commonjs, amd, es, js) or an alias (cjs=commonjs, umd=amd and iffe=js) [commonjs]

: -n NAME, --module-name NAME The name of the generated parser object, namespace supported. This has no effect on amd/umd or es modules.

: -p TYPE, --parser-type TYPE The type of algorithm to use for the parser (lr0, slr, lalr, lr, ll) [lalr]

: -c, --compress-tables Output compressed parser tables in generated modules (0 = no compression, 1 = default compression, 2 = deep compression) [2]

: -T, --output-debug-tables Output extra parser tables (rules list + look-ahead analysis) in generated modules to assist debugging / diagnostics purposes [false]

: -X, --no-default-resolve Act another way when a conflict is found in the grammar [false]

: --default-action=[for-values,for-locations] Generate a parser which does NOT include the default "$$ = $1" action for every rule. This produces a slightly faster parser but now you are solely responsible for propagating rule action "$$" results. [false]

: --no-try-catch Generate a parser which does NOT try/catch exceptions (from the grammar action code or parseError error reporting calls). This produces a slightly faster parser at the cost of enhanced code safety. [false]

: -Q, --error-recovery-token-discard-count Set the number of lexed tokens that may be gobbled by an error recovery process before we cry wolf (default: 3) [3]

: -E, --export-all-tables Next to producing a grammar source file, also export the symbols, terminals, grammar and parse tables to separate JSON files for further use by other tools. The files' names will be derived from the outputFile name by appending a suffix. [false]

: -x, --main Include .main() entry point in generated commonjs module [false]

: -y NAME, --module-main NAME The module exports NAME as exports.main (module type commonjs or cjs) or as yymain (module type es). This option has no effect with module type amd or umd. It only has an effect when used with -x, though it does not (contrary to possible expectations) rename the main function; it simply elides the creation of a main and exports NAME as a main.

: -V, --version print version and exit

Usage as a CommonJS module

You can generate parsers programmatically from JavaScript as well. Assuming Jison is in your CommonJS environment's load path:

// mygenerator.js
var Parser = require("jison").Parser;

// a grammar in JSON
var grammar = {
    "lex": {
        "rules": [
           ["\\s+", "/* skip whitespace */"],
           ["[a-f0-9]+", "return 'HEX';"]
        ]
    },

    "bnf": {
        "hex_strings" :[ "hex_strings HEX",
                         "HEX" ]
    }
};

// `grammar` can also be a string that uses jison's grammar format
var parser = new Parser(grammar);

// generate source, ready to be written to disk
var parserSource = parser.generate();

// you can also use the parser directly from memory

// returns true
parser.parse("adfe34bc e82a");

// throws lexical error
parser.parse("adfe34bc zxg");
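
To make the accept/reject behaviour of this small grammar concrete, here is a hand-written stand-in that follows the same two lexer rules and the `hex_strings` rule. It is a minimal sketch and not what Jison generates: the function name `recognizeHexStrings` and the error messages are made up for illustration.

```javascript
// Hand-written stand-in for the generated parser (NOT Jison output).
// `recognizeHexStrings` is a hypothetical name used only for this sketch.
function recognizeHexStrings(input) {
    const lexRules = [
        { pattern: /^\s+/, token: null },        // skip whitespace
        { pattern: /^[a-f0-9]+/, token: "HEX" }  // return 'HEX'
    ];
    let rest = input;
    let hexCount = 0;
    while (rest.length > 0) {
        // Pick the first rule whose pattern matches at the current position.
        const rule = lexRules.find(function (r) { return r.pattern.test(rest); });
        if (!rule) {
            throw new Error("Lexical error: unrecognized text at '" + rest + "'");
        }
        const matched = rest.match(rule.pattern)[0];
        if (rule.token === "HEX") {
            hexCount++;
        }
        rest = rest.slice(matched.length);
    }
    // hex_strings : hex_strings HEX | HEX   => one or more HEX tokens
    if (hexCount === 0) {
        throw new Error("Parse error: expected at least one HEX token");
    }
    return true;
}

console.log(recognizeHexStrings("adfe34bc e82a")); // true
// recognizeHexStrings("adfe34bc zxg") throws, because "zxg" matches no rule.
```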

Differences in module types

Jison allows you to emit these module types: (commonjs/cjs, amd/umd, es, js/iffe). In the following sections, <parser> represents the parser code common to all types of module.

cjs/commonjs

The parser is wrapped in:

var <module-name> = (function () {
  <parser>
  return new Parser();
})();
if (typeof require !== 'undefined' && typeof exports !== 'undefined') {
  exports.parser = <module-name>;
  exports.Parser = <module-name>.Parser;
  exports.parse = function () {
    return <module-name>.parse.apply(<module-name>, arguments);
  };
}

The --main function is declared with:

exports.main = function (args) {
  ...
}
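
The `exports.parse` wrapper in the commonjs template delegates through `apply`, so the function keeps working when it is detached from the exports object. The sketch below imitates that delegation with stand-ins: `fakeParser` and `fakeExports` are hypothetical names, and the stub `parse` body is not real parser code.

```javascript
// Sketch only: `fakeParser` and `fakeExports` are hypothetical stand-ins for
// <module-name> and the CommonJS `exports` object.
var fakeParser = {
    parse: function (input) {
        // The stub insists on being called with the parser as `this`.
        if (this !== fakeParser) {
            throw new Error("parse() called with the wrong `this`");
        }
        return input.length; // a real parser would return the parse result
    }
};

var fakeExports = {};
fakeExports.parser = fakeParser;
fakeExports.parse = function () {
    // Same shape as the template: forward all arguments, re-bind `this`.
    return fakeParser.parse.apply(fakeParser, arguments);
};

// Detached use keeps working because the wrapper re-binds `this`.
var detachedParse = fakeExports.parse;
console.log(detachedParse("abc")); // 3
```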

amd/umd

The parser is wrapped with:

define(function (require) {
  <parser>
  return parser;
});

The --module-name NAME option has no effect if the type is amd or umd.

js/iffe

The parser is wrapped with:

var <module-name> = (function () {
  <parser>
  function Parser() {
    this.yy = {};
  }
  Parser.prototype = parser;
  parser.Parser = Parser;

  return new Parser();
})();
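
What this wrapper achieves can be seen by substituting a stub for `<parser>`. In the sketch below, `myParser` and the stub `parse` body are assumptions made for illustration; only the `Parser` constructor and prototype wiring are taken from the template above.

```javascript
// Sketch of the js/iffe wrapper with a stub in place of <parser>.
// `myParser` and the stub `parse` body are hypothetical.
var myParser = (function () {
    // Stand-in for <parser>: the generated code defines a `parser` object.
    var parser = {
        yy: {},
        parse: function (input) {
            // A real generated parser would run its parse tables here.
            return { input: input, sharedState: this.yy };
        }
    };

    function Parser() {
        this.yy = {};          // each instance gets its own `yy` object
    }
    Parser.prototype = parser; // instances inherit parse() from `parser`
    parser.Parser = Parser;    // the constructor is reachable from instances

    return new Parser();
})();

// A second, independent instance can be created from the first one.
var second = new myParser.Parser();
console.log(typeof second.parse);       // "function"
console.log(second.yy !== myParser.yy); // true
```

Because `yy` is assigned in the constructor, state stored on one instance's `yy` is not visible through another instance.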

es

The parser is appended with:

<parser>
function yyparse() {
    return parser.parse.apply(parser, arguments);
}
export default {
    parser,
    Parser,
    parse: yyparse,
};

The --module-name NAME option has no effect if the type is es. The --main function is declared with:

var yymain = function (args) {
  ...
}

More Documentation

For more information on creating grammars and using the generated parsers, read the documentation.

How to contribute

See CONTRIBUTING.md for contribution guidelines, how to run the tests, etc.

Projects using Jison

View them on the wiki, or add your own.

Submodules for Jison

The JISON tool uses several modules:

  • The ebnf-parser library parses BNF and EBNF grammars to a basic AST used by Jison to produce a parser engine for your grammar spec.
  • The lex-parser library parses %lex ... /lex lexical grammars to a basic AST used by Jison to produce a parser engine for your grammar spec.
  • The jison-lex library/utility generates lexical analyzers which are included by Jison in your parser run-time engine to lex the input according to your %lex ... /lex lexical grammar definition.
  • The jison2json utility converts a Jison spec file to a JSON format file.
  • The json2jison utility converts a JSON format file to a Jison spec file.

Contributors

Githubbers

Special thanks to Jarred Ligatti, Manuel E. Bermúdez

What's New or Different?

Here's a comprehensive list of features and fixes compared to the original:

  • Full Unicode support: the lexer can handle all Unicode regexes which are supported by the XRegExp library, with a few notes:

    • your own software does not need to include the XRegExp library: jison will produce standard JavaScript regular expressions for every lexer rule so that you can enjoy most Unicode features without the added burden of another library (XRegExp)

    • astral Unicode codepoints are not fully supported within regex character set expressions, unless you yourself include XRegExp and instruct the lexer to produce XRegExp regular expressions via the lexer option %options xregexp

  • EBNF LR/LALR/SLR/LR0 grammars are correctly rewritten to BNF grammars, allowing your action code blocks to access all elements of the grammar rule at hand. See also the wiki section about EBNF.

  • Parser engine optimization: jison analyzes not just your grammar, but also your action code and will strip any feature you don't use (such as location tracking via @element references and yylloc) from the parser kernel, which will benefit your parser run-time performance. The fastest parsers are obtained when you do not include error recovery (error tokens in your grammar), nor any lexer location tracking: this can potentially result in run-time execution cost reductions of over 70% (hence your parser executes more than 3 times as fast)!

  • Generated grammar / lexer source files carry full API and internals documentation in the code comments to help you read and debug a grammar. For example, every grammar rule is printed above its action code, so that when you step through the parser to debug a hard-to-find problem it is obvious which rule the engine is currently 'reducing'.

  • Generated parsers and lexers are JavaScript strict mode compliant.

  • You can specify a totally custom lexer in the %lex ... /lex section of your grammar definition file if you like, i.e. you can define and use a lexer which is not regex ruleset based / generated by jison lex! This is particularly handy when you want maximum performance, i.e. the absolute minimum parsing and lexing overhead, for performance-critical grammars.

  • lexer.reject() et al: the lexer comes with extra APIs to help you write more sophisticated lexers based on the lex/jison mechanism. The this.reject() call in your lexer rule action code will reject the current match and continue down the lexer rule set to find another match. This is very handy when you do not use flex mode matching all the time, but want specific, local control over when a lexer regex (a.k.a. lexer rule) actually is a correct match.

  • You can now enter epsilon as a token in your grammar rules, so no more hacks like /* epsilon */ comments for empty rules: you can type any of these:

    • %epsilon,
    • \u0190
    • \u025B
    • \u03B5
    • \u03F5

    (See also https://en.wikipedia.org/wiki/Epsilon#Glyph_variants)

  • %options easy_keyword_rules: see also https://github.com/zaach/jison/wiki/Deviations-From-Flex-Bison#user-content-literal-tokens

  • ... more lexer features ...

    • %options ...

    • kernel ...

  • ... more parser features ...

    • configurable error recovery search depth (default: 3 tokens)

    • augmented error reporting callbacks

    • dedicated parser and lexer Error-derived exception classes so you can use instanceof to help your generic error code discern what type of error has occurred and what info is available next to the text message itself.

    • (are we faster even when we run with the same feature set as 'vanilla' zaach jison? Probably a little bit, but haven't measured this thoroughly.)

    • JSON (rather than JISON) grammar files support all JSON5 features, i.e. you can include comments, etc. in your JSON-file based grammar specs!
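
Two items from the list above, `this.reject()` and the Error-derived exception classes, can be pictured with a toy lexer. This is a minimal sketch of the ideas only and not the jison-lex implementation: `ToyLexerError`, `toyLex`, `toyRules`, and the rule format are all hypothetical.

```javascript
// Toy lexer illustrating the ideas only; NOT the jison-lex implementation.
// `ToyLexerError`, `toyLex`, `toyRules` and the rule format are hypothetical.
class ToyLexerError extends Error {
    constructor(message, remaining) {
        super(message);
        this.name = "ToyLexerError";
        this.remaining = remaining; // extra info next to the text message
    }
}

function toyLex(input, rules) {
    const tokens = [];
    let rest = input;
    while (rest.length > 0) {
        let consumed = false;
        for (const rule of rules) {
            const m = rule.pattern.exec(rest);
            if (!m) continue;
            const ctx = {
                yytext: m[0],
                rejected: false,
                reject: function () { this.rejected = true; }
            };
            const token = rule.action.call(ctx);
            if (ctx.rejected) continue; // rejected: try the next rule
            if (token) tokens.push(token + ":" + ctx.yytext);
            rest = rest.slice(m[0].length);
            consumed = true;
            break;
        }
        if (!consumed) {
            throw new ToyLexerError("no rule matches", rest);
        }
    }
    return tokens;
}

const toyRules = [
    { pattern: /^\s+/, action: function () { /* skip whitespace */ } },
    {
        // Accept only words of up to 3 letters as keywords; reject the rest.
        pattern: /^[a-z]+/,
        action: function () {
            if (this.yytext.length > 3) { this.reject(); return; }
            return "KEYWORD";
        }
    },
    { pattern: /^[a-z]+/, action: function () { return "IDENT"; } }
];

// "let" is taken by the KEYWORD rule; "counter" is rejected there and
// falls through to the IDENT rule.
console.log(toyLex("let counter", toyRules));

try {
    toyLex("let 42", toyRules);
} catch (err) {
    if (err instanceof ToyLexerError) {
        console.log("lexer error near: " + err.remaining); // lexer error near: 42
    } else {
        throw err;
    }
}
```

Generic error handling code can branch on `instanceof` in the same way to tell lexer errors from parser errors and from unrelated exceptions.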

License

Copyright (c) 2009-2016 Zachary Carter

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.