antlr4-ace-ext

Tokenizer for ACE editor to do syntax highlighting using an ANTLR4 lexer.

How to install

Use bower to install:

bower install --save antlr4-ace-ext

You can install the ACE editor from bower, too:

bower install --save ace-builds

How to use

After ACE is loaded

<script src="bower_components/ace-builds/src-noconflict/ace.js"></script>

add the extension scripts:

<script src="bower_components/antlr4-ace-ext/src/token-type-map.js"></script>
<script src="bower_components/antlr4-ace-ext/src/tokenizer.js"></script>

They register themselves as ACE modules ace/ext/antlr4/tokenizer and ace/ext/antlr4/token-type-map. You can require them in your mode:

ace.define(
  'ace/mode/my-mode',
  [
    "require",
    "exports",
    "module",
    "ace/ext/antlr4/tokenizer",
    "ace/ext/antlr4/token-type-map"
  ],
  function(require, exports, module) {
    var createTokenTypeMap = require('ace/ext/antlr4/token-type-map').createTokenTypeMap;
    var Antlr4Tokenizer = require('ace/ext/antlr4/tokenizer').Antlr4Tokenizer;
    // ... define and export your mode class here
  }
);

Override the getTokenizer method of your mode class to use your custom tokenizer:

MyMode.prototype.getTokenizer = function() {
  if (!this.$tokenizer) {
    this.$tokenizer = new Antlr4Tokenizer(MyLanguageLexer, antlrTokenNameToAceTokenType);
  }
  return this.$tokenizer;
};
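ACE asks a mode for its tokenizer and then calls getLineTokens(line, state) on it, expecting an object of the form { tokens: [{ type, value }], state } in return. The sketch below uses a hypothetical StubTokenizer that simply splits on whitespace instead of running an ANTLR4 lexer; it only illustrates that interface and the lazy caching in getTokenizer, and is not how Antlr4Tokenizer works internally.

```javascript
// Hypothetical stand-in for Antlr4Tokenizer (NOT the library's code):
// it splits a line on whitespace and looks each piece up in a token type map.
function StubTokenizer(tokenTypeMap) {
  this.tokenTypeMap = tokenTypeMap;
}

// ACE's tokenizer interface: return the line's tokens plus the next state.
StubTokenizer.prototype.getLineTokens = function (line, startState) {
  var map = this.tokenTypeMap;
  var parts = line.split(/(\s+)/).filter(function (p) { return p.length > 0; });
  var tokens = parts.map(function (p) {
    var known = Object.prototype.hasOwnProperty.call(map, p);
    return { type: known ? map[p] : 'text', value: p };
  });
  // This stub is stateless, so the state is passed through unchanged.
  return { tokens: tokens, state: startState };
};

function MyMode() {}

MyMode.prototype.getTokenizer = function () {
  // Create the tokenizer on first use and reuse it for later calls.
  if (!this.$tokenizer) {
    this.$tokenizer = new StubTokenizer({ 'return': 'keyword.control' });
  }
  return this.$tokenizer;
};

var mode = new MyMode();
var result = mode.getTokenizer().getLineTokens('return x', 'start');
// Logs three tokens: 'return' as keyword.control, ' ' and 'x' as text.
console.log(result.tokens);
```

Caching the instance on this.$tokenizer means the (potentially expensive) tokenizer is built once per mode rather than on every call.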

The Antlr4Tokenizer constructor takes a lexer class generated by ANTLR4 and a mapping of ANTLR4 token names to ACE token types. The mapping describes which ANTLR4 token name refers to which ACE token type (see common ACE tokens).

{
  "'+'": 'keyword.operator',
  "'-'": 'keyword.operator',
  "'return'": 'keyword.control',
  "ID": 'identifier',
  "INT": 'constant.numeric'
}
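To see how such a map can be applied, recall that ANTLR4 lexers carry name tables indexed by token type: literal names keep their single quotes (for example "'+'"), and symbolic names are rule names such as ID. The sketch below assumes hand-written literalNames/symbolicNames arrays in that shape and a hypothetical aceTypeFor helper that tries the literal name first, then the symbolic name, and falls back to 'text'. That lookup order and fallback are assumptions, not a description of Antlr4Tokenizer's internals.

```javascript
// Hypothetical name tables in the shape ANTLR4 emits (index = token type).
var literalNames = [null, "'+'", "'-'", "'return'", null, null];
var symbolicNames = [null, null, null, null, 'ID', 'INT'];

var tokenTypeMap = {
  "'+'": 'keyword.operator',
  "'-'": 'keyword.operator',
  "'return'": 'keyword.control',
  "ID": 'identifier',
  "INT": 'constant.numeric'
};

// Resolve an ANTLR token type to an ACE token type (assumed lookup order:
// literal name, then symbolic name, then the 'text' fallback).
function aceTypeFor(tokenType) {
  var names = [literalNames[tokenType], symbolicNames[tokenType]];
  for (var i = 0; i < names.length; i++) {
    var name = names[i];
    if (name && Object.prototype.hasOwnProperty.call(tokenTypeMap, name)) {
      return tokenTypeMap[name];
    }
  }
  return 'text';
}

console.log(aceTypeFor(1)); // keyword.operator
console.log(aceTypeFor(3)); // keyword.control
console.log(aceTypeFor(5)); // constant.numeric
```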

You can use the helper function createTokenTypeMap to create a token type map for your Antlr4Tokenizer:

var antlrTokenNameToAceTokenType = createTokenTypeMap({
  literals: {
    'keyword.operator': ['+', '-'],
    'keyword.control': 'return'
  },
  symbols: {
    'identifier': 'ID',
    'constant.numeric': 'INT'
  }
});

This way, you do not have to quote literal token names, and you can map multiple token names to the same ACE token type by passing them as an array.
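To make the transformation concrete, here is a hypothetical re-implementation, createTokenTypeMapSketch, written only from the description above; the library's actual createTokenTypeMap may differ in details. It wraps each literal name in single quotes, leaves symbol names unchanged, and accepts either a single name or an array of names per ACE token type.

```javascript
// Hypothetical sketch of createTokenTypeMap's behavior (not the library code).
function createTokenTypeMapSketch(spec) {
  var result = {};

  // Accept either a single token name or an array of names.
  function toArray(names) {
    return Array.isArray(names) ? names : [names];
  }

  // Invert { aceType: names } into { antlrTokenName: aceType }.
  function addAll(group, quote) {
    Object.keys(group || {}).forEach(function (aceType) {
      toArray(group[aceType]).forEach(function (name) {
        // ANTLR4 names literal tokens like '+', so add the quotes here.
        result[quote ? "'" + name + "'" : name] = aceType;
      });
    });
  }

  addAll(spec.literals, true);
  addAll(spec.symbols, false);
  return result;
}

var map = createTokenTypeMapSketch({
  literals: { 'keyword.operator': ['+', '-'], 'keyword.control': 'return' },
  symbols: { 'identifier': 'ID', 'constant.numeric': 'INT' }
});
console.log(map);
```

With the README's example input, this produces the same object as the hand-written mapping shown earlier.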

Example

See the browser example of the Cymbol language (Demo).

6.4 Parsing Cymbol

To demonstrate how to parse a programming language with syntax derived from C, we’re going to build a grammar for a language I conjured up called Cymbol. Cymbol is a simple non-object-oriented programming language that looks like C without structs.

from The Definitive ANTLR 4 Reference

How to build

Required

  • Node.js
  • ANTLR4 (antlr4 must be available as an environment variable to (re)build the grammar files)

Build Instructions

  1. Install dependencies: npm install
  2. Build project: npm run build
  3. Run tests: npm test