cpd-antlr

An adapter to allow use of ANTLR v4 lexer grammars with PMD-CPD code duplication detection.

Note: This project and its author are not affiliated with the PMD or ANTLR projects.

What is this?

CPD-ANTLR provides an adapter between two great projects: PMD-CPD, the copy/paste detector that ships with PMD, and ANTLR v4, a parser generator.

CPD-ANTLR downloads the ANTLR sample grammars, compiles them, and generates the necessary Tokenizer and Language classes to make the lexer classes available to CPD.

Usage

This code seems to be working now.

To build, run gradle build. This produces a fat jar at build/libs/cpd-antlr.jar containing CPD-ANTLR, the compiled ANTLR grammars, and all necessary dependencies. Then add this jar to your CLASSPATH environment variable and run CPD with one of the new languages. For example, to check R code:

export CLASSPATH=/path/to/cpd-antlr/build/libs/cpd-antlr.jar
pmd-bin/bin/run.sh cpd --files /path/to/R/code/ --minimum-tokens 100 --language r

If you view the CPD help with pmd-bin/bin/run.sh cpd -h, you'll see this delightful list of supported languages:

Supported languages: [csharp4, scss, csv, arithmetic, vhdl, ecmascript, plsql,
webidl, java, mumath, xml, atl, logo, html, mysql, bnf, cobol85, verilog2001,
gff3, matlab, calculator, cpp14, cpp, python, sqlite, turtle, visualbasic6,
stg, less, idl, mps, abnf, stacktrace, cobol85preprocessor, suokif, tnt,
mumps, tnsnames, fol, jvmbasic, scala, clojure, jsp, pascal, dot, erlang,
fasta, pcre, xpath, java8, pgn, fortran, icalendar, json, graphql, iri, st,
c, clif, go, objectivec, antlrv4, asm6502, ruby, cs, memcached_protocol,
redcode, r, propcalc, creole, lua, php, asm, peoplecode, sparql, asn,
smalltalk, objc, ucblogo]

Using your own ANTLR grammar

To use your own ANTLR grammar with CPD:

  1. Create a Tokenizer class. Extend the AbstractANTLRTokenizer class and initialize the lexer variable with your own lexer in the constructor. An example of how to do this may be seen in TokenizerGenerator.

  2. Create a Language class. This class specifies the name of the language, the Tokenizer class to split the code up into tokens, and the file extensions to look for. See LanguageGenerator for an example.

  3. Make sure a line with your fully-qualified Language class name ends up in META-INF/services/net.sourceforge.pmd.cpd.Language in the compiled jar file. PMD uses ServiceLoader to discover other languages from jars. The service list in this project is automatically generated within build.gradle.
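One way to confirm step 3 is to inspect the built jar and print the provider lines it contains. The following is a minimal sketch using only the Java standard library. The class name ServiceEntryCheck is hypothetical and not part of this project; the service file path is the one named in step 3.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.jar.JarFile;
import java.util.zip.ZipEntry;

// Hypothetical helper, not part of CPD-ANTLR: lists the Language providers a jar declares.
final class ServiceEntryCheck {
    // Provider-configuration file that ServiceLoader reads for CPD languages (from step 3).
    static final String SERVICE_FILE = "META-INF/services/net.sourceforge.pmd.cpd.Language";

    /** Returns the class names listed in the jar's service file, or an empty list if the file is missing. */
    static List<String> listProviders(String jarPath) throws IOException {
        List<String> providers = new ArrayList<>();
        try (JarFile jar = new JarFile(jarPath)) {
            ZipEntry entry = jar.getEntry(SERVICE_FILE);
            if (entry == null) {
                return providers;
            }
            try (BufferedReader reader = new BufferedReader(
                    new InputStreamReader(jar.getInputStream(entry), StandardCharsets.UTF_8))) {
                String line;
                while ((line = reader.readLine()) != null) {
                    // ServiceLoader ignores text after '#' and skips blank lines.
                    int hash = line.indexOf('#');
                    if (hash >= 0) {
                        line = line.substring(0, hash);
                    }
                    line = line.trim();
                    if (!line.isEmpty()) {
                        providers.add(line);
                    }
                }
            }
        }
        return providers;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java ServiceEntryCheck <path-to-jar>");
            System.exit(2);
        }
        for (String provider : listProviders(args[0])) {
            System.out.println(provider);
        }
    }
}
```

Running it against build/libs/cpd-antlr.jar should print one fully-qualified class name per registered language. If your own Language class is missing from the output, CPD will not be able to discover it.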

For others who wish to add languages to PMD-CPD, an important tip is to make sure not to include the PMD code in your fat jar, as this may impair service loading.
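To check a fat jar for accidentally bundled PMD code, you can scan its entries for classes under PMD's package. This is a rough sketch using only the Java standard library. The class name BundledPmdCheck is hypothetical, and the prefix net/sourceforge/pmd/ is an assumption derived from the service file name above. If your own adapter classes live under that package, narrow the prefix accordingly.

```java
import java.io.IOException;
import java.util.List;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;
import java.util.stream.Collectors;

// Hypothetical helper, not part of CPD-ANTLR: finds class files under a package prefix in a jar.
final class BundledPmdCheck {
    // Assumed location of PMD's own classes inside a jar.
    static final String PMD_PREFIX = "net/sourceforge/pmd/";

    /** Returns the names of all .class entries in the jar whose path starts with the given prefix. */
    static List<String> classesUnder(String jarPath, String prefix) throws IOException {
        try (JarFile jar = new JarFile(jarPath)) {
            // Collect before the jar is closed; the stream reads entries lazily.
            return jar.stream()
                    .map(JarEntry::getName)
                    .filter(name -> name.startsWith(prefix) && name.endsWith(".class"))
                    .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java BundledPmdCheck <path-to-jar>");
            System.exit(2);
        }
        List<String> bundled = classesUnder(args[0], PMD_PREFIX);
        if (bundled.isEmpty()) {
            System.out.println("No classes found under " + PMD_PREFIX);
        } else {
            System.out.println(bundled.size() + " classes found under " + PMD_PREFIX);
            bundled.forEach(System.out::println);
        }
    }
}
```

An empty result suggests the jar does not carry its own copy of PMD, so the PMD distribution on the classpath remains the only one doing the service loading.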

Contributions

Pull requests are welcome. One area that could use improvement is the list of file extensions in LanguageGenerator.

Licensing Notes

Note that the ANTLR sample grammars do not have a common license. Be careful when packaging and distributing that code. CPD-ANTLR is licensed under the Apache License, Version 2.0.