ProLeap ANTLR4-based parser for COBOL
This is a COBOL parser based on an ANTLR4 grammar, which generates an Abstract Syntax Tree (AST) and an Abstract Semantic Graph (ASG) for COBOL code. The AST represents plain COBOL source code in a syntax tree structure. The ASG is generated from the AST by semantic analysis and provides data and control flow information (e.g., variable access). EXEC SQL, EXEC SQLIMS and EXEC CICS statements are extracted as text.
The parser is developed test-driven, passes the NIST test suite and has successfully been applied to numerous COBOL files from banking and insurance.
Example
Input: COBOL code
Identification Division.
Program-ID.
HELLOWORLD.
Procedure Division.
Display "Hello world".
STOP RUN.
Output: Abstract Syntax Tree (AST)
(startRule
(compilationUnit
(programUnit
(identificationDivision Identification Division .
(programIdParagraph Program-ID .
(programName
(cobolWord HELLOWORLD)) .))
(procedureDivision Procedure Division .
(procedureDivisionBody
(paragraphs
(sentence
(statement
(displayStatement Display
(displayOperand
(literal "Hello world")))) .)
(sentence
(statement
(stopStatement STOP RUN))) .)))))) <EOF>)
Getting started
To include the parser in your Maven project, build it and add the dependency:
<dependency>
<groupId>io.github.uwol</groupId>
<artifactId>proleap-cobol-parser</artifactId>
<version>4.0.0</version>
</dependency>
Use the following code as a starting point for developing your own code.
Simple: Generate an Abstract Semantic Graph (ASG) from COBOL code
import java.io.File;
import io.proleap.cobol.asg.metamodel.Program;
import io.proleap.cobol.asg.runner.impl.CobolParserRunnerImpl;
import io.proleap.cobol.preprocessor.CobolPreprocessor.CobolSourceFormatEnum;
import io.proleap.cobol.asg.metamodel.CompilationUnit;
import io.proleap.cobol.asg.metamodel.ProgramUnit;
import io.proleap.cobol.asg.metamodel.data.DataDivision;
import io.proleap.cobol.asg.metamodel.data.datadescription.DataDescriptionEntry;
// generate ASG from plain COBOL code
File inputFile = new File("src/test/resources/io/proleap/cobol/asg/HelloWorld.cbl");
CobolSourceFormatEnum format = CobolSourceFormatEnum.TANDEM;
Program program = new CobolParserRunnerImpl().analyzeFile(inputFile, format);
// navigate on ASG
CompilationUnit compilationUnit = program.getCompilationUnit("HelloWorld");
ProgramUnit programUnit = compilationUnit.getProgramUnit();
DataDivision dataDivision = programUnit.getDataDivision();
DataDescriptionEntry dataDescriptionEntry = dataDivision.getWorkingStorageSection().getDataDescriptionEntry("ITEMS");
Integer levelNumber = dataDescriptionEntry.getLevelNumber();
Complex: Generate an Abstract Semantic Graph (ASG) and traverse the Abstract Syntax Tree (AST)
import java.io.File;
import io.proleap.cobol.asg.metamodel.Program;
import io.proleap.cobol.asg.runner.impl.CobolParserRunnerImpl;
import io.proleap.cobol.preprocessor.CobolPreprocessor.CobolSourceFormatEnum;
import io.proleap.cobol.CobolBaseVisitor;
import io.proleap.cobol.CobolParser.DataDescriptionEntryFormat1Context;
import io.proleap.cobol.asg.metamodel.CompilationUnit;
import io.proleap.cobol.asg.metamodel.data.datadescription.DataDescriptionEntry;
// generate ASG from plain COBOL code
File inputFile = new File("src/test/resources/io/proleap/cobol/asg/HelloWorld.cbl");
CobolSourceFormatEnum format = CobolSourceFormatEnum.TANDEM;
Program program = new CobolParserRunnerImpl().analyzeFile(inputFile, format);
// traverse the AST
CobolBaseVisitor<Boolean> visitor = new CobolBaseVisitor<Boolean>() {
  @Override
  public Boolean visitDataDescriptionEntryFormat1(final DataDescriptionEntryFormat1Context ctx) {
    DataDescriptionEntry entry = (DataDescriptionEntry) program.getASGElementRegistry().getASGElement(ctx);
    String name = entry.getName();
    return visitChildren(ctx);
  }
};
for (final CompilationUnit compilationUnit : program.getCompilationUnits()) {
  visitor.visit(compilationUnit.getCtx());
}
Where to look next
How to cite
Please cite the ProLeap COBOL parser in your publications if it helps your research. Here is an example BibTeX entry:
@misc{wolffgang2018cobol,
title={ProLeap COBOL parser},
author={Wolffgang, Ulrich and others},
year={2018},
howpublished={\url{https://github.com/uwol/proleap-cobol-parser}},
}
Features
- EXEC SQL statements, EXEC SQLIMS statements and EXEC CICS statements are extracted by the preprocessor and provided as text in the ASG.
- Passes the NIST test suite.
- Rigorous test-driven development.
- To be used in conjunction with the provided preprocessor, which executes COPY, REPLACE, CBL and PROCESS statements.
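To give a rough idea of what executing a COPY statement means, here is a toy sketch in plain Java. It is not the ProLeap preprocessor and does not use its API: the class name CopyExpansionSketch, the method expand, the in-memory copybook map and the sample COBOL lines are all assumptions made for illustration. It only handles a line that consists of COPY followed by a name and a period, and ignores REPLACING phrases, continuation lines and source formats.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration only; not part of the ProLeap code base.
class CopyExpansionSketch {

    // Matches a line that consists solely of "COPY <name>." (case-insensitive).
    private static final Pattern COPY_LINE =
            Pattern.compile("^\\s*COPY\\s+([A-Za-z0-9-]+)\\s*\\.\\s*$", Pattern.CASE_INSENSITIVE);

    // Replaces each COPY line with the lines of the named copybook.
    // COPY lines that name an unknown copybook are kept unchanged.
    static List<String> expand(List<String> sourceLines, Map<String, List<String>> copybooks) {
        List<String> result = new ArrayList<>();
        for (String line : sourceLines) {
            Matcher matcher = COPY_LINE.matcher(line);
            String name = matcher.matches() ? matcher.group(1).toUpperCase(Locale.ROOT) : null;
            if (name != null && copybooks.containsKey(name)) {
                result.addAll(copybooks.get(name));
            } else {
                result.add(line);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Made-up sample input: one copybook and a three-line source fragment.
        Map<String, List<String>> copybooks = Map.of(
                "CUSTFLDS", List.of("   05 CUST-ID PIC 9(4).", "   05 CUST-NAME PIC X(20)."));
        List<String> source = List.of(
                " 01 CUSTOMER.",
                " COPY CUSTFLDS.",
                " 01 TOTAL PIC 9(5).");
        expand(source, copybooks).forEach(System.out::println);
    }
}
```

With the real parser, this step is carried out by the provided preprocessor before parsing, so code like the above is not needed in practice.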
Build process
The build process is based on Maven (version 3 or higher). Building requires JDK 11 and generates a Maven JAR, which can be used in other Maven projects as a dependency.
- Clone or download the repository.
- In Eclipse, import the directory as an existing Maven project.
- To build, run:
$ mvn clean package
- The test suite executes AST and ASG tests against COBOL test code and NIST test files. The NIST test files come from the Koopa repo. Unit tests and parse tree files were generated by the class io.proleap.cobol.TestGenerator from COBOL test files. The generator derives the COBOL line format from the containing folder name.
- You should see output like this:
[INFO] Scanning for projects...
...
-------------------------------------------------------
T E S T S
-------------------------------------------------------
Running io.proleap.cobol.ast.fixed.FixedTest
Preprocessing file Fixed.cbl.
Parsing file Fixed.cbl.
Comparing parse tree with file Fixed.cbl.tree.
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 6.202 sec
Running io.proleap.cobol.ast.fixed.QuotesInCommentEntryTest
...
Results :
Tests run: 680, Failures: 0, Errors: 0, Skipped: 0
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
- To install the JAR in your local Maven repository:
$ mvn clean install
- To only run the tests:
$ mvn clean test
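The build notes above mention that the test generator derives the COBOL line format from the name of the folder containing a test file. The following sketch shows one way such a lookup could work. It is not the code of io.proleap.cobol.TestGenerator: the class FolderFormatSketch, the local enum LineFormat (a stand-in for the parser's source format enum, with an assumed set of constants) and the example path are hypothetical.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Locale;
import java.util.Optional;

// Hypothetical illustration only; not the actual TestGenerator logic.
class FolderFormatSketch {

    // Local stand-in for the parser's source format enum; the constant set is an assumption.
    enum LineFormat { FIXED, TANDEM, VARIABLE }

    // Returns the format whose name equals the file's parent folder name
    // (compared case-insensitively), or empty if there is no such format.
    static Optional<LineFormat> fromContainingFolder(Path file) {
        Path parent = file.getParent();
        if (parent == null || parent.getFileName() == null) {
            return Optional.empty();
        }
        String folder = parent.getFileName().toString().toUpperCase(Locale.ROOT);
        for (LineFormat format : LineFormat.values()) {
            if (format.name().equals(folder)) {
                return Optional.of(format);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // Made-up path; only the parent folder name "fixed" matters here.
        Path file = Paths.get("some", "dir", "fixed", "Fixed.cbl");
        System.out.println(fromContainingFolder(file)); // prints "Optional[FIXED]"
    }
}
```

Returning an Optional rather than a default format makes a test file placed in an unexpected folder visible instead of silently parsing it with the wrong line format.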
Release process
- Milestones of the grammar are published in the ANTLR grammars repo.
License
Licensed under the MIT License. See LICENSE for details.