Flattened Abstract Syntax Trees

This repository demonstrates a deep learning package that classifies source code parsed by the fAST utility. See also the Visual Studio Code extension.

You can of course run fAST on your own machine as a Docker container, but here you don't even need that: all binaries and Python dependencies are provided, together with the trained models and the pre-trained embeddings.

To reproduce the results, all you need to do is allow the GitPod app to access your GitHub account, so that the commands run on a remote GitPod workspace of your own.

Use of flattened Abstract Syntax Trees in Deep Learning on your own GitPod server

Usage of fAST in Deep Learning for Algorithm Classification

Example implementations of algorithms in Java, C# and C++ are provided to test the algorithm classification tool. Once your GitPod workspace is running, it launches the following command:

run.sh datasets/github_java_10/4/1.java

Note: TensorFlow 1.15 no longer appears to be supported by default, so you may need to set up an older Python environment that is compatible with it.

You will see the predicted probability distribution over the class labels: a correctly classified label is shown in blue, and a misclassified label is shown in red.

To understand why, click on the HTML file "datasets/github_java_10/4/1.html" and use the Preview button in the top-right corner of the tab to see the visualisation in a split pane. The colours of the tokens indicate which parts of the code received the most attention from the classification model.

To run other examples, type:

run.sh datasets/github_java_10/4/3.java
run.sh datasets/github_cs_10/4/1.cs
run.sh datasets/github_cpp_10/4/1.cpp
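The same classifier can also be run over several files in one go. The sketch below is only a convenience wrapper under stated assumptions: it expects run.sh to be on your PATH, as the commands above suggest, and it guesses the visualisation path by replacing the file extension with .html, which matches the datasets/github_java_10/4/1.java example but has not been checked for the C# and C++ inputs. The function name classify_all is hypothetical.

```shell
# Hypothetical helper (not part of this repository): run the classifier on
# each given source file and print the assumed path of its HTML visualisation.
classify_all() {
  for src in "$@"; do
    if [ ! -f "$src" ]; then
      echo "skipping missing file: $src" >&2
      continue
    fi
    # Assumes run.sh is on PATH, as in the commands above.
    run.sh "$src"
    # Assumption: the attention map sits next to the input, e.g. 1.java -> 1.html
    echo "visualisation: ${src%.*}.html"
  done
}
```

For example:

classify_all datasets/github_java_10/4/3.java datasets/github_cs_10/4/1.cs datasets/github_cpp_10/4/1.cpp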

These examples show that, although the model was trained on Java programs, it usually also works well when applied to other programming languages such as C# and C++. We call this feature "Cross-Language Algorithm Classification" [Bui et al., SANER'19].

Usage of the fAST utility

cd datasets

# print the command line options and arguments
fast
# convert C++ code into its Protocol Buffers representation
fast tensorflow-1.0.1/tensorflow/cc/saved_model/loader_test.cc tensorflow-1.0.1/tensorflow/cc/saved_model/loader_test.cc.pb
# convert Java code into its FlatBuffers representation
fast RxJava-1.2.9/src/test/java/rx/ErrorHandlingTests.java RxJava-1.2.9/src/test/java/rx/ErrorHandlingTests.java.fbs
# convert a FlatBuffers representation back to C# code
fast corefx-1.0.4/src/System.IO.IsolatedStorage/ref/System.IO.IsolatedStorage.cs.fbs corefx-1.0.4/src/System.IO.IsolatedStorage/ref/System.IO.IsolatedStorage.cs
# slice a program
fast -S -G RxJava-1.2.9/src/test/java/rx/ErrorHandlingTests.java RxJava-1.2.9/src/test/java/rx/ErrorHandlingTests-ggnn.fbs
# diff two programs
fast -D github_java_10/4/1.java github_java_10/4/3.java
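The conversions above handle one file per call. To convert a whole source tree, the same two-argument form can be wrapped in a loop. This is a minimal sketch, assuming that fast chooses the output format from the output file's extension (as the .pb and .fbs examples suggest) and that writing FILE.java.fbs next to each input is acceptable; fast_to_fbs is a hypothetical helper name, not part of fAST.

```shell
# Hypothetical helper, not part of fAST: convert every source file with the
# given extension (default: java) under a directory to FlatBuffers, writing
# FILE.EXT.fbs next to each input. Uses only the "fast INPUT OUTPUT" form.
# Note: file names containing newlines are not supported by this loop.
fast_to_fbs() {
  _dir=${1:-.}
  _ext=${2:-java}
  find "$_dir" -type f -name "*.$_ext" | while IFS= read -r src; do
    if fast "$src" "$src.fbs"; then
      echo "converted: $src"
    else
      echo "failed: $src" >&2
    fi
  done
}
```

For example, from the datasets directory:

fast_to_fbs RxJava-1.2.9/src/test/java java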

Usage of fAST in Bug Localisation

To start the ConCodeSe bug localisation server on port 8081:

cd usr/bin

java -cp /workspace/demo/usr/config:/workspace/demo/usr/config/lic:/workspace/demo/usr/lib/ConCodeSe-1.0.0.jar com.concodese.ConCodeSeJettyServerStarter SERVER_PORT=8081

You can call fAST from anywhere once Docker is installed:

alias fast='docker run -v "$PWD":/e yijun/fast'
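In bash scripts, aliases are not expanded by default, so a shell function may be more convenient. The sketch below forwards all arguments to the same yijun/fast image; the --rm flag, which deletes the container after each run, is our addition. Because only the current directory is mounted (at /e), input and output paths presumably need to be relative to it.

```shell
# Function alternative to the alias above; also usable inside bash scripts.
# "$PWD" is evaluated on every call, so the current directory is mounted at /e.
# --rm (our addition) removes the container once the conversion finishes.
fast() {
  docker run --rm -v "$PWD":/e yijun/fast "$@"
}
```

For example:

cd datasets
fast github_java_10/4/1.java github_java_10/4/1.java.fbs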

References and Applications

Yijun Yu. "fAST: Flattening Abstract Syntax Trees for Efficiency". In: 41st ACM/IEEE International Conference on Software Engineering, 25-31 May 2019, Montreal, Canada, ACM and IEEE. demo, paper, poster

Deep Learning

Nghi D. Q. Bui, Yijun Yu, Lingxiao Jiang. "Learning Cross-Language API Mappings with Little Knowledge", In the 27th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE), Tallinn, Estonia, 26-30 August 2019.

Nghi D. Q. Bui, Yijun Yu, Lingxiao Jiang. "Bilateral Dependency Neural Networks for Cross-Language Algorithm Classification", In the 26th edition of the IEEE International Conference on Software Analysis, Evolution and Reengineering, Research Track, Hangzhou, China, February 24-27, 2019. GGNN, DTBCNN

Nghi D. Q. Bui, Lingxiao Jiang, and Yijun Yu. "Cross-Language Learning for Program Classification Using Bilateral Tree-Based Convolutional Neural Networks", In the proceedings of the 32nd AAAI Conference on Artificial Intelligence (AAAI) Workshop on NLP for Software Engineering, New Orleans, Louisiana, USA, 2018. Bi-TBCNN

Miltiadis Allamanis, Marc Brockschmidt, Mahmoud Khademi. "Learning to Represent Programs with Graphs", In: 6th International Conference on Learning Representations (ICLR), 2018. GGNN

Y. Li, D. Tarlow, M. Brockschmidt, R. Zemel. "Gated graph sequence neural networks", In: 4th International Conference on Learning Representations (ICLR), 2016.

Lili Mou, Ge Li, Lu Zhang, Tao Wang, Zhi Jin: "Convolutional Neural Networks over Tree Structures for Programming Language Processing". In: AAAI 2016: 1287-1293. TBCNN, datasets/pku_cpp_104/

Parsing

M. L. Collard and J. I. Maletic, "srcML 1.0: Explore, Analyze, and Manipulate Source Code," 2016 IEEE International Conference on Software Maintenance and Evolution (ICSME), Raleigh, NC, 2016, pp. 649-649. srcML

Parr, T. J. and Quong, R. W. 1995. "ANTLR: a predicated-LL(k) parser generator". Softw. Pract. Exper. 25, 7 (Jul. 1995), 789-810. ANTLR

Slicing

Hakam W. Alomari, Michael L. Collard, Jonathan I. Maletic, Nouh Alhindawi and Omar Meqdadi. "srcSlice: very efficient and scalable forward static slicing". Journal of Software: Evolution and Process, 26(11):931-961, November 2014.

Diffing

Jean-Rémy Falleri, Floréal Morandat, Xavier Blanc, Matias Martinez, and Martin Monperrus. 2014. "Fine-grained and accurate source code differencing". In Proceedings of the 29th ACM/IEEE international conference on Automated software engineering (ASE '14). ACM, New York, NY, USA, 313-324. GumTreeDiff

Yijun Yu, Thein Thun Tun, and Bashar Nuseibeh, "Specifying and detecting meaningful changes in programs," In: Proc. of the 26th IEEE/ACM Conference on Automated Software Engineering, pp. 273-282, 2011. MCT

Bug Localisation

Tezcan Dilshener, Michel Wermelinger, Yijun Yu: “Locating bugs without looking back”. Automated Software Engineering 25(3): 383-434 (2018) ConCodeSe
