/tassal

Tree-based Autofolding Software Summarization Algorithm

Primary LanguageJavaBSD 3-Clause "New" or "Revised" LicenseBSD-3-Clause

TASSAL: Tree-based Autofolding Software Summarization ALgorithm Build Status

TASSAL is a tool for the automatic summarization of source code using autofolding. Autofolding automatically creates a summary of a source code file by folding non-essential code and comment blocks.

NEW: For a live demo of TASSAL that allows you to summarize any GitHub project see:
https://code-summarizer.herokuapp.com

This is an implementation of the code summarizer from our paper:
Autofolding for Source Code Summarization
J. Fowkes, R. Ranca, M. Allamanis, M. Lapata and C. Sutton. arXiv preprint 1403.4503, 2015.

There are two main variants of the algorithm:

  • TASSAL VSM which uses a Vector Space Model for source code - less accurate but very fast (real-time)
  • TASSAL which uses a Topic Model for source code - more accurate but slower (requires training)

both are described below.

Installation

Installing in Eclipse

Simply import as a maven project into Eclipse using the File -> Import... menu option (note that this requires m2eclipse).

It's also possible to export a runnable jar from Eclipse using the File -> Export... menu option.

Compiling a Runnable Jar

To compile a standalone runnable jar, simply run

mvn package

in the main tassal directory (note that this requires maven).

This will create the standalone runnable jar tassal-1.1-SNAPSHOT.jar in the tassal/target subdirectory.

Running TASSAL VSM

TASAAL VSM uses a Vector Space Model of source code tokens to determine which are the least relevent code regions to autofold. TASSAL VSM can run in real-time.

Autofolding a source file

codesum.lm.tui.FoldSourceFileVSM folds a specified source file. It has the following command line options:

  • -f   souce file to autofold
  • -c   desired compression ratio for the file (%)
  • -o   (optional) where to save the folded file

See the individual file javadocs in codesum.lm.tui for information on the Java interface. In Eclipse you can set command line arguments for the TASSAL interface using the Run Configurations... menu option.

Example Usage

A complete example using the command line interface on a runnable jar.

First clone the ActionBarSherlock project into /tmp/java_projects/

$ mkdir /tmp/java_projects/
$ cd /tmp/java_projects/
$ git clone https://github.com/JakeWharton/ActionBarSherlock.git 

We can then fold a specific file

$ java -cp tassal/target/tassal-1.1-SNAPSHOT.jar codesum.lm.tui.FoldSourceFileVSM     
 -c 50
 -f /tmp/java_projects/ActionBarSherlock/actionbarsherlock/src/com/actionbarsherlock/app/SherlockFragment.java 
 -o /tmp/SherlockFragmentFolded.java 

which will output the folded file to /tmp/SherlockFragmentFolded.java.

Running TASSAL

TASAAL uses a scoped Topic Model of source code tokens to determine which are the least relevent code regions to autofold. TASSAL requires the topic model to be trained on a dataset (the larger the better) before it can fold files in the dataset. While this is slower than using a VSM model, it is considerably more accurate.

Training the source code topic model

codesum.lm.tui.TrainTopicModel trains the underlying topic model. It has the following command line options:

  • -d   directory containing java projects
  • -w   working directory where the topic model creates necessary files
  • -i   (optional) no. iterations to train the topic model for.

This will output a summary of the top 25 tokens in some of the discovered topics.

Autofolding a source file

codesum.lm.tui.FoldSourceFile folds a specified source file. It has the following command line options:

  • -w   working directory where the topic model creates necessary files (same as above)
  • -f   souce file to autofold
  • -p   project containing the file to fold
  • -c   desired compression ratio for the file (%)
  • -b   (optional) background topic to back off to (0-2, default=2)
  • -o   (optional) where to save the folded file

See the individual file javadocs in codesum.lm.tui for information on the Java interface. In Eclipse you can set command line arguments for the TASSAL interface using the Run Configurations... menu option.

Example Usage

A complete example using the command line interface on a runnable jar.

First clone the ActionBarSherlock project into /tmp/java_projects/

$ mkdir /tmp/java_projects/
$ cd /tmp/java_projects/
$ git clone https://github.com/JakeWharton/ActionBarSherlock.git

Now you can train the topic model on the java projects in /tmp/java_projects/

$ java -cp tassal/target/tassal-1.1-SNAPSHOT.jar codesum.lm.tui.TrainTopicModel   
 -d /tmp/java_projects/  -w /tmp/ -i 100 

This trains the topic model for 100 iterations and outputs the model to /tmp/. We can then fold a specific file

$ java -cp tassal/target/tassal-1.1-SNAPSHOT.jar codesum.lm.tui.FoldSourceFile     
 -w /tmp/  -c 50 -p ActionBarSherlock 
 -f /tmp/java_projects/ActionBarSherlock/actionbarsherlock/src/com/actionbarsherlock/app/SherlockFragment.java 
 -o /tmp/SherlockFragmentFolded.java 

which will output the folded file to /tmp/SherlockFragmentFolded.java.

Summarizing a Project

TASSAL is also able to summmarize an entire project by finding the top source files (and therefore classes) representative of that project. These top project files can then be autofolded using TASSAL as above (see Autofolding a source file).

codesum.lm.tui.ListSalientFiles lists the most representative source files for a given project. It has the following command line options:

  • -s   working directory where the topic model creates necessary files (same as above)
  • -d   directory containing java projects
  • -p   project to summarize
  • -c   desired compression ratio (% of project files to list)
  • -b   (optional) background topic to back off to (0-2, default=2)
  • -o   (optional) where to save the salient files
  • -i   (optional) whether to ignore unit test files (default=true)

Note that this requires the topic model to first be trained on the given project (see Training the source code topic model above).

Bugs

Please report any bugs using GitHub's issue tracker.

License

This tool is released under the new BSD license.