/shibboleth

Patch Correctness Assessment in Automated Program Repair Based on the Impact of Patches on Production and Test Codes

Shibboleth is a light-weight, effective patch correctness assessment technique for automated program repair (APR). It can assess patches via both ranking and classification, using hints obtained from both production code and test code. Shibboleth can be used as part of an APR pipeline during fix-report generation. Because it relies on light-weight measures, it imposes no appreciable overhead on test execution. This repository contains the following material: (1) our software artifact (namely, Shibboleth), which can be used either as a Maven plugin or as a command-line tool; (2) two pre-configured example buggy programs on which Shibboleth can be tested; (3) our data set of patches.

Introduction

Test-based APR systems often generate patches that pass the test suite (i.e., they are so-called plausible patches) yet do not fix the bug (i.e., they are incorrect). Plausible patches must be inspected manually, because deciding their correctness involves reasoning about semantic equivalence, which is undecidable. Various approaches have been proposed for automatically assessing the correctness of generated patches in order to reduce this manual effort.

In this work, we design and implement a novel technique, named Shibboleth, for automatic correctness assessment of the patches generated by test-based generate-and-validate APR systems. The technique is based on the idea that the buggy program is almost correct, insofar as fixing bugs involves small changes to the code and does not remove the code implementing correct functionality of the program. Thus, we measure the impact of patches on both production code (via syntactic and semantic similarity) and test code (via code coverage) to single out the patches that result in similar programs and that do not delete desired program elements. The technique assesses the correctness of patches via both ranking and classification.
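
The intuition above can be illustrated with a toy sketch. This is not Shibboleth's actual set of measures: the line-based similarity from Python's difflib, the coverage-retention ratio, and the way the two are multiplied into one score are all assumptions chosen only to show how "similar program" and "no deleted behavior" could be turned into a ranking.

```python
import difflib


def syntactic_similarity(original_src, patched_src):
    """Line-based similarity in [0, 1] between two source snippets."""
    matcher = difflib.SequenceMatcher(
        None, original_src.splitlines(), patched_src.splitlines()
    )
    return matcher.ratio()


def coverage_retention(covered_before, covered_after):
    """Fraction of originally covered elements that are still covered."""
    if not covered_before:
        return 1.0
    return len(covered_before & covered_after) / len(covered_before)


def rank_patches(original_src, covered_before, patches):
    """Rank patch ids, best first.

    `patches` maps a patch id to (patched_source, covered_after).
    The product of the two measures is an arbitrary, illustrative score.
    """
    scored = []
    for patch_id, (patched_src, covered_after) in patches.items():
        score = syntactic_similarity(original_src, patched_src) * coverage_retention(
            covered_before, covered_after
        )
        scored.append((score, patch_id))
    scored.sort(key=lambda item: item[0], reverse=True)
    return [patch_id for _, patch_id in scored]
```

In this sketch, a patch that deletes most of a method scores low on both measures and therefore sinks in the ranking, which mirrors the idea that correct fixes tend to be small and non-destructive.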

We evaluated our technique on 1,290 patches generated by 29 Java-based APR systems for Defects4J programs. The technique outperforms state-of-the-art ranking techniques. Specifically, in 43% (66%) of the cases, it ranks the correct patch in the top-1 (top-2) positions. Additionally, we evaluated the classification power of the tool on 1,871 human-written and APR-generated patches. The tool achieved an accuracy of 0.887 and an F1-score of 0.852, thereby outperforming state-of-the-art techniques.

Shibboleth Setup

Shibboleth is a robust tool that can be used either as a Maven plugin or through its command-line interface.

Installation

This repository contains the source code of Shibboleth, so you can clone the repository, compile it, and install the Maven plugin on your local machine by following the instructions below.

Step 0: Please read the System Requirements section and make sure you have at least Maven, Git, JDK 1.8+, Python 3.9, scikit-learn 1.0.2, pandas 1.3.5, numpy 1.21.5, and joblib 1.1.0 installed on your computer, and that the environment variable JAVA_HOME points to the installation path of your JDK. For example, on my Linux machine, JDK 1.8 is installed under /opt/jdk1.8.0_251, so I use the following command to set the JAVA_HOME variable.

export JAVA_HOME=/opt/jdk1.8.0_251

After this, when I run the command mvn -ver, I can see something like the following

Apache Maven 3.6.0
Maven home: /usr/share/maven
Java version: 1.8.0_251, vendor: Oracle Corporation, runtime: /opt/jdk1.8.0_251/jre
Default locale: en_US, platform encoding: UTF-8
OS name: "linux", version: "4.15.0-154-generic", arch: "amd64", family: "unix"

Step 1: Use the following command to clone the repository on your computer.

git clone https://github.com/ali-ghanbari/shibboleth.git

Step 2: Navigate to the project directory, compile, and install the project on your local repository by using the following commands.

cd shibboleth
mvn clean install

Downloading all the dependencies and building the JAR file might take up to a minute, depending on the load of your computer and the speed of your Internet connection. After you see the (green) BUILD SUCCESS message on your screen, your Shibboleth Maven plugin is ready to use. You will also have the JAR file shibboleth-maven-plugin-1.0-SNAPSHOT.jar under the target folder. You can use this JAR file to run Shibboleth as a command-line application.

Maven Plugin

Once you have installed Shibboleth in your local Maven repository, you can use the tool by configuring the POM file of the target project and providing the needed information. This can be done by adding the following template XML snippet under the <plugins> tag in the pom.xml of the target project. Optional parts are shown in comment form, together with a short description of their default values.

<plugin>
    <groupId>edu.iastate</groupId>
    <artifactId>shibboleth-maven-plugin</artifactId>
    <version>1.0-SNAPSHOT</version>
    <!-- <configuration>  -->
    <!--     <targetClasses>${groupId}*, _i.e._, all application classes</targetClasses> -->
    <!--     <excludedClasses>all test cases, _i.e._, *Tests, *Test, *TestCase*</excludedClasses> -->
    <!--     <excludeTestClasses>true, _i.e._, exclude test classes during coverage analysis</excludeTestClasses> -->
    <!--     <includeProductionClasses>true, _i.e._, all classes under target/classes shall be included</includeProductionClasses> -->
    <!--     <targetTests>*Tests, *Test, *TestCase*, all classes that end with Tests or Test, or contain the word TestCase</targetTests> -->
    <!--     <excludedTests>test cases that we wish to ignore</excludedTests> -->
    <!--     <inputFile>input-file.csv, a CSV file containing some basic information about the patches</inputFile> -->
    <!--     <childJVMArgs>-Xmx16g, _i.e._, maximum 16 GB of heap space for child JVM processes for profiling, etc.</childJVMArgs> -->
    <!-- </configuration> -->
</plugin>

An important ingredient of setting up Shibboleth is the CSV file input-file.csv. This CSV file is expected to contain 4 columns, namely "Patch Id", "X", "Patched Methods," and "Patched Classes." The file should not have a header. "Patch Id" is a unique identifier for the patch. "X" is ignored; you may leave it empty. "Patched Methods" is the list of fully qualified method names patched by the patch. If more than one method is patched, the method names should be separated by using semicolons. Finally, the field "Patched Classes" is the semicolon-separated list of class file names for the patched classes. All other columns after "Patched Classes" shall be ignored.
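
As a minimal sketch of the file layout described above, the following Python snippet renders rows for input-file.csv: four columns, no header, an empty "X" column, and semicolon-separated lists. The helper names and the method and class names in the usage note are hypothetical placeholders, not values taken from a real project.

```python
import csv
import io


def patch_row(patch_id, patched_methods, patched_classes):
    """Build one 4-column row: Patch Id, X (left empty), methods, classes."""
    return [
        str(patch_id),
        "",  # "X" is ignored by the tool, so it is left empty here
        ";".join(patched_methods),
        ";".join(patched_classes),
    ]


def render_input_csv(rows):
    """Render rows as CSV text without a header line."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerows(rows)
    return buffer.getvalue()
```

For instance, render_input_csv([patch_row(1, ["com.example.Foo.bar()"], ["Foo.class"])]) yields a single line that can be written to input-file.csv. Method names containing commas (e.g., parameter lists) are quoted automatically by the csv module.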

Once you are done with the setup, the Shibboleth ranker can be invoked from the command line as follows (please make sure you compile the test and production code before invoking the tool).

mvn edu.iastate:shibboleth-maven-plugin:rank

Similarly, the Shibboleth classifier can be invoked using the following command.

mvn edu.iastate:shibboleth-maven-plugin:classify

The tool shall print the result on the standard output above the green BUILD SUCCESS message.

Command-line Interface

Many Defects4J projects are Ant projects, so we cannot directly apply Shibboleth's Maven plugin to them. Therefore, we have included a generic command-line interface for Shibboleth that makes it applicable to all JVM-based projects (whether Maven projects, Ant projects, or non-standard toy Java projects). The command-line interface can be accessed from the same JAR file that we use as a Maven plugin, i.e., target/shibboleth-maven-plugin-1.0-SNAPSHOT.jar, once you have installed the project. Once we include the JAR file in the classpath of a JVM session, i.e., using the command java -cp /path/to/shibboleth-maven-plugin-1.0-SNAPSHOT.jar, we can invoke Shibboleth by calling Ranker. If you pass no command-line arguments, or add the switch -h (or --help), the system will show the options and a short description of each. The options correspond to the Maven plugin options described in the previous section. The command-line options are as follows.

  • -i or --inputCSVFile: The name of the file (in relative or absolute form) of the CSV file containing required information about patches. By default, the value input-file.csv is used.
  • -v or --childJVMArgs: A list of JVM arguments used when creating a child JVM process, e.g., during profiling. By default, the option -Xmx16g is used.
  • -n or --includeProductionClasses: Whether or not to include production classes. This option corresponds to <includeProductionClasses> of the Maven plugin, described in the previous section.
  • -c or --targetClasses: Target application classes to be transformed. This option corresponds to <targetClasses> of the Maven plugin, described in the previous section.
  • -s or --excludeTestClasses: Whether or not test classes should be excluded. This option corresponds to <excludeTestClasses> of the Maven plugin, described in the previous section.
  • -e or --excludedClasses: Target application classes to be excluded from transformation. This option corresponds to <excludedClasses> of the Maven plugin, described in the previous section.
  • -t or --targetTests: Target test classes to be included. This option corresponds to <targetTests> of the Maven plugin, described in the previous section.
  • -x or --excludedTests: Target test classes to be excluded. This option corresponds to <excludedTests> of the Maven plugin, described in the previous section.
  • -b or --buildFolder: Build folder for application classes, e.g., target/classes in Maven projects or build/classes for (most) Ant projects.
  • -u or --testBuildFolder: Build folder for test classes, e.g., target/test-classes in Maven projects or build/tests for (most) Ant projects.
  • -l or --classpath: Classpath for the target program.
  • -h or --help: Prints usage.
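
A small helper like the one below can assemble such an invocation for an Ant-style project. It is only a sketch under stated assumptions: the main class is passed in as a parameter because this README gives only the simple name Ranker (the fully qualified name may be needed), each option is assumed to take its value as the next argument, and boolean options are not handled.

```python
import shlex


def build_command(jar_path, main_class, options):
    """Assemble an argv list for running the command-line interface.

    `options` maps a switch (e.g. "-i") to its value; insertion order is kept.
    Assumption: every switch is followed by exactly one value argument.
    """
    command = ["java", "-cp", jar_path, main_class]
    for switch, value in options.items():
        command += [switch, str(value)]
    return command


def as_shell_line(command):
    """Quote the argv list so it can be pasted into a shell."""
    return shlex.join(command)
```

For example, build_command("/path/to/shibboleth-maven-plugin-1.0-SNAPSHOT.jar", "Ranker", {"-i": "input-file.csv", "-b": "build/classes", "-u": "build/tests"}) produces an argv list that can be handed to subprocess.run or printed with as_shell_line.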

Example

In this section, we show how you can use Shibboleth Maven plugin by applying it on a Defects4J bug, namely Chart-21. Please follow these steps to apply Shibboleth on the bug Chart-21, which has 1 CORRECT patch (with id 1074) and 3 INCORRECT patches generated for it.

Step 0: Set JAVA_HOME to point to the location of JDK 1.8. Since the JDK resides under /opt/jdk1.8.0_251 on the computer I am using, I use the following command to set the environment variable.

export JAVA_HOME=/opt/jdk1.8.0_251

This directory (or the exact version of JDK 1.8) might be different on your computer.

Step 1: Before applying Shibboleth, you first need to build the project. To do so, navigate to the project folder and build the project with Maven using the following commands.

cd Example
cd Chart-21
mvn clean test -DskipTests

Step 2: Since the project is pre-configured, there is no need for you to configure the POM file to add Shibboleth. Thus, you can directly invoke Shibboleth by using the following command.

mvn edu.iastate:shibboleth-maven-plugin:rank

The tool will output a list of patch ids as follows. Patch 1074 is ranked in first place and this patch is the correct fix for the bug.

1 1074
2 1075
3 1077
4 1076

Similarly, had we run the command mvn edu.iastate:shibboleth-maven-plugin:classify, we would get the following output.

1077: INCORRECT
1076: INCORRECT
1075: INCORRECT
1074: CORRECT
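
If you want to post-process these results in a script, the two outputs can be parsed as sketched below. The parsers assume the result lines look exactly like the samples shown in this section ("rank id" and "id: LABEL") and silently skip every other line, such as Maven's log output; the function names are illustrative.

```python
def parse_ranking(text):
    """Return patch ids ordered by rank from lines such as '1 1074'."""
    pairs = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[0].isdigit():
            pairs.append((int(parts[0]), parts[1]))
    return [patch_id for _, patch_id in sorted(pairs)]


def parse_classification(text):
    """Return {patch id: label} from lines such as '1074: CORRECT'."""
    labels = {}
    for line in text.splitlines():
        patch_id, separator, label = line.partition(":")
        label = label.strip()
        if separator and label in ("CORRECT", "INCORRECT"):
            labels[patch_id.strip()] = label
    return labels
```

With the Chart-21 outputs shown here, the first element of the parsed ranking is 1074, and the parsed classification maps 1074 to CORRECT, so the two modes agree on this bug.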

You can apply Shibboleth on the other example (Time-24) in a similar way.

Data Set of Patches

The database is located under the directory Data Set. The CSV files info.csv and info-ext.csv under Database are the main tables of the data set and contain all the information about our patch data set. All the correct patches are located under the directories Database/correct and Database/correct-ext, and all incorrect patches are located under Database/incorrect.

The records in the tables info.csv and info-ext.csv contain the following information.

  • The Defects4J bug id targeted by the patch,
  • Package name, source file name, line number and line range of the patched location,
  • Ground-truth label of the patch (i.e., correct/incorrect),
  • Fully qualified names of the patched methods,
  • The Ochiai suspiciousness values of the patched methods,
  • The name of the generating APR tool (or N/A in case the patch is human-written),
  • Provenance information, i.e., which data set each patch is coming from and the original identifier of the data point in that data set.

Here is an example record from the table.

5a,Time,4,INCORRECT,./incorrect/Time-4/1624652313125.patch,Arja,"org.joda.time.field,ZeroIsMaxDateTimeField.java,108,7","org.joda.time.field,org.joda.time.field.ZeroIsMaxDateTimeField.getMinimumValue()",1,Ye(patch4-Time-4-Arja-plausible)

In this record, 5a is the unique identifier of the patch in our database. The next two fields, i.e., Time, 4, indicate that this patch targets bug Time-4 from the Defects4J bug database. The next field reports the ground-truth label of the patch, which is INCORRECT in this case. The next field contains the path to the diff file for the patch. The field after that indicates that the patch was generated by the APR tool Arja. The next field reports that lines 108 through 115 (i.e., 108 + 7) of the source file ZeroIsMaxDateTimeField.java, residing in the package org.joda.time.field, were modified by the patch. If more than one source file is modified, the entries are separated using semicolons. The field after that reports the fully qualified names of the patched methods. If more than one method is modified, the names are separated using semicolons. After the names of the patched methods, the aggregated (average) Ochiai suspiciousness of the patched methods is stored. Finally, the last field reports that this patch comes from the data set curated by Ye et al.
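
The field layout just described can be read with Python's csv module, as in the following sketch. The field names in FIELDS are labels chosen for this example (the tables' actual column names, if any, are not stated here), and the sub-structure of the location and method fields is inferred from the single example record above.

```python
import csv

# Illustrative labels for the ten fields of a record, in order of appearance.
FIELDS = [
    "id", "project", "bug_number", "label", "patch_path",
    "tool", "locations", "methods", "ochiai", "provenance",
]


def parse_record(line):
    """Parse one record of the patch table into a dictionary."""
    row = next(csv.reader([line]))
    record = dict(zip(FIELDS, row))
    record["bug"] = f'{record["project"]}-{record["bug_number"]}'
    record["ochiai"] = float(record["ochiai"])

    locations = []
    for entry in record["locations"].split(";"):
        package, source_file, start, span = entry.split(",")
        locations.append({
            "package": package,
            "file": source_file,
            "start_line": int(start),
            "end_line": int(start) + int(span),  # e.g. 108 + 7 = 115
        })
    record["locations"] = locations

    methods = []
    for entry in record["methods"].split(";"):
        # Split only once: method signatures may themselves contain commas.
        package, method = entry.split(",", 1)
        methods.append({"package": package, "method": method})
    record["methods"] = methods
    return record
```

Applied to the example record, this returns the bug Time-4, the label INCORRECT, the tool Arja, and a single patched location spanning lines 108 to 115.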

System Requirements

Please make sure that you have the following software installed on your computer in order to be able to run the software artifact shipped in this repository.

  • Ubuntu Linux or macOS.
  • Maven v3.2+ if you wish to use Shibboleth as a Maven plugin.
  • JDK 8u171, 8u231, or 8u281 is required in order to be able to build and test Defects4J projects; one Defects4J bug, namely Lang-5, is compatible with JDK 7u80 only.
  • Python 3.9 and the Python libraries joblib 1.1.0, numpy 1.21.5, pandas 1.3.5, and scikit-learn 1.0.2.

To install Python libraries automatically, you can use the file requirements.txt.
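
To confirm that the installed library versions match the ones listed above, a quick check along the following lines may help. It is a sketch, not part of the repository: the function names are made up for this example, and it relies only on the standard importlib.metadata module.

```python
from importlib import metadata

# Versions listed in the System Requirements section.
REQUIRED = {
    "joblib": "1.1.0",
    "numpy": "1.21.5",
    "pandas": "1.3.5",
    "scikit-learn": "1.0.2",
}


def installed_version(name):
    """Return the installed version of a distribution, or None if missing."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def find_mismatches(required=REQUIRED, lookup=installed_version):
    """Return {name: (wanted, found)} for every library that does not match."""
    mismatches = {}
    for name, wanted in required.items():
        found = lookup(name)
        if found != wanted:
            mismatches[name] = (wanted, found)
    return mismatches
```

Calling find_mismatches() with no arguments inspects the current Python environment; an empty dictionary means all four libraries are present in the listed versions.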