coinse/Defects4J-multifault

Related approach/work

Opened this issue · 4 comments

jose commented

Dear Gabin, Juyeon, and Shin,

Thanks for sharing the dataset.

I would like to point out that others have successfully enhanced D4J with support for multi-faults. In particular, David Paterson (@djpaterson) developed, back in 2019, a d4j-combine command on top of the D4J framework which "combines faults in a particular project version". He also drafted a script to find multi-fault versions in the D4J dataset. Here's the set of D4J faults that can be combined in a multi-fault version, according to David.

Looking forward to attending your talk at SSBSE 2021.

--
Best,
Jose

@jose, thank you so much for highlighting the work of David - we simply were not aware of it. It does achieve the same goal as ours, so it's a shame that we did not know of it. Was it ever published somewhere, or used in a multiple fault localisation study? Would love to know.

We will try to compare our results to David's list and see if there are any major differences. Thanks again :)

jose commented

> ... we simply were not aware of it. It does achieve the same goal as ours, so it's a shame that we did not know of it.

There are far too many papers for us to keep track of these days. :-)

> Was it ever published somewhere, or used in a multiple fault localisation study?

Yes, it was used in "Using Controlled Numbers of Real Faults and Mutants to Empirically Evaluate Coverage-Based Test Case Prioritization", but it was never explained in detail there. David later provided further details on how to create versions with multiple faults in his PhD thesis, "Improvements to Test Case Prioritisation considering Efficiency and Effectiveness on Real Faults" (section 3.2.2).

> We will try to compare our results to David's list and see if there are any major differences.

Yes, it would be great to know (or at least check) whether you got the same combination of multi-faults.

agb94 commented

@jose

Hi Jose 🙂

> Yes, it was used in "Using Controlled Numbers of Real Faults and Mutants to Empirically Evaluate Coverage-Based Test Case Prioritization", but it was never explained in detail there. David later provided further details on how to create versions with multiple faults in his PhD thesis, "Improvements to Test Case Prioritisation considering Efficiency and Effectiveness on Real Faults" (section 3.2.2).

Thanks for letting us know about David's work.
I've executed the enhanced version and found that it successfully generates multi-fault versions!

> Yes, it would be great to know (or at least check) whether you got the same combination of multi-faults.

The multi-fault combinations are different because we used a slightly different methodology to construct the dataset. While his technique combines multiple faulty code lines by applying multiple patches to the source code, our method does not alter the source code; instead, it transplants bug-revealing test cases from other buggy versions to detect additional faults that already reside in each buggy version. More information can be found in our preprint: https://arxiv.org/pdf/2108.04455.pdf
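
To make the test-transplantation idea concrete, here is a minimal sketch in Python. It is not the dataset's actual tooling: the function name, the toy fault and test identifiers, and the rule that a fault counts as present only when all of its revealing tests fail are assumptions made for illustration. The preprint describes the real criteria.

```python
from typing import Callable, Dict, List


def detect_coexisting_faults(
    version: str,
    revealing_tests: Dict[str, List[str]],
    test_fails: Callable[[str, str], bool],
) -> List[str]:
    """Return the fault ids whose revealing tests all fail on `version`.

    The source code of `version` is never modified; only tests taken from
    other buggy versions are run against it.
    """
    present = []
    for fault_id, tests in sorted(revealing_tests.items()):
        # Assumption of this sketch: a fault is considered present only if
        # it has at least one revealing test and every such test fails.
        if tests and all(test_fails(version, t) for t in tests):
            present.append(fault_id)
    return present


# Hypothetical toy data: which tests reveal which fault.
REVEALING_TESTS = {
    "Proj-1": ["testA"],
    "Proj-2": ["testB"],
    "Proj-3": ["testC", "testD"],
}

# Hypothetical outcomes of running transplanted tests on buggy version "Proj-3b".
FAILING = {("Proj-3b", "testA"), ("Proj-3b", "testC"), ("Proj-3b", "testD")}


def toy_test_fails(version: str, test: str) -> bool:
    """Stand-in for actually compiling and running a transplanted test."""
    return (version, test) in FAILING


faults = detect_coexisting_faults("Proj-3b", REVEALING_TESTS, toy_test_fails)
print(faults)  # ['Proj-1', 'Proj-3']
```

In this toy run, the buggy version of Proj-3 also fails the test that reveals Proj-1, so it is treated as a two-fault version without any patch being applied to its source.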

Thank you :)

jose commented

> While his technique combines multiple faulty code lines by applying multiple patches to the source code, our method does not alter the source code but transplants bug-revealing test cases from other buggy versions to detect additional faults that already reside in each buggy version.

Thank you for the clarification.

--
Best,
Jose