FastDD-Exp

Datasets

The 14 datasets used in our paper are provided in datasets.zip.

We have implemented IE-Hybrid to the best of our ability; the code is available in IE.zip.

Additionally, we provide implementations of BF and TD-PO, packaged in BF-TDPO.jar. You can select the method through the command-line arguments.

You can follow the instructions here to run IE-Hybrid, BF and TD-PO.

We use Domino's thresholds to run FastDD and IE-Hybrid; the threshold files are provided here.

The third parameter specifies the threshold file; change it to use the thresholds for the corresponding dataset. For example:

java -jar FastDD.jar ./FastDD-Exp/dataset/restaurant.csv -1 ./FastDD-Exp/Exp-2/thresholds/restaurant.txt

This command uses all the data in restaurant.csv as input and takes the threshold for each attribute from restaurant.txt.
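To run the same command for several datasets, a small launcher along the following lines may help. It is only a sketch: the class name `RunFastDD` and the helper `buildCommand` are hypothetical, and it assumes that every dataset follows the same path pattern as the restaurant example (`<name>.csv` under `dataset/` and `<name>.txt` under `Exp-2/thresholds/`), which the example above shows only for restaurant.

```java
import java.io.IOException;
import java.util.List;

// Hypothetical launcher; not part of FastDD itself.
final class RunFastDD {

    // Builds the command line from the example above for a given dataset name.
    // Assumption: all datasets use the same directory layout as restaurant.
    static List<String> buildCommand(String dataset) {
        return List.of(
            "java", "-jar", "FastDD.jar",
            "./FastDD-Exp/dataset/" + dataset + ".csv",
            "-1",
            "./FastDD-Exp/Exp-2/thresholds/" + dataset + ".txt");
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        // Each command-line argument is treated as a dataset name, e.g. "restaurant".
        for (String dataset : args) {
            Process process = new ProcessBuilder(buildCommand(dataset))
                .inheritIO()   // show FastDD's output in this console
                .start();
            int exitCode = process.waitFor();
            System.out.println(dataset + " finished with exit code " + exitCode);
        }
    }
}
```

For example, `java RunFastDD.java restaurant` (Java 11 or later) would issue the command shown above.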

Our program outputs relevant information, including the number of DDs and the running time. We provide an output example here.

Datasets varying |r| and |R| are provided in datasets.zip.

We use the Encoding method by default.

The other two diff-set construction methods are implemented in:

You can change the number of threads in Config.java: the value of ThreadSize controls the number of threads used.

The column descriptions of three datasets and their top-20 DDs are provided in Exp-7.

The restaurant dataset is available for download in datasets.zip. To obtain an unlabeled version of the dataset, simply remove the "class" column.
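Removing the column can be done in any spreadsheet tool or script. As one possible sketch in Java, the class below drops a named column from a CSV file. The class and method names are hypothetical, and it assumes that the first line is a header, that fields are comma-separated, and that no quoted field spans multiple lines.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; not part of the FastDD code base.
final class DropClassColumn {

    // Splits one CSV line on commas that lie outside double quotes.
    // Quote characters are kept so that fields can be re-joined unchanged.
    static List<String> splitCsvLine(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (char c : line.toCharArray()) {
            if (c == '"') {
                inQuotes = !inQuotes;
                current.append(c);
            } else if (c == ',' && !inQuotes) {
                fields.add(current.toString());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        fields.add(current.toString());
        return fields;
    }

    // Returns the rows with the named header column removed from every line.
    static List<String> dropColumn(List<String> lines, String columnName) {
        if (lines.isEmpty()) {
            return lines;
        }
        List<String> header = splitCsvLine(lines.get(0));
        int index = -1;
        for (int i = 0; i < header.size(); i++) {
            String name = header.get(i).trim().replace("\"", "");
            if (name.equalsIgnoreCase(columnName)) {
                index = i;
                break;
            }
        }
        if (index < 0) {
            throw new IllegalArgumentException("No column named " + columnName);
        }
        List<String> result = new ArrayList<>();
        for (String line : lines) {
            List<String> fields = splitCsvLine(line);
            if (index < fields.size()) {
                fields.remove(index);   // remove by position, not by value
            }
            result.add(String.join(",", fields));
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        // Usage: java DropClassColumn.java <input.csv> <output.csv>
        Path input = Path.of(args[0]);
        Path output = Path.of(args[1]);
        Files.write(output, dropColumn(Files.readAllLines(input), "class"));
    }
}
```

For example, `java DropClassColumn.java restaurant.csv restaurant_unlabeled.csv` would write a copy without the "class" column; the output file name here is only an illustration.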

Our top-20 DDs are provided in Exp-7.

Datasets with noise, along with the corresponding indexes of the violating tuples (starting at 0), are available in Exp-9.