Code for ECMLPKDD 2019 Paper: A Framework for Deep Constrained Clustering - Algorithms and Advances
git clone https://github.com/blueocean92/deep_constrained_clustering
cd deep_constrained_clustering
Python: see requirements.txt for the complete list of packages used. We recommend a clean installation of the requirements in a fresh conda environment:
conda create -n testenv python=3.6
source activate testenv
pip install -r requirements.txt
If you don't want to do the clean installation above, you can also install the requirements directly with:
pip install -r requirements.txt
PyTorch: Note that you need PyTorch; we used version 1.0.0. If you use the environment above, PyTorch will be installed into it automatically.
Step 1: Download Pre-trained Models
While in the deep_constrained_clustering folder:
sh download_model.sh
Step 2: Download Processed Reuters Data (optional; MNIST and Fashion are available in torchvision.datasets)
sh download_data.sh
cd experiments/
While in the deep_constrained_clustering/experiments folder:
To run the pairwise constrained clustering using pre-trained weights (AE features, 6000 constraints), do:
python run_DCC_pairwise.py --data $DATA
For the --data flag, which specifies the dataset to use, the options are "MNIST", "Fashion" and "Reuters".
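The README does not describe how the 6000 pairwise constraints are generated. Below is a minimal sketch, assuming they are random index pairs labelled must-link (same class) or cannot-link (different class) from the ground-truth labels. The helper `sample_pairwise_constraints` and the +1/-1 link encoding are hypothetical and are not the repository's code.

```python
import random
from typing import List, Tuple

# (i, j, link): link = +1 for must-link, -1 for cannot-link (hypothetical encoding)
Constraint = Tuple[int, int, int]


def sample_pairwise_constraints(labels: List[int], num_constraints: int,
                                seed: int = 0) -> List[Constraint]:
    """Hypothetical helper: draw random index pairs and label each pair
    must-link or cannot-link from the ground-truth class labels."""
    if len(labels) < 2:
        raise ValueError("need at least two samples to form a pair")
    rng = random.Random(seed)
    n = len(labels)
    constraints: List[Constraint] = []
    while len(constraints) < num_constraints:
        i, j = rng.randrange(n), rng.randrange(n)
        if i == j:
            continue  # skip self-pairs
        link = 1 if labels[i] == labels[j] else -1
        constraints.append((i, j, link))
    return constraints


if __name__ == "__main__":
    toy_labels = [0, 0, 1, 1, 2, 2]
    for i, j, link in sample_pairwise_constraints(toy_labels, 6):
        print(i, j, "must-link" if link == 1 else "cannot-link")
```

With real data you would pass the training labels and `num_constraints=6000` to mirror the setting quoted above.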
To run the pairwise constrained clustering without pre-training (i.e., from raw features), do:
python run_DCC_pairwise.py --data $DATA --without_pretrain
To run the pairwise constrained clustering without KMeans initialization, do:
python run_DCC_pairwise.py --data $DATA --without_kmeans
To run the pairwise constrained clustering with noisy pairwise constraints, do:
python run_DCC_pairwise.py --data $DATA --noisy $NOISE
For the --noisy flag, which specifies the noise level, the value should be a positive float equal to the ratio of noisy constraints to ground-truth constraints.
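To make the ratio concrete, here is one possible reading, offered as an assumption rather than the repository's implementation: for a ratio r and N clean constraints, round(r * N) extra constraints are added whose must-link/cannot-link label contradicts the ground truth. The helper `add_noisy_constraints` is hypothetical.

```python
import random
from typing import List, Tuple

Constraint = Tuple[int, int, int]  # (i, j, link): +1 must-link, -1 cannot-link


def add_noisy_constraints(labels: List[int], clean: List[Constraint],
                          ratio: float, seed: int = 0) -> List[Constraint]:
    """Hypothetical reading of --noisy: append round(ratio * len(clean))
    constraints whose link contradicts the ground-truth labels."""
    if ratio <= 0:
        raise ValueError("ratio must be a positive float")
    if len(labels) < 2:
        raise ValueError("need at least two samples to form a pair")
    rng = random.Random(seed)
    n = len(labels)
    num_noisy = int(round(ratio * len(clean)))
    noisy: List[Constraint] = []
    while len(noisy) < num_noisy:
        i, j = rng.randrange(n), rng.randrange(n)
        if i == j:
            continue  # skip self-pairs
        true_link = 1 if labels[i] == labels[j] else -1
        noisy.append((i, j, -true_link))  # deliberately wrong link
    return clean + noisy


if __name__ == "__main__":
    toy_labels = [0, 0, 1, 1]
    toy_clean = [(0, 1, 1), (0, 2, -1), (1, 3, -1), (2, 3, 1)]
    print(add_noisy_constraints(toy_labels, toy_clean, 0.5))
```

Under this reading, `--noisy 0.5` with 6000 clean constraints would add 3000 incorrect ones.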
To save data for plotting, do:
python run_DCC_pairwise.py --data $DATA --plotting
This saves the experiment data for plotting in folders under ./plotting.
To plot the results, do:
python ./plotting/plot_pairwise.py
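The pairwise variants above differ only in their flags, so a small Python driver can build and launch them for every dataset. This is a sketch: `DATASETS`, `pairwise_command` and `run_all` are hypothetical helpers, it uses only the flags documented in this README, and combining several flags in a single call is an assumption the README does not state. Run it from deep_constrained_clustering/experiments.

```python
import shlex
import subprocess
from typing import List, Optional

# Values accepted by --data, per this README.
DATASETS = ("MNIST", "Fashion", "Reuters")


def pairwise_command(data: str, without_pretrain: bool = False,
                     without_kmeans: bool = False,
                     noisy: Optional[float] = None,
                     plotting: bool = False) -> List[str]:
    """Hypothetical helper: build the argument list for run_DCC_pairwise.py."""
    if data not in DATASETS:
        raise ValueError("unknown dataset: " + data)
    cmd = ["python", "run_DCC_pairwise.py", "--data", data]
    if without_pretrain:
        cmd.append("--without_pretrain")
    if without_kmeans:
        cmd.append("--without_kmeans")
    if noisy is not None:
        if noisy <= 0:
            raise ValueError("--noisy expects a positive float")
        cmd += ["--noisy", str(noisy)]
    if plotting:
        cmd.append("--plotting")
    return cmd


def run_all(dry_run: bool = True) -> None:
    """Run the default pairwise experiment on every dataset.
    Must be called from deep_constrained_clustering/experiments."""
    for data in DATASETS:
        cmd = pairwise_command(data)
        print(" ".join(shlex.quote(part) for part in cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)  # raises CalledProcessError on failure


if __name__ == "__main__":
    run_all(dry_run=True)
```

Passing the argument list to `subprocess.run` (rather than a shell string) avoids quoting issues; `dry_run=True` only prints the commands.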
To run the instance difficulty constrained clustering, do:
python run_DCC_instance.py --data $DATA
To run the triplets constrained clustering (6000 constraints), do:
python run_DCC_triplets.py --data $DATA
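A triplet constraint is commonly written as (anchor, positive, negative): the anchor should end up closer to the positive than to the negative. The README does not say how its 6000 triplets are built; the sketch below assumes they are sampled from class labels (anchor and positive share a class, the negative does not). `sample_triplet_constraints` is a hypothetical helper, not the repository's code.

```python
import random
from collections import defaultdict
from typing import Dict, List, Tuple

Triplet = Tuple[int, int, int]  # (anchor, positive, negative)


def sample_triplet_constraints(labels: List[int], num_triplets: int,
                               seed: int = 0) -> List[Triplet]:
    """Hypothetical helper: the anchor and positive share a class label,
    the negative comes from a different class."""
    rng = random.Random(seed)
    by_class: Dict[int, List[int]] = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)
    classes = list(by_class)
    # An anchor class needs at least two members to supply a distinct positive.
    anchor_classes = [c for c in classes if len(by_class[c]) >= 2]
    if len(classes) < 2 or not anchor_classes:
        raise ValueError("need >= 2 classes and one class with >= 2 samples")
    triplets: List[Triplet] = []
    for _ in range(num_triplets):
        c = rng.choice(anchor_classes)
        anchor, positive = rng.sample(by_class[c], 2)
        neg_class = rng.choice([k for k in classes if k != c])
        negative = rng.choice(by_class[neg_class])
        triplets.append((anchor, positive, negative))
    return triplets


if __name__ == "__main__":
    print(sample_triplet_constraints([0, 0, 1, 1, 2], 5))
```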
To run the global size constrained clustering, do:
python run_DCC_global.py --data $DATA
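The README does not spell out the form of the global size constraint. As an illustrative sketch only, and not necessarily the loss used in run_DCC_global.py, one common way to encourage equally sized clusters is to penalize how far each cluster's average soft assignment drifts from 1/k. `cluster_size_penalty` is a hypothetical name, and `q` is assumed to be an n-by-k matrix of soft assignments whose rows sum to 1.

```python
from typing import List


def cluster_size_penalty(q: List[List[float]]) -> float:
    """Squared deviation of each cluster's average soft assignment from 1/k.
    q[i][j] is the (assumed) probability that sample i belongs to cluster j."""
    n, k = len(q), len(q[0])
    penalty = 0.0
    for j in range(k):
        mean_j = sum(row[j] for row in q) / n  # expected fraction of samples in cluster j
        penalty += (mean_j - 1.0 / k) ** 2
    return penalty


balanced = [[1.0, 0.0], [0.0, 1.0]]
skewed = [[1.0, 0.0], [1.0, 0.0]]
print(cluster_size_penalty(balanced))  # 0.0
print(cluster_size_penalty(skewed))    # 0.5
```

The penalty is zero only when every cluster receives an equal share of the total assignment mass, which matches a "balanced sizes" target.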
To run the baseline Improved DEC, do:
python run_improved_DEC.py --data $DATA
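The Improved DEC baseline is assumed here to build on the clustering objective of Deep Embedded Clustering (DEC): a Student's t soft assignment Q between embeddings and centroids, a sharpened target distribution P, and the loss KL(P || Q). The plain-Python sketch below shows those three pieces on toy data; the function names are hypothetical, and the repository's implementation (which works on PyTorch tensors) may differ in detail.

```python
import math
from typing import List

Matrix = List[List[float]]


def soft_assign(z: Matrix, centers: Matrix, alpha: float = 1.0) -> Matrix:
    """DEC soft assignment: Student's t kernel between embeddings and centroids."""
    q = []
    for zi in z:
        row = []
        for mu in centers:
            d2 = sum((a - b) ** 2 for a, b in zip(zi, mu))
            row.append((1.0 + d2 / alpha) ** (-(alpha + 1.0) / 2.0))
        total = sum(row)
        q.append([v / total for v in row])  # normalize over clusters
    return q


def target_distribution(q: Matrix) -> Matrix:
    """DEC target: square q, divide by cluster frequency f_j, renormalize rows."""
    k = len(q[0])
    f = [sum(row[j] for row in q) for j in range(k)]
    p = []
    for row in q:
        w = [row[j] ** 2 / f[j] for j in range(k)]
        total = sum(w)
        p.append([v / total for v in w])
    return p


def kl_divergence(p: Matrix, q: Matrix) -> float:
    """Clustering loss KL(P || Q), summed over samples and clusters."""
    return sum(pij * math.log(pij / qij)
               for prow, qrow in zip(p, q)
               for pij, qij in zip(prow, qrow) if pij > 0)


z = [[0.0], [0.1], [5.0]]
centers = [[0.0], [5.0]]
q = soft_assign(z, centers)
p = target_distribution(q)
print(kl_divergence(p, q))
```

Squaring Q and dividing by the cluster frequency makes P more confident than Q while discouraging any single cluster from absorbing too many points.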