Accelerating Extreme Classification via Adaptive Feature Agglomeration
DEFRAG is executed in two steps:
- defrag_clustering: This computes a grouping of features.
- defrag_agglomeration: This agglomerates the features based on the groupings obtained from the previous step.
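The grouping step can be pictured as a balanced recursive split that stops once a leaf is small enough. The sketch below is only a toy stand-in, not DEFRAG's clustering code: the function name balanced_groups and the 1-D score used to order features are hypothetical, and the real defrag_clustering works on feature representation vectors rather than a scalar score.

```python
def balanced_groups(feature_ids, score, max_leaf=8):
    """Recursively halve feature_ids until each leaf holds <= max_leaf features.

    `score` is a hypothetical 1-D ordering used only for illustration.
    """
    feature_ids = list(feature_ids)
    if len(feature_ids) <= max_leaf:
        return [feature_ids]
    ordered = sorted(feature_ids, key=score)
    mid = len(ordered) // 2  # balanced split: both halves differ by at most one
    return (balanced_groups(ordered[:mid], score, max_leaf)
            + balanced_groups(ordered[mid:], score, max_leaf))


# 20 toy features, leaves capped at 8 features each
groups = balanced_groups(range(20), score=lambda f: f % 7, max_leaf=8)
```

Every feature ends up in exactly one leaf, and each leaf is one group that the agglomeration step would merge into a single feature.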
Please refer to sample_run.py for more information on how to use DEFRAG.
Feature and label files should be formatted as expected by Parabel.
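As a rough illustration of that layout, the sketch below serializes a few sparse rows. It assumes the sparse text format described in the Parabel README (a header line with the number of rows and columns, then one line per row of zero-based index:value pairs); please confirm against the Parabel documentation before relying on it. The helper name to_sparse_text is hypothetical and not part of DEFRAG or Parabel.

```python
def to_sparse_text(rows, num_cols):
    """Serialize rows (dicts of column index -> value) to sparse text.

    Assumed layout: "<num_rows> <num_cols>" header, then one line per row
    with space-separated "index:value" pairs in increasing index order.
    """
    lines = [f"{len(rows)} {num_cols}"]
    for row in rows:
        lines.append(" ".join(f"{j}:{v:g}" for j, v in sorted(row.items())))
    return "\n".join(lines) + "\n"


# Two data points over four features
text = to_sparse_text([{0: 1.0, 3: 0.5}, {2: 2.0}], num_cols=4)
```

The same layout would apply to the label file, with rows as data points and columns as labels.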
The following parameters can be tuned in DEFRAG:
fr = param.feature_representation : Use feature representation X or XY, default 1 (X).
cml = param.cluster_maxleaf : Maximum number of features in a leaf node of DEFRAG tree, default 8.
cls = param.cluster_label_sample : Percentage of labels used for clustering, default 5.
cds = param.cluster_data_sample : Percentage of data points used for clustering, default 20.
avg = param.avg : Average out non-zero entries during agglomeration, default 0.
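To make the effect of avg concrete, here is a minimal sketch of agglomerating one data point given a feature grouping. It reflects one reading of the description above (sum the entries of a group by default, divide by the number of non-zero entries when avg is set) and is not taken from the DEFRAG source; the function name agglomerate and the dict-based row layout are assumptions.

```python
def agglomerate(row, groups, avg=False):
    """Collapse each group of features in `row` into one agglomerated feature.

    row    : dict mapping original feature id -> value (sparse data point)
    groups : list of lists of feature ids, one list per agglomerated feature
    avg    : if True, divide each group's sum by its count of non-zero entries
    """
    out = {}
    for g, members in enumerate(groups):
        vals = [row[f] for f in members if row.get(f, 0) != 0]
        if vals:  # keep the output sparse: skip groups with no active feature
            total = sum(vals)
            out[g] = total / len(vals) if avg else total
    return out


row = {0: 1.0, 1: 3.0, 4: 2.0}
groups = [[0, 1, 2], [3], [4, 5]]
summed = agglomerate(row, groups)            # avg = 0
averaged = agglomerate(row, groups, avg=True)  # avg = 1
```

With avg left at 0, a group with many active features yields a larger value; averaging removes that dependence on group activity.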
The code is adapted and subsequently modified from the source code provided by the authors of Parabel: Partitioned Label Trees for Extreme Classification with Application to Dynamic Search Advertising.