How to fix: No 'by label' reference outlier found, which is needed for weighting!
CheyenneForbes opened this issue · 3 comments
I'm trying to visualize an R-tree, but I am getting an error:
Task failed
de.lmu.ifi.dbs.elki.utilities.exceptions.AbortException: No 'by label' reference outlier found, which is needed for weighting!
at de.lmu.ifi.dbs.elki.application.greedyensemble.VisualizePairwiseGainMatrix.run(VisualizePairwiseGainMatrix.java:140)
at de.lmu.ifi.dbs.elki.gui.minigui.MiniGUI$2.doInBackground(MiniGUI.java:600)
at de.lmu.ifi.dbs.elki.gui.minigui.MiniGUI$2.doInBackground(MiniGUI.java:591)
at javax.swing.SwingWorker$1.call(Unknown Source)
at java.util.concurrent.FutureTask.run(Unknown Source)
at javax.swing.SwingWorker.run(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)
I tried adding a field called bylabel.
The "by label" outlier does not refer to an attribute name but to a result type: this class requires a label-based reference to compute the gain as defined in the corresponding paper. The VisualizePairwiseGainMatrix class does not visualize an R-tree; if you just want to visualize your data (and index), use the default KDDCLIApplication with the NullAlgorithm instead. The index visualization can then be enabled in the menus.
Neither `integer` nor `real` is a proper ARFF type; see the ARFF format documentation of Weka. The `id` must not be numeric, or you need to set up `-arff.externalid` to match the id column. Otherwise, it will be used as part of your data! With the parameter `-arff.classlabel` you can select your outlier column as class label for evaluation.
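To make the advice above concrete, here is a minimal sketch that writes out a tiny ARFF file with a non-numeric id, `numeric` data attributes, and a nominal outlier label. The attribute names (`id`, `x`, `y`, `outlier`), the relation name, and the `ArffSketch` class are made up for illustration; they are not taken from the original data set, and ELKI itself is not called here.

```java
class ArffSketch {
    /**
     * Builds a small ARFF document: a string id (so it cannot be mistaken
     * for data), two numeric attributes, and a nominal outlier label.
     * All attribute names are hypothetical.
     */
    static String buildArff(String[] ids, double[][] points, boolean[] outlier) {
        StringBuilder sb = new StringBuilder();
        sb.append("@relation example\n");
        sb.append("@attribute id string\n");
        sb.append("@attribute x numeric\n");
        sb.append("@attribute y numeric\n");
        sb.append("@attribute outlier {yes,no}\n");
        sb.append("@data\n");
        for (int i = 0; i < ids.length; i++) {
            sb.append(ids[i]).append(',')
              .append(points[i][0]).append(',')
              .append(points[i][1]).append(',')
              .append(outlier[i] ? "yes" : "no").append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String[] ids = { "p1", "p2", "p3" };
        double[][] points = { { 1.0, 2.0 }, { 1.5, 2.5 }, { 9.0, 9.0 } };
        boolean[] outlier = { false, false, true };
        System.out.print(buildArff(ids, points, outlier));
    }
}
```

With a file like this, the `-arff.externalid` and `-arff.classlabel` parameters would presumably be set to match the `id` and `outlier` columns, respectively; check the parameter descriptions for the exact matching rules.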
An R-Tree with this page size does not make any sense! All your data will be in a single page, and you get 0 benefit, only overhead, from the index.
Not very easily. Page size is chosen in bytes, and internal nodes require about twice as much memory per entry, because they need to store bounding boxes. Hence, if you store point data, you must expect leaf nodes to have almost double the capacity of internal nodes.
In a realistic R-tree setting, you'll be controlling the page size in bytes (set to a value such as 8192 that corresponds to the size of a block on the hard disk), not the number of entries.
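The relation between page size in bytes and node capacity can be sketched with some back-of-the-envelope arithmetic. The sizes below are assumptions for illustration (8 bytes per coordinate, 4 bytes per object or child reference, no page header); the actual layout in ELKI may differ, and `RTreePageCapacity` is a hypothetical helper, not an ELKI class.

```java
class RTreePageCapacity {
    // Assumed sizes, for illustration only; real page layouts add headers.
    static final int BYTES_PER_COORDINATE = 8; // one double
    static final int BYTES_PER_REFERENCE = 4;  // object id or child page id

    /** Leaf entries store one point: dim coordinates plus an object id. */
    static int leafCapacity(int pageSizeBytes, int dim) {
        return pageSizeBytes / (dim * BYTES_PER_COORDINATE + BYTES_PER_REFERENCE);
    }

    /** Internal entries store a bounding box: min and max per dimension, plus a child reference. */
    static int internalCapacity(int pageSizeBytes, int dim) {
        return pageSizeBytes / (2 * dim * BYTES_PER_COORDINATE + BYTES_PER_REFERENCE);
    }

    public static void main(String[] args) {
        int pageSize = 8192; // bytes, as in the example value above
        int dim = 2;
        System.out.println("leaf capacity:     " + leafCapacity(pageSize, dim));     // 409
        System.out.println("internal capacity: " + internalCapacity(pageSize, dim)); // 227
    }
}
```

Under these assumptions, an 8192-byte page holds 409 two-dimensional points in a leaf but only 227 entries in an internal node, a ratio of about 1.8. This is why a single "number of entries" setting does not map cleanly onto both node types.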