
Too high cardinality is not suitable for dictionary!

Yancey1989 opened this issue.

Hi!
Building a cube failed with the following error:

```
[QuartzScheduler_Worker-22]:[2015-01-08 00:21:38,468][INFO][com.kylinolap.dict.DictionaryGenerator.buildDictionaryFromValueList(] - Dictionary cardinality 9999956
[QuartzScheduler_Worker-22]:[2015-01-08 00:21:38,468][ERROR][] - Too high cardinality is not suitable for dictionary! Are the values stable enough for incremental load??
java.lang.IllegalArgumentException: Too high cardinality is not suitable for dictionary! Are the values stable enough for incremental load??
        at com.kylinolap.dict.DictionaryGenerator.buildDictionaryFromValueList(
        at com.kylinolap.dict.DictionaryGenerator.buildDictionary(
        at com.kylinolap.dict.DictionaryManager.buildDictionary(
        at com.kylinolap.cube.CubeManager.buildDictionary(
```

In the source code:

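```java
import java.util.ArrayList;
import java.util.List;

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

/**
 * @author yangli9
 */
@SuppressWarnings({ "rawtypes", "unchecked" })
public class DictionaryGenerator {

    private static final Logger logger = LoggerFactory.getLogger(DictionaryGenerator.class);

    private static final String[] DATE_PATTERNS = new String[] { "yyyy-MM-dd" };

    public static Dictionary<?> buildDictionaryFromValueList(DictionaryInfo info, List<byte[]> values) {
        Dictionary dict = null;
        List samples = new ArrayList();
        // ... (not shown in the excerpt: builds `dict` from `values` and collects a few `samples`)

        // log a few samples
        StringBuilder buf = new StringBuilder();
        for (Object s : samples) {
            if (buf.length() > 0)
                buf.append(", ");
            buf.append(s); // reconstructed: the exact append expression was lost from the excerpt
        }
        logger.info("Dictionary value samples: " + buf.toString());
        logger.info("Dictionary cardinality " + info.getCardinality());

        // hard limit: refuse dictionaries built from more than 1,000,000 values
        if (values.size() > 1000000)
            throw new IllegalArgumentException("Too high cardinality is not suitable for dictionary! Are the values stable enough for incremental load??");

        return dict;
    }
}
```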

Here the limit is 1,000,000. What does it mean?

The dictionary resides in memory. If a column has a quite large cardinality, the generated dictionary will occupy a lot of memory, which does not make much sense. For such columns, you might want to avoid using dictionary encoding.
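To make the memory argument concrete, here is a minimal sketch of dictionary encoding in Java. It is a deliberate simplification and not Kylin's actual dictionary classes: the class `SketchDictionary`, its `build` method, and the `maxCardinality` parameter are hypothetical. Each distinct value is sorted and mapped to its position (its id), so every distinct value has to stay in memory, and the size of the dictionary grows with the column's cardinality.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.List;
import java.util.TreeSet;

// Hypothetical, simplified dictionary encoder (NOT Kylin's implementation).
// Distinct values are sorted; each value's position in the sorted list is its id.
public class SketchDictionary {

    private final List<String> idToValue; // id -> value, sorted ascending, fully in memory

    private SketchDictionary(List<String> sortedValues) {
        this.idToValue = sortedValues;
    }

    // Builds the dictionary, refusing to do so above maxCardinality distinct values,
    // in the spirit of the hard-coded 1,000,000 check in DictionaryGenerator.
    public static SketchDictionary build(Collection<String> values, int maxCardinality) {
        TreeSet<String> distinct = new TreeSet<>(values); // deduplicate and sort
        if (distinct.size() > maxCardinality)
            throw new IllegalArgumentException(
                    "Too high cardinality: " + distinct.size() + " > " + maxCardinality);
        return new SketchDictionary(new ArrayList<>(distinct));
    }

    public int cardinality() {
        return idToValue.size();
    }

    // Value -> id via binary search over the sorted list; -1 if the value is unknown.
    public int getId(String value) {
        int idx = Collections.binarySearch(idToValue, value);
        return idx >= 0 ? idx : -1;
    }

    // Id -> value by direct index lookup.
    public String getValue(int id) {
        return idToValue.get(id);
    }

    public static void main(String[] args) {
        SketchDictionary dict = build(List.of("US", "CN", "US", "DE", "CN"), 1_000_000);
        System.out.println(dict.cardinality()); // 3
        System.out.println(dict.getId("DE"));   // 1 (sorted order: CN, DE, US)
        System.out.println(dict.getValue(2));   // US
        try {
            build(List.of("a", "b", "c"), 2);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage()); // Too high cardinality: 3 > 2
        }
    }
}
```

The log in this issue reports a cardinality of 9,999,956, roughly ten times the limit; in this sketch that would mean keeping about ten million strings resident in `idToValue` for a single column.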


  1. Does "a quite large cardinality" mean that a column has a large number of distinct values?
  2. How can I "avoid using dictionary encoding"? Can I do it when creating the cube?

2. When you create the cube, open the "advanced settings" tab and set "use dictionary" to false for that dimension.
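Assuming "cardinality" has its usual meaning here (the number of distinct values in a column, not the number of rows), a quick pre-check before deciding whether to turn "use dictionary" off might look like the sketch below. The class `CardinalityCheck`, its methods, and the `DICTIONARY_LIMIT` constant are hypothetical and not part of Kylin; the limit simply mirrors the 1,000,000 threshold in the excerpt.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper (not part of Kylin): cardinality = number of DISTINCT values,
// which can be far smaller than the number of rows in the column.
public class CardinalityCheck {

    // Mirrors the hard-coded threshold in DictionaryGenerator.
    static final int DICTIONARY_LIMIT = 1_000_000;

    // Counts distinct values exactly; note that this itself holds every distinct value in memory.
    static int cardinality(Iterable<String> column) {
        Set<String> seen = new HashSet<>();
        for (String v : column)
            seen.add(v);
        return seen.size();
    }

    // True if the column stays within the limit and is a reasonable dictionary candidate.
    static boolean suitableForDictionary(Iterable<String> column, int limit) {
        return cardinality(column) <= limit;
    }

    public static void main(String[] args) {
        List<String> country = List.of("US", "CN", "US", "DE", "CN", "US");
        System.out.println(country.size());                                   // 6 (rows)
        System.out.println(cardinality(country));                             // 3 (distinct values)
        System.out.println(suitableForDictionary(country, DICTIONARY_LIMIT)); // true
        System.out.println(suitableForDictionary(country, 2));                // false
    }
}
```

For a column like the one in this issue, with millions of distinct values, such a check would return false, which is the case where disabling dictionary encoding for the dimension makes sense.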