Too high cardinality is not suitable for dictionary!
Yancey1989 opened this issue · 3 comments
Yancey1989 commented
Hi!
Building a cube failed with the following error:
[QuartzScheduler_Worker-22]:[2015-01-08 00:21:38,468][INFO][com.kylinolap.dict.DictionaryGenerator.buildDictionaryFromValueList(DictionaryGenerator.java:72)] - Dictionary cardinality 9999956
[QuartzScheduler_Worker-22]:[2015-01-08 00:21:38,468][ERROR][com.kylinolap.job.hadoop.dict.CreateDictionaryJob.run(CreateDictionaryJob.java:55)] - Too high cardinality is not suitable for dictionary! Are the values stable enough for incremental load??
java.lang.IllegalArgumentException: Too high cardinality is not suitable for dictionary! Are the values stable enough for incremental load??
at com.kylinolap.dict.DictionaryGenerator.buildDictionaryFromValueList(DictionaryGenerator.java:75)
at com.kylinolap.dict.DictionaryGenerator.buildDictionary(DictionaryGenerator.java:110)
at com.kylinolap.dict.DictionaryManager.buildDictionary(DictionaryManager.java:166)
at com.kylinolap.cube.CubeManager.buildDictionary(CubeManager.java:171)
In the source code:
/**
 * @author yangli9
 */
@SuppressWarnings({ "rawtypes", "unchecked" })
public class DictionaryGenerator {

    private static final Logger logger = LoggerFactory.getLogger(DictionaryGenerator.class);

    private static final String[] DATE_PATTERNS = new String[] { "yyyy-MM-dd" };

    public static Dictionary<?> buildDictionaryFromValueList(DictionaryInfo info, List<byte[]> values) {
        info.setCardinality(values.size());
        ...
        // log a few samples
        StringBuilder buf = new StringBuilder();
        for (Object s : samples) {
            if (buf.length() > 0)
                buf.append(", ");
            buf.append(s.toString()).append("=>").append(dict.getIdFromValue(s));
        }
        logger.info("Dictionary value samples: " + buf.toString());
        logger.info("Dictionary cardinality " + info.getCardinality());

        if (values.size() > 1000000)
            throw new IllegalArgumentException("Too high cardinality is not suitable for dictionary! Are the values stable enough for incremental load??");

        return dict;
        ...
The limit here is 1,000,000. What does it mean?
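For illustration, the guard in the excerpt can be sketched as a small standalone program. This is a hypothetical sketch, not Kylin's DictionaryGenerator: the class name, the use of String instead of byte[], and the HashMap-based dictionary are all assumptions. Like the excerpt, it treats the input list as already distinct and applies the same 1,000,000 threshold.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not Kylin's DictionaryGenerator: maps each distinct
// value to an integer ID and rejects columns above a cardinality limit.
public class CardinalityGuardSketch {

    // Same threshold as the excerpt (1,000,000 distinct values).
    static final int MAX_CARDINALITY = 1_000_000;

    // Assumes distinctValues contains no duplicates, as the excerpt's
    // info.setCardinality(values.size()) implies for its value list.
    static Map<String, Integer> buildDictionary(List<String> distinctValues) {
        // Checking before building avoids spending memory on a dictionary
        // that would be rejected anyway.
        if (distinctValues.size() > MAX_CARDINALITY) {
            throw new IllegalArgumentException(
                "Too high cardinality is not suitable for dictionary! cardinality="
                    + distinctValues.size());
        }
        List<String> sorted = new ArrayList<>(distinctValues);
        Collections.sort(sorted);
        // Assigning IDs in sorted order means a < b implies id(a) < id(b).
        Map<String, Integer> dict = new HashMap<>();
        for (int id = 0; id < sorted.size(); id++) {
            dict.put(sorted.get(id), id);
        }
        return dict;
    }

    public static void main(String[] args) {
        Map<String, Integer> dict = buildDictionary(List.of("US", "CN", "DE"));
        System.out.println(dict.get("CN") + " " + dict.get("DE") + " " + dict.get("US")); // 0 1 2
    }
}
```

Unlike the excerpt, which logs and checks only after the dictionary is built, this sketch rejects the column up front; either way, the point of the limit is the same.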
binmahone commented
The dictionary resides in memory. If a column has a very large cardinality, the generated dictionary will occupy a lot of memory, which does not make much sense. For such columns, you might consider not using dictionary encoding.
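To put "a lot of memory" in numbers, here is a back-of-the-envelope sketch using the cardinality from the log above (9,999,956). The 16-byte average value length is an assumption, not taken from the issue, and the result counts raw value bytes only, so it is a lower bound: a real dictionary adds per-entry overhead on top of this.

```java
// Back-of-the-envelope sketch: lower bound on the memory needed just to hold
// the raw value bytes of an in-memory dictionary. Per-entry overhead (object
// headers, index structures, IDs) is ignored, so real usage is higher.
public class DictionaryMemoryEstimate {

    static long rawValueBytes(long cardinality, int avgValueBytes) {
        return cardinality * avgValueBytes;
    }

    public static void main(String[] args) {
        long cardinality = 9_999_956L; // "Dictionary cardinality" from the log
        int avgValueBytes = 16;        // assumed average value length
        long bytes = rawValueBytes(cardinality, avgValueBytes);
        // prints "159999296 bytes (~152.6 MiB)"
        System.out.printf("%d bytes (~%.1f MiB)%n", bytes, bytes / (1024.0 * 1024.0));
    }
}
```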
Yancey1989 commented
- Does "a quite large cardinality" mean that a column has a large number of distinct values?
- How can I "avoid using dictionary encoding"? Can I do it when creating the cube?
binmahone commented
1. Yes.
2. When you create the cube, check the "advanced settings" tab and set "use dictionary" to false for the dimension.
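Before deciding whether to turn off "use dictionary" for a dimension, the column's cardinality (its number of distinct values) can be checked against the limit. The helper below is hypothetical and not part of Kylin; it counts distinct values with a HashSet and stops as soon as the limit is exceeded, so the set never holds more than limit + 1 values.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper: checks whether a column's cardinality (number of
// distinct values) stays within a limit, exiting early once it is exceeded.
public class CardinalityCheck {

    static boolean withinLimit(Iterable<String> columnValues, int limit) {
        Set<String> distinct = new HashSet<>();
        for (String v : columnValues) {
            distinct.add(v);
            if (distinct.size() > limit) {
                return false; // too many distinct values for a dictionary
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<String> column = List.of("a", "b", "a", "c", "b"); // 3 distinct values
        System.out.println(withinLimit(column, 3)); // true
        System.out.println(withinLimit(column, 2)); // false
    }
}
```

With a limit of 1,000,000, a column for which this returns false would hit the exception above and is a candidate for "use dictionary" = false.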