klout/brickhouse

union_vector_sum throws java.lang.IndexOutOfBoundsException

oconnelc opened this issue · 0 comments

The VectorUnionSumUDAF is consistently throwing an IndexOutOfBoundsException. The stack trace is:

Caused by: java.lang.IndexOutOfBoundsException: Index: 0, Size: 0
        at java.util.ArrayList.rangeCheck(ArrayList.java:657)
        at java.util.ArrayList.get(ArrayList.java:433)
        at brickhouse.udf.timeseries.VectorUnionSumUDAF$VectorArraySumUDAFEvaluator.addVector(VectorUnionSumUDAF.java:146)
        at brickhouse.udf.timeseries.VectorUnionSumUDAF$VectorArraySumUDAFEvaluator.iterate(VectorUnionSumUDAF.java:114)
        at org.apache.hadoop.hive.ql.udf.generic.GenericUDAFEvaluator.aggregate(GenericUDAFEvaluator.java:192)
        at org.apache.hadoop.hive.ql.exec.GroupByOperator.updateAggregations(GroupByOperator.java:638)
        at org.apache.hadoop.hive.ql.exec.GroupByOperator.processHashAggr(GroupByOperator.java:813)
        at org.apache.hadoop.hive.ql.exec.GroupByOperator.processKey(GroupByOperator.java:719)
        at org.apache.hadoop.hive.ql.exec.GroupByOperator.process(GroupByOperator.java:787)

This happens because the following code in addVector tries to grow myagg.sumArray by calling ensureCapacity:

private void addVector(Object listObj, VectorArrayAggBuffer myagg, ListObjectInspector inputOI) {
            int listLen = inputOI.getListLength(listObj);
            if (listLen > myagg.sumArray.size())
                myagg.sumArray.ensureCapacity(listLen);

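However, ensureCapacity does not actually resize the list. According to Stack Overflow (https://stackoverflow.com/questions/7688151/java-arraylist-ensurecapacity-not-working), ensuring capacity only changes the capacity, which is the size the list can reach before it next needs to copy values. The list's size() is still 0 afterwards, so the get call on the still-empty list at VectorUnionSumUDAF.java:146 throws the IndexOutOfBoundsException shown above.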
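
A minimal standalone sketch of the behaviour and one possible fix follows. It does not use the Hive ObjectInspector API. The class EnsureCapacityDemo, the helpers padWithZeros and addVector, and the choice of Double as the element type are all assumptions made for illustration, not the actual brickhouse code. The idea is to pad the list with zeros until its size matches the incoming vector, rather than relying on ensureCapacity.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical demo class; not part of brickhouse.
public class EnsureCapacityDemo {

    // Hypothetical helper: append zeros until the list holds at least targetLen elements.
    static void padWithZeros(List<Double> sums, int targetLen) {
        while (sums.size() < targetLen) {
            sums.add(0.0);
        }
    }

    // Hypothetical stand-in for addVector: element-wise sum into 'sums'.
    // Assumes Double elements; the real UDAF reads values through ObjectInspectors.
    static void addVector(List<Double> sums, List<Double> vector) {
        padWithZeros(sums, vector.size());
        for (int i = 0; i < vector.size(); i++) {
            sums.set(i, sums.get(i) + vector.get(i));
        }
    }

    public static void main(String[] args) {
        ArrayList<Double> sums = new ArrayList<>();

        // ensureCapacity only reserves backing storage; size() is unchanged.
        sums.ensureCapacity(3);
        System.out.println("size after ensureCapacity: " + sums.size());

        try {
            sums.get(0); // same failure mode as in the stack trace
        } catch (IndexOutOfBoundsException e) {
            System.out.println("get(0) failed: " + e.getMessage());
        }

        // Padding with zeros changes the size, so get/set by index is safe.
        addVector(sums, Arrays.asList(1.0, 2.0, 3.0));
        addVector(sums, Arrays.asList(10.0, 20.0));
        System.out.println("sums: " + sums);
    }
}
```

Padding keeps the existing partial sums intact when a longer vector arrives later, which is presumably what the original ensureCapacity call was meant to achieve.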