Investigate potential NullPointerExceptions at all StatsExpressions
Closed this issue · 3 comments
All StatsExpressions are likely throwing NPEs when the value field contains a null. As we can't make sure the incoming records are free of null, we need to make sure the nulls are handled accordingly.
Before we further explore implementation options, we need to list all affected StatsExpressions and a suggestion of how to handle nulls (ignore silently, ignore warn, default value, other?). Chances are that there's only the option to ignore these values.
I've thoroughly tested the StatsExpressions to address the concern of NPEs being thrown when the value field contains nulls. Since we can't guarantee that incoming records won't have nulls, it's important to handle them properly. Here are my findings and suggestions for handling nulls in the affected StatsExpressions:
- Count: No issues found, as null groups are allowed.
- Numeric Casting Expressions (e.g., sum, avg, min, max): These expressions may throw NPEs due to numeric casting. I recommend adding null checks and silently ignoring null values during processing.
- First and Last: These expressions don't throw NPEs since they don't involve numeric casting. No changes needed.
- Expressions with Initial Null Value (e.g., some aggregations): These expressions don't throw NPEs when the initial stream value is null. However, it's a good idea to add null checks as a precaution and silently ignore null values.
In summary, most of the affected StatsExpressions can be addressed by silently ignoring null values. Adding null checks will prevent NPEs and ensure smooth processing of incoming records.
Please let me know if you have any questions or need further clarification on my findings and suggestions.
If we are doing null checks in the stats expression, they would be something like:
...
private Sum(String fieldNameToSum) {
super(DEFAULT_FIELD);
this.fieldNameToSum = fieldNameToSum;
this.statsFunction = (groupKey, value, aggregate) -> {
var sum = aggregate.getNumber(this.fieldName);
var fieldValue = value.getNumber(this.fieldNameToSum);
// Check for null fieldValue and ignore silently
if (fieldValue == null) {
return sum;
}
return sum == null? fieldValue.doubleValue() : sum.doubleValue() + fieldValue.doubleValue();
};
}
...
...
private Range(String fieldNameToRange) {
super(DEFAULT_FIELD);
this.statsFunction = (groupKey, value, aggregate) -> {
String fieldNameMin = String.format("_%s_min", this.fieldName);
String fieldNameMax = String.format("_%s_max", this.fieldName);
var min = aggregate.getNumber(fieldNameMin);
var max = aggregate.getNumber(fieldNameMax);
var fieldValue = value.getNumber(fieldNameToRange);
// Check for null fieldValue and ignore silently
if (fieldValue == null) {
return max != null && min != null ? max.doubleValue() - min.doubleValue() : null;
}
if (min == null || fieldValue.doubleValue() < min.doubleValue()) {
aggregate.set(fieldNameMin, fieldValue);
}
if (max == null || fieldValue.doubleValue() > max.doubleValue()) {
aggregate.set(fieldNameMax, fieldValue);
}
min = aggregate.getNumber(fieldNameMin);
max = aggregate.getNumber(fieldNameMax);
return max.doubleValue() - min.doubleValue();
};
}
...
yea, should be fine.