Performance degradation with nested queries
michaelbraun opened this issue · 14 comments
As the level of depth in a query becomes high (9+), the time to index with a TermFilteredPresearcher begins to get longer and longer, much worse than linearly.
I made a sample class to show the problem, with tunable branching and depth. As the depth of a query increases, the number of times norm() is called for a given token keeps increasing, even though the norm has not changed (so something like memoization would help). Sampling the application and uncommenting the norm.countAskedForMap piece shows that individual terms are being norm()'d multiple times, which is what is causing the issue.
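For reference, the attached test builds nested queries along roughly these lines (a rough sketch, not the attached class itself; the field name, class name, and term generation here are made up for illustration):

    import org.apache.lucene.index.Term;
    import org.apache.lucene.search.BooleanClause;
    import org.apache.lucene.search.BooleanQuery;
    import org.apache.lucene.search.Query;
    import org.apache.lucene.search.TermQuery;

    public class NestedQueryBuilder {

        // Builds a boolean query tree with the given branching factor and depth;
        // leaves are simple TermQuerys on a made-up field.
        static Query build(int branching, int depth, String prefix) {
            if (depth == 0) {
                return new TermQuery(new Term("field", prefix));
            }
            BooleanQuery.Builder builder = new BooleanQuery.Builder();
            for (int i = 0; i < branching; i++) {
                builder.add(build(branching, depth - 1, prefix + "_" + i), BooleanClause.Occur.SHOULD);
            }
            return builder.build();
        }
    }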
These are some of the times from a run of the application on my laptop:
Stats for levels of depth - 4----
LongSummaryStatistics{count=300, sum=527, min=1, average=1.756667, max=8}
-------------
Stats for levels of depth - 5----
LongSummaryStatistics{count=300, sum=333, min=1, average=1.110000, max=5}
-------------
Stats for levels of depth - 6----
LongSummaryStatistics{count=300, sum=315, min=1, average=1.050000, max=4}
-------------
Stats for levels of depth - 7----
LongSummaryStatistics{count=300, sum=306, min=1, average=1.020000, max=2}
-------------
Stats for levels of depth - 8----
LongSummaryStatistics{count=300, sum=399, min=1, average=1.330000, max=4}
-------------
Stats for levels of depth - 9----
LongSummaryStatistics{count=300, sum=1111, min=2, average=3.703333, max=11}
-------------
Stats for levels of depth - 10----
LongSummaryStatistics{count=300, sum=2191, min=6, average=7.303333, max=17}
-------------
Stats for levels of depth - 11----
LongSummaryStatistics{count=300, sum=5834, min=17, average=19.446667, max=28}
-------------
Stats for levels of depth - 12----
LongSummaryStatistics{count=300, sum=18010, min=52, average=60.033333, max=122}
-------------
Stats for levels of depth - 13----
LongSummaryStatistics{count=300, sum=56634, min=162, average=188.780000, max=475}
-------------
Stats for levels of depth - 14----
LongSummaryStatistics{count=300, sum=160073, min=500, average=533.576667, max=808}
-------------
Stats for levels of depth - 15----
LongSummaryStatistics{count=300, sum=509592, min=1572, average=1698.640000, max=2501}
Stopped it at this point since the trend is apparent.
TestNestQueryExtraction.java.zip
GitHub unfortunately doesn't like the .java extension... file attached as a zip.
Yeah, that doesn't look right! Can you add the test as a PR?
@romseygeek where do you want the file placed in the scope of the luwak project?
Just under luwak/src/test will be fine
@romseygeek there you go
Thanks for the PR.
So I think we can solve this by setting the weights on the QueryTree when it's initially built, which means that norm() and friends are only called once per leaf node - do you want to have a crack at that?
@romseygeek there is another issue - the query trees themselves are not immutable with regard to their weights: if children are added or the tree is advanced, the weight will change. Internally, for testing, I extended TreeWeightor to memoize the QueryTree -> weight mapping within select(), since I did not have the ability to place it on the trees themselves (and note this would not work in the case of advancing, as I'll mention below). This looked like:
import java.util.IdentityHashMap;
import java.util.Map;
import java.util.Set;

// Luwak imports (QueryTree, TreeWeightor, WeightPolicy, CombinePolicy) omitted for brevity.
class OptimizedTreeWeightor extends TreeWeightor {

    // Memoizes QueryTree -> weight by object identity, since select() may be
    // asked to weigh the same node many times while processing the tree.
    private final Map<QueryTree, Float> memoizeQueryTreeToFloat = new IdentityHashMap<>();

    public OptimizedTreeWeightor(WeightPolicy weightPolicy, CombinePolicy combinePolicy) {
        super(weightPolicy, combinePolicy);
    }

    @Override
    public QueryTree select(Set<QueryTree> children) {
        if (children.isEmpty())
            throw new IllegalArgumentException("Cannot select child from empty list");
        if (children.size() == 1) {
            return children.iterator().next();
        }
        float selectedWeight = -1f;
        QueryTree selected = null;
        for (QueryTree child : children) {
            // weight(this) is only evaluated the first time a given node is seen
            float childWeight = memoizeQueryTreeToFloat.computeIfAbsent(child, (qt) -> qt.weight(this));
            if (childWeight > selectedWeight) {
                selectedWeight = childWeight;
                selected = child;
            }
        }
        return selected;
    }

    // Clears the memoized weights between queries; because the cache lives
    // outside the trees, it cannot see when a tree is advanced, which is why
    // this approach does not work in the advancing case.
    public void clear() {
        memoizeQueryTreeToFloat.clear();
    }
}
Are you suggesting instead that QueryTrees store their own selected child and weight? I'd imagine that would need to be cleared in the case of advancing (if we were using MultipassTermFilteredPresearcher, where we ran into the same issue with highly nested queries). Also, just to confirm: QueryTrees are expected to be used with only a single TreeWeightor, since they're built using a single QueryAnalyzer which has a TreeWeightor - is that correct?
QueryTrees will indeed be built for just a single TreeWeightor.
What I think we can do is make TermNode's weight immutable. Conj and Disj then just take the minimum weight from their children, which is then propagated up the tree. And advancing the tree should still work in that case, because it just drops TermNodes which have been advanced over.
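Roughly, the shape of that suggestion looks like the following (a self-contained sketch with stand-in class names, not luwak's actual TermNode/ConjunctionNode/DisjunctionNode; the LeafWeightor interface is hypothetical):

    import java.util.List;

    // Sketch: leaf weights are fixed at construction time, and inner nodes derive
    // their weight from the minimum of their children, so nothing is re-weighed
    // while selecting terms or propagating weights up the tree.
    abstract class Node {
        abstract float weight();
    }

    // Hypothetical stand-in for whatever computes a term's norm/weight.
    interface LeafWeightor {
        float weigh(String term);
    }

    final class TermLeaf extends Node {
        private final float weight;                     // computed exactly once

        TermLeaf(String term, LeafWeightor weightor) {
            this.weight = weightor.weigh(term);         // norm() equivalent runs once per leaf
        }

        @Override
        float weight() {
            return weight;
        }
    }

    // Stands in for both conjunction and disjunction nodes in this sketch:
    // the node's weight is the minimum of its children's weights, fixed at build time.
    final class InnerNode extends Node {
        private final float weight;

        InnerNode(List<Node> children) {
            float min = Float.POSITIVE_INFINITY;
            for (Node child : children) {
                min = Math.min(min, child.weight());
            }
            this.weight = min;
        }

        @Override
        float weight() {
            return weight;
        }
    }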
I made the suggested change; it helps for another few levels, but the problem still occurs:
Stats for levels of depth - 4----
LongSummaryStatistics{count=300, sum=552, min=1, average=1.840000, max=14}
-------------
Stats for levels of depth - 5----
LongSummaryStatistics{count=300, sum=333, min=1, average=1.110000, max=3}
-------------
Stats for levels of depth - 6----
LongSummaryStatistics{count=300, sum=330, min=1, average=1.100000, max=5}
-------------
Stats for levels of depth - 7----
LongSummaryStatistics{count=300, sum=315, min=1, average=1.050000, max=5}
-------------
Stats for levels of depth - 8----
LongSummaryStatistics{count=300, sum=304, min=1, average=1.013333, max=4}
-------------
Stats for levels of depth - 9----
LongSummaryStatistics{count=300, sum=311, min=1, average=1.036667, max=2}
-------------
Stats for levels of depth - 10----
LongSummaryStatistics{count=300, sum=391, min=1, average=1.303333, max=3}
-------------
Stats for levels of depth - 11----
LongSummaryStatistics{count=300, sum=838, min=2, average=2.793333, max=5}
-------------
Stats for levels of depth - 12----
LongSummaryStatistics{count=300, sum=1952, min=5, average=6.506667, max=60}
-------------
Stats for levels of depth - 13----
LongSummaryStatistics{count=300, sum=4268, min=13, average=14.226667, max=18}
-------------
Stats for levels of depth - 14----
LongSummaryStatistics{count=300, sum=11463, min=34, average=38.210000, max=55}
-------------
Stats for levels of depth - 15----
LongSummaryStatistics{count=300, sum=32025, min=92, average=106.750000, max=275}
-------------
Stats for levels of depth - 16----
LongSummaryStatistics{count=300, sum=83471, min=257, average=278.236667, max=384}
Will try experimenting with a few other approaches.
I added caching to ConjunctionNode and DisjunctionNode as well, which yielded a significant improvement. I also split the test query creation out of the timed section to get more accurate times.
Stats for levels of depth - 4----
LongSummaryStatistics{count=300, sum=451, min=1, average=1.503333, max=8}
-------------
Stats for levels of depth - 5----
LongSummaryStatistics{count=300, sum=307, min=1, average=1.023333, max=3}
-------------
Stats for levels of depth - 6----
LongSummaryStatistics{count=300, sum=366, min=1, average=1.220000, max=5}
-------------
Stats for levels of depth - 7----
LongSummaryStatistics{count=300, sum=333, min=0, average=1.110000, max=5}
-------------
Stats for levels of depth - 8----
LongSummaryStatistics{count=300, sum=300, min=1, average=1.000000, max=1}
-------------
Stats for levels of depth - 9----
LongSummaryStatistics{count=300, sum=304, min=1, average=1.013333, max=2}
-------------
Stats for levels of depth - 10----
LongSummaryStatistics{count=300, sum=304, min=1, average=1.013333, max=2}
-------------
Stats for levels of depth - 11----
LongSummaryStatistics{count=300, sum=371, min=1, average=1.236667, max=3}
-------------
Stats for levels of depth - 12----
LongSummaryStatistics{count=300, sum=675, min=2, average=2.250000, max=5}
-------------
Stats for levels of depth - 13----
LongSummaryStatistics{count=300, sum=1288, min=4, average=4.293333, max=6}
-------------
Stats for levels of depth - 14----
LongSummaryStatistics{count=300, sum=2611, min=8, average=8.703333, max=12}
-------------
Stats for levels of depth - 15----
LongSummaryStatistics{count=300, sum=5689, min=16, average=18.963333, max=53}
-------------
Stats for levels of depth - 16----
LongSummaryStatistics{count=300, sum=11664, min=34, average=38.880000, max=58}
-------------
Stats for levels of depth - 17----
LongSummaryStatistics{count=300, sum=24143, min=71, average=80.476667, max=118}
-------------
Stats for levels of depth - 18----
LongSummaryStatistics{count=300, sum=49139, min=149, average=163.796667, max=223}
-------------
Stats for levels of depth - 19----
LongSummaryStatistics{count=300, sum=101084, min=316, average=336.946667, max=448}
-------------
I also verified this with MultipassTermFilteredPresearcher, which relies on advancing, and made sure the caching correctly accounts for advancing. This looks much better, as the expensive parts are no longer recomputed needlessly.
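The node-level caching amounts to something like this (a rough sketch of the approach with illustrative names, not the exact patch against luwak):

    // Sketch: cache a node's computed weight and invalidate it when the tree is
    // advanced (as MultipassTermFilteredPresearcher requires), so the expensive
    // computation reruns only when the set of live children actually changes.
    abstract class CachingNode {

        private final CachingNode parent;       // null for the root
        private Float cachedWeight;             // null means "not computed yet"

        protected CachingNode(CachingNode parent) {
            this.parent = parent;
        }

        final float weight() {
            if (cachedWeight == null) {
                cachedWeight = computeWeight(); // expensive part runs at most once per advance
            }
            return cachedWeight;
        }

        // Called when the tree is advanced past some children: clears the cached
        // weight on this node and every ancestor so they are recomputed lazily.
        final void invalidateUpwards() {
            cachedWeight = null;
            if (parent != null) {
                parent.invalidateUpwards();
            }
        }

        protected abstract float computeWeight();
    }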
Let me know if I should take these approaches and make a (clean) PR with the caching of values on the nodes.
@romseygeek let me know if you prefer a new, clean PR with a single commit containing the performance changes.
I've opened a separate PR (#175) which I think should help with this, while also clearing out a whole bunch of cruft. Could you take a look?
@romseygeek looks like it's faster than the PR I had! Here are the numbers:
Current master:
Stats for levels of depth - 7----
LongSummaryStatistics{count=300, sum=362, min=0, average=1.206667, max=3}
-------------
Stats for levels of depth - 8----
LongSummaryStatistics{count=300, sum=376, min=1, average=1.253333, max=4}
-------------
Stats for levels of depth - 9----
LongSummaryStatistics{count=300, sum=589, min=1, average=1.963333, max=5}
-------------
Stats for levels of depth - 10----
LongSummaryStatistics{count=300, sum=1200, min=3, average=4.000000, max=9}
-------------
Stats for levels of depth - 11----
LongSummaryStatistics{count=300, sum=3290, min=8, average=10.966667, max=18}
-------------
Stats for levels of depth - 12----
LongSummaryStatistics{count=300, sum=9466, min=27, average=31.553333, max=47}
-------------
Stats for levels of depth - 13----
LongSummaryStatistics{count=300, sum=28460, min=80, average=94.866667, max=135}
-------------
Stats for levels of depth - 14----
LongSummaryStatistics{count=300, sum=81055, min=225, average=270.183333, max=428}
-------------
Stats for levels of depth - 15----
LongSummaryStatistics{count=300, sum=226110, min=691, average=753.700000, max=980}
My old PR (#158):
Stats for levels of depth - 7----
LongSummaryStatistics{count=300, sum=234, min=0, average=0.780000, max=3}
-------------
Stats for levels of depth - 8----
LongSummaryStatistics{count=300, sum=164, min=0, average=0.546667, max=1}
-------------
Stats for levels of depth - 9----
LongSummaryStatistics{count=300, sum=272, min=0, average=0.906667, max=2}
-------------
Stats for levels of depth - 10----
LongSummaryStatistics{count=300, sum=308, min=1, average=1.026667, max=2}
-------------
Stats for levels of depth - 11----
LongSummaryStatistics{count=300, sum=345, min=1, average=1.150000, max=3}
-------------
Stats for levels of depth - 12----
LongSummaryStatistics{count=300, sum=724, min=2, average=2.413333, max=5}
-------------
Stats for levels of depth - 13----
LongSummaryStatistics{count=300, sum=1193, min=3, average=3.976667, max=7}
-------------
Stats for levels of depth - 14----
LongSummaryStatistics{count=300, sum=2452, min=7, average=8.173333, max=14}
-------------
Stats for levels of depth - 15----
LongSummaryStatistics{count=300, sum=5141, min=14, average=17.136667, max=27}
-------------
Stats for levels of depth - 16----
LongSummaryStatistics{count=300, sum=11176, min=31, average=37.253333, max=74}
-------------
Stats for levels of depth - 17----
LongSummaryStatistics{count=300, sum=21782, min=64, average=72.606667, max=96}
-------------
Stats for levels of depth - 18----
LongSummaryStatistics{count=300, sum=46537, min=136, average=155.123333, max=240}
-------------
Stats for levels of depth - 19----
LongSummaryStatistics{count=300, sum=97623, min=286, average=325.410000, max=524}
-------------
Your new PR (#175):
Stats for levels of depth - 7----
LongSummaryStatistics{count=300, sum=218, min=0, average=0.726667, max=2}
-------------
Stats for levels of depth - 8----
LongSummaryStatistics{count=300, sum=163, min=0, average=0.543333, max=3}
-------------
Stats for levels of depth - 9----
LongSummaryStatistics{count=300, sum=224, min=0, average=0.746667, max=2}
-------------
Stats for levels of depth - 10----
LongSummaryStatistics{count=300, sum=311, min=1, average=1.036667, max=3}
-------------
Stats for levels of depth - 11----
LongSummaryStatistics{count=300, sum=353, min=1, average=1.176667, max=3}
-------------
Stats for levels of depth - 12----
LongSummaryStatistics{count=300, sum=642, min=2, average=2.140000, max=4}
-------------
Stats for levels of depth - 13----
LongSummaryStatistics{count=300, sum=1051, min=3, average=3.503333, max=6}
-------------
Stats for levels of depth - 14----
LongSummaryStatistics{count=300, sum=2120, min=6, average=7.066667, max=14}
-------------
Stats for levels of depth - 15----
LongSummaryStatistics{count=300, sum=4062, min=12, average=13.540000, max=19}
-------------
Stats for levels of depth - 16----
LongSummaryStatistics{count=300, sum=8420, min=25, average=28.066667, max=39}
-------------
Stats for levels of depth - 17----
LongSummaryStatistics{count=300, sum=19370, min=54, average=64.566667, max=146}
-------------
Stats for levels of depth - 18----
LongSummaryStatistics{count=300, sum=38716, min=116, average=129.053333, max=183}
-------------
Stats for levels of depth - 19----
LongSummaryStatistics{count=300, sum=83632, min=251, average=278.773333, max=490}
-------------
Fixed by #175