[Bug]: Boolean threshold percentage is not respected in local evaluation mode
SamuelCabralCruz opened this issue · 13 comments
Bug Description
Boolean threshold flag does not respect the provided percentage in local evaluation mode.
Version Info
/app $ /flipt -v
Version: v1.38.2
Commit: 586a30ca4cf3613bf19646288806e42d7235bb27
Build Date: 2024-03-15T16:33:00Z
Go Version: go1.21.8
OS/Arch: linux/amd64
node -v
v20.10.0
Search
- I searched for other open and closed issues before opening this
Steps to Reproduce
flipt configuration
version: "1.2"
namespace: e2e
flags:
  - key: prd0002-threshold-evaluation
    name: threshold-evaluation
    type: BOOLEAN_FLAG_TYPE
    enabled: false
    rollouts:
      - threshold:
          percentage: 50
          value: true
test
import { afterAll, beforeAll, beforeEach, describe, expect, it } from 'vitest';
import { FliptEvaluationClient } from '@flipt-io/flipt-client';
import * as uuid from 'uuid';

describe('threshold', () => {
  const flagKey = 'prd0002-threshold-evaluation';

  describe(flagKey, () => {
    const n = 10000;
    const p = 0.025;
    const expected: Record<string, number> = { 'true': 0.5, 'false': 0.5 };
    const context = {};
    let client: FliptEvaluationClient;
    let observed: Record<string, number> = {};

    beforeAll(() => {
      client = new FliptEvaluationClient('e2e', { url: 'http://localhost:8080' });
    });

    afterAll(() => {
      client.close();
    });

    beforeEach(async () => {
      for (let i = 0; i < n; i++) {
        const response = client.evaluateBoolean(flagKey, uuid.v4(), context);
        const value = response.result!.enabled.toString();
        observed[value] = (Object.keys(observed).includes(value) ? observed[value] : 0) + 1 / n;
      }
    });

    it('should match distribution', () => {
      expect(Object.keys(observed)).toEqual(expect.arrayContaining(Object.keys(expected)));
      Object.keys(expected).forEach((k) => {
        expect(Math.abs(observed[k] - expected[k])).toBeLessThan(p);
      });
    });
  });
});
Expected Behavior
I would expect a 50/50 distribution over a large number of evaluations.
Additional Context
No response
Hey @SamuelCabralCruz could you perhaps share what tolerance you're seeing? (i.e. the observed p).
I see you've got 0.025 as a constant there for the max delta, just curious what the actual observed delta is.
Wondering if it is wildly out, or if that tolerance is just too rigid and low to be expected for N == 10000.
One observation I had while taking a walk through both the Go and Rust implementations of our evaluators is an off-by-one discrepancy between the two. In Go, we increment the searched-for bucket by 1 before doing a binary search for the target, whereas in Rust we do not.
In Rust: flipt-client-sdks/flipt-evaluation/src/lib.rs, lines 414 to 417 in c870463.
I wonder how this accumulates for large N in terms of the final distribution.
I'm also wondering if this is to account for slight differences between Go's sort.SearchInts and Rust's binary_search.
Looking at the two, Go will return either the location of the found int or the insertion location if not found. Rust, on the other hand, returns a Result which disambiguates the two situations. In both cases, however, the enclosed result points to either the match or the highest value lower than the search target (meaning the index before insertion).
I was wrong there: Rust returns the higher offset (the insertion index) and matches Go.
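For illustration, here is a standalone sketch (not taken from the evaluator code) of the return values in question; Rust's Err value is the insertion index, which is the same index Go's sort.SearchInts would report:

```rust
fn main() {
    let buckets = vec![10, 20, 30];

    // Exact match: Ok wraps the index of the matching element.
    assert_eq!(buckets.binary_search(&20), Ok(1));

    // Miss: Err wraps the index where the value would be inserted,
    // i.e. the first element greater than the target -- the same index
    // Go's sort.SearchInts(buckets, 25) would return.
    assert_eq!(buckets.binary_search(&25), Err(2));

    println!("binary_search returns the insertion index on a miss");
}
```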
TL;DR I do think there are some subtle, yet potentially cumulative, differences here that we should rectify.
I'm going to illustrate a boundary scenario that I think would return different results between these two.
Given a distribution of [50, 50], we would translate this into a range of [0, 1000], like so: [500, 1000], in both implementations. We then take a CRC of the entityID + flagKey and modulo that into the bucket range.
Imagine the result of this hash is right on the boundary, in this case 500.
In Go, we would increment 500 to 501 before searching an array with two entries, [500, 1000]. In this situation, Go returns 1.
Whereas in Rust we would ultimately do vec![500, 1000].binary_search(&500), which results in Ok(0).
So we do have an off-by-one discrepancy here between the two, which might be what is throwing out your test.
But knowing that delta would be great.
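A minimal sketch of that boundary case (again, standalone code rather than the evaluator itself), showing how the +1 in Go moves a hash that lands exactly on 500 into the other bucket:

```rust
fn main() {
    // Buckets for a [50, 50] distribution over a range of [0, 1000].
    let buckets = vec![500, 1000];

    // Rust evaluator: search for the raw bucket value. 500 matches the
    // first entry exactly, so we get Ok(0) -- the first bucket.
    assert_eq!(buckets.binary_search(&500), Ok(0));

    // Go increments first, i.e. sort.SearchInts(buckets, 501) == 1.
    // The equivalent search here lands in the second bucket: Err(1).
    assert_eq!(buckets.binary_search(&501), Err(1));

    println!("a hash of exactly 500 falls in different buckets in Go vs Rust");
}
```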
Hey @SamuelCabralCruz could you perhaps share what tolerance you're seeing? (i.e. the observed p). I see you've got 0.025 as a constant there for the max delta, just curious what the actual observed delta is. Wondering if it is wildly out or if that tolerance is just too rigid and low to be expected for N == 10000.

I reproduced this and got a delta of around 20 😬
Re: the off-by-one discrepancy, I think you did find a bug, but in the variant_evaluation flow. The boolean_evaluation doesn't use the same bucket search method.
One thing I want to do is test the logic/algorithm for a large N to see if our assumptions are wrong in both the Go and Rust logic.
Here it is in Go: https://github.com/flipt-io/flipt/blob/3f8e7bd62714cd7643a12ad9992d8eab56654d5a/internal/server/evaluation/evaluation.go#L183-L188
We could write a test that runs only this code in a loop to see if it gives a 50/50 distribution in Go, and then do the same for the Rust code: flipt-client-sdks/flipt-evaluation/src/lib.rs, lines 475 to 484 in c870463.
If it's correct in Go and wrong in Rust, then there's a problem with our translation (I'm a bit wary of the casting that we do and the order of operations here: flipt-client-sdks/flipt-evaluation/src/lib.rs, lines 481 to 482 in c870463).
If it's wrong in Go too, then the overall algorithm is incorrect.
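As a hypothetical illustration of why the casting order could matter (this is not the actual lib.rs code, just a sketch), moving an i32 cast before the modulo changes the result as soon as the hash exceeds i32::MAX:

```rust
fn main() {
    // A crc32 value above i32::MAX (hypothetical example value).
    let hash: u32 = 3_000_000_007;

    // Modulo first, then cast: stays in 0..=99 as intended.
    println!("{}", (hash % 100) as i32); // 7

    // Cast first, then modulo: the cast wraps to a negative number,
    // and Rust's % keeps the sign of the dividend.
    println!("{}", (hash as i32) % 100); // -89
}
```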
Reproduced here: https://github.com/flipt-io/flipt-client-node-example/pull/1
I did a seemingly equivalent experiment in Go 1000 times over and averaged the p's, getting around 0.004 on average:
package main

import (
    "fmt"
    "hash/crc32"
    "math"

    "github.com/google/uuid"
)

var n = 10000.0

func main() {
    var ps []float64
    for j := 0; j < 1000; j++ {
        var counts [2]float64
        for i := 0.0; i < n; i++ {
            // bucket the crc32 of a random entity ID + flag key into [0, 1000)
            // and split at 500 to simulate a 50% threshold
            x := (int(crc32.ChecksumIEEE([]byte(uuid.New().String()+"foo"))) % 1000) < 500
            idx := 0
            if x {
                idx = 1
            }
            counts[idx] = counts[idx] + (1.0 / n)
        }
        // record the absolute deviation from the ideal 0.5 for this run
        ps = append(ps, math.Abs(counts[0]-0.5))
    }
    var avg float64
    for _, f := range ps {
        avg += f
    }
    fmt.Println("Average P is", avg/float64(len(ps)))
}
go run main.go
Average P is 0.004024000000000027
When I alter the range boundary by 1 to simulate the off-by-one we see in Rust, the result is almost identical. Definitely not 20.
This is not a perfect reproduction of our bucketing in Go either, but it at least suggests that 0.025, as chosen by @SamuelCabralCruz, is a respectable error margin that we should be within in our Rust implementation. Our Go one could still be wrong too.
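For reference, a quick back-of-the-envelope check (my numbers, not from the experiments above): for a fair 50/50 split over n = 10000 independent evaluations, the standard error of the observed proportion is sqrt(0.25 / 10000) = 0.005, and the expected absolute deviation is roughly 0.004, which lines up with the Go experiment and makes a 0.025 tolerance look generous:

```rust
fn main() {
    // Binomial proportion with p = 0.5 over n trials: standard error is
    // sqrt(p * (1 - p) / n); the expected absolute deviation of an
    // (approximately normal) proportion is sigma * sqrt(2 / pi).
    let n = 10000.0_f64;
    let sigma = (0.5 * 0.5 / n).sqrt(); // 0.005
    let expected_abs_dev = sigma * (2.0 / std::f64::consts::PI).sqrt(); // ~0.004
    println!("sigma = {sigma}, expected |delta| ~= {expected_abs_dev:.4}");
}
```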
Equivalent Rust produces similar results:
fn main() {
    let n: i64 = 10000;
    let mut counts = [0f64; 2];
    for _ in 0..n {
        let uuid = uuid::Uuid::new_v4();
        let uuid_string = uuid.to_string();
        let entity = format!("foo{}", uuid_string);
        let hash = crc32fast::hash(entity.as_bytes()) % 100;
        counts[(hash < 50) as usize] += 1.0 / n as f64;
    }
    println!("{} {}", (counts[0] - 0.5).abs(), (counts[1] - 0.5).abs());
}
➜ cargo run src/main.rs
Compiling disthash v0.1.0 (/Users/georgemac/github/georgemac/disthash)
Finished dev [unoptimized + debuginfo] target(s) in 0.31s
Running target/debug/disthash src/main.rs
0.002999999999960923 0.0030000000000384164
So if you are seeing p ~ 20 then something seems quite wrong in our Rust evaluator.
@markphelps you were pointing at the smoking gun. When I change the code above like so:
fn main() {
    let n: i64 = 10000;
    let mut counts = [0f64; 2];
    for _ in 0..n {
        let uuid = uuid::Uuid::new_v4();
        let uuid_string = uuid.to_string();
        let entity = format!("foo{}", uuid_string);
-       let hash = crc32fast::hash(entity.as_bytes()) % 100;
+       let hash = crc32fast::hash(entity.as_bytes()) as i32 % 100;
        counts[(hash < 50) as usize] += 1.0 / n as f64;
    }
    println!("{} {}", (counts[0] - 0.5).abs(), (counts[1] - 0.5).abs());
}
➜ cargo run src/main.rs
Compiling disthash v0.1.0 (/Users/georgemac/github/georgemac/disthash)
Finished dev [unoptimized + debuginfo] target(s) in 0.34s
Running target/debug/disthash src/main.rs
0.25340000000001084 0.25339999999993335
Yep, the u32 is (often, in fact) larger than max i32 and wraps around when cast.
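To spell out why the skew lands around 0.25 (a sketch, not the SDK code): roughly half of all crc32 values exceed i32::MAX, the cast wraps those to negative numbers, and a negative remainder always satisfies `< 50`, so those evaluations all fall on one side:

```rust
fn main() {
    // A crc32 value above i32::MAX wraps to a negative i32 when cast.
    let hash: u32 = 3_000_000_000;
    let wrapped = hash as i32; // -1_294_967_296

    // Rust's % keeps the dividend's sign, so the remainder is negative
    // and always satisfies `< 50` -- these hashes all pick the same bucket.
    println!("{} as i32 = {}", hash, wrapped);
    println!("{} % 100 = {} (< 50: {})", wrapped, wrapped % 100, wrapped % 100 < 50);

    // About half of hashes are affected and all go one way; the other half
    // split 50/50, giving roughly 75/25 overall -- hence the ~0.25 delta.
}
```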
GH autoclosed this, but this should be fixed in the next client SDK release 👍
Thanks for raising it @SamuelCabralCruz !
@SamuelCabralCruz I just released v0.3.3 (java) and v0.4.3 (node and other client SDKs) with the fix! thanks for reporting
and thanks @GeorgeMac for the fix!
@markphelps @GeorgeMac Thanks a lot for quick support 💯 ❤️ Really appreciated.