mirage/eqaf

why in C?

Closed this issue ยท 25 comments

is it for the constant time? for performance? wouldn't a pure OCaml implementation (see below) work in the same fashion?

let eqaf a b =
  let len = min (String.length a) (String.length b) in
  let res = ref 0 in
  let get_int idx s = int_of_char (String.get s idx) in
  for i = 0 to pred len do
    res := !res land ((get_int i a) lxor (get_int i b))
  done ;
  !res = 0

I'm waiting a long benchmark to check the constant time of the C implementation. However, from my tests, this ML function does not check the equality with a constant time - I'm not sure if it's about the GC noise or not but I will look the assembly code generated to understand why thea ML version of equal does not have a constant time execution.

I clearly want a OCaml version (for portability) but if I can not check the constant time execution on it, I prefer to use a C function. I will update this issue with my results ๐Ÿ‘ !

########## Random ##########
---------- eqaf ----------
min: 295.563355.
max: 451.895365.
mean: 331.880177.
median: 324.179808.
deviation: 5.453890.
deviation: 18.100379%.
---------- eqst ----------
min: 25.332532.
max: 402.526954.
mean: 32.466916.
median: 27.256772.
deviation: 6.122530.
deviation: 1.987797%.
---------- eqml ----------
min: 4115.224663.
max: 7396.156073.
mean: 4862.692974.
median: 4599.915675.
deviation: 27.496528.
deviation: 1337.071723%.
########## Equal ##########
---------- eqaf ----------
min: 311.025058.
max: 612.734348.
mean: 384.195579.
median: 337.415476.
deviation: 8.980862.
deviation: 34.504076%.
---------- eqst ----------
min: 387.414471.
max: 733.811892.
mean: 432.752349.
median: 415.002861.
deviation: 7.110980.
deviation: 30.772933%.
---------- eqml ----------
min: 4082.366589.
max: 9207.859649.
mean: 5259.984148.
median: 4733.766429.
deviation: 35.377266.
deviation: 1860.838571%.
########## Total ##########
---------- eqaf ----------
min: -235.383571.
max: 83.966068.
mean: -52.315402.
median: -16.787965.
deviation: 8.791549.
deviation: -4.599334%.
---------- eqst ----------
min: -706.617198.
max: -31.485017.
mean: -400.285433.
median: -386.676459.
deviation: 7.962065.
deviation: -31.870984%.
---------- eqml ----------
min: -4738.452043.
max: 3084.672993.
mean: -397.291174.
median: -108.030682.
deviation: 40.194553.
deviation: -159.689412%.

Some explanations needed.

  • eqaf is the current implementation.
  • eqst is the OCaml's standard library implementation.
  • eqml is the OCaml implementation (see the snippet of @hannesm).

We run 2 kinds of test:

  • random: 2 lists of random strings. One random element of these list is exactly the same to see behavior specially with eqst (in this case, and only in this case, time needed by eqst should be the biggest than all others entries)
  • equal: 2 lists and all entries are equal each other.

From these entries, expected outputs are:

  • eqaf for random entries and equal entries should be the same
  • eqst for random should be the faster but one entry (the only one equal element) should take a long time
  • eqst for equal should take the same time
  • eqml should have the same behavior of eqaf for random and equal

Then, we do a subtraction between equal entries and random entries and outputs expected are:

  • diff of eqaf should have a very low standard deviation first but the most important is median of entries should close to 0 - if we consider than we have a constant-time execution, time needed to execute eqaf on random inputs with time needed to execute eqaf on equal inputs should be equal.
  • diff of eqst should have a middle standard deviation and median of entries should be far from 0 - indeed time needed on equal inputs should be the same but should be far from time needed on random input.

Behavior on eqml is not clear ...

cfcs commented

@dinosaure So how do the runtime stats here compare to your expectations?

PS @hannesm your implementation fails to detect this inequality: ๐Ÿ‘ฏโ€โ™‚๏ธ ;) (you should fail and return false early if lengths differ)

utop # eqaf "ab" "abCDEF";;
- : bool = true

EDIT: For those reading this thread and lacking context, it sprung out of this issue: mirage/digestif#34

IMHO, eqaf has expected result. time needed to compare equal values or not is the same (a diff between series of times when we compare equal values and series of times when we compare different values is close to 0 - see Total/eqaf/mean = ~50).

eqst (so caml_string_equal) has expected result too. It is the faster implementation when it compares different values but took the same time than eqaf when values are equal (Equal/eqaf/mean = Equal/eqst/mean = ~400). A diff between them shows eqst is not a constant time equal function Total/eqst/mean = ~400.

Finally, the OCaml implementation eqml is ... incomprehensible.

And again, I don't like to put another C stub on projects for this trivial function but benchmarks don't help me to avoid this way. If someone can show me an OCaml implementation which is close to what eqaf has, I will happy to use it.

So if this solution is good for you @hannesm and @cfcs, I can prepare a better test (which keep the signal with Travis CI) and release it.

Good news, Travis CI is happy and standard derivation on eqaf has an expected result (between -10% and 10% - eqst exceeds this internal).

Another good new is about eqml which, only on Travis CI, has expected result (standard derivation is close to 0). So I'm able to externalize on the API eqml but I will keep eqaf and let the user to choose between them.

Finally, I will delete the support on 4.03.0 because Core ...

cfcs commented

I think this makes sense. I am still kind of confused about the deviance that the OCaml implementation displays, but hopefully we can work that out. In the meantime, I think it makes sense to go ahead with this C implementation.

I would be curious to see a comparison with these two other implementations:

let eqaf_native a b =
  let len = min (String.length a) (String.length b) in
  let res = ref 0n in
  let get_int idx s = Nativeint.of_int (int_of_char (String.unsafe_get s idx)) in
  for i = 0 to pred len do
    res := Nativeint.logor !res (Nativeint.logxor (get_int i a) (get_int i b))
  done ;
  (String.length a = String.length b) && !res = 0n

external unsafe_get64 : string -> int -> int64 = "%caml_string_get64u"
let eqaf_int64 a b =
  let len = min (String.length a) (String.length b) in
  let res = ref 0L in
  let get_int64 idx s = unsafe_get64 s idx in
  for i = 0 to pred (len / 8) do
    res := Int64.logor !res (Int64.logxor (get_int64 (8*i) a) (get_int64 (8*i) b))
  done;
  let get_int idx s = Int64.of_int (int_of_char (String.unsafe_get s idx)) in
  for i = len - len mod 8 to pred len do
    res := Int64.logor !res (Int64.logxor (get_int i a) (get_int i b))
  done;
  (String.length a = String.length b) && !res = 0L

Beware that if the Travis CI tests run in a VM, you might see different runtimes/behavior in a non virtualized computer.

@UnixJunkie indeed, outputs on this issue come from my server (and can be replicated easily on your computer). And, as I said, problem on Mac OSX about eqml should be about noise of virtualization.

@gasche I will prepare some benchmarks about that ๐Ÿ‘

Of course my first eqaf_int64 version was wrong (didn't handle the modulo-8 leftover at the end of the inputs etc.), I edited it. I'm still not fully sure that it is correct, but it looks better now.

cfcs commented

@gasche eqaf_native and eqaf_int64 can leak the length of the unknown string (which may or may not be an issue depending on the code), you should do the length comparison before running through the for loop.

EDIT: The point being that then you can exit early if the length differ.

Cannot the attacker also guess the length of the unknown string by calling the exit-early version many times with strings of each length, and see which exit early? I tried to avoid that, but of course you are right that running min s1 s2 iterations of the loop also leaks information. I guess that a safe formulation should be explicit into which argument is the secret and which is attacker-provided, and always run as many steps as the attacker-provided length.

(Apologies if I say something silly, I know nothing about timing attacks.)

cfcs commented

@gasche In theory, yes, but that depends on the application. If the attacker passes a long string they could (if they have a baseline for your string comparisons) provide one very long string and get the length of the compared-against string with only one invocation of eq. If it exits early, they at least need multiple chances to compare against the same sensitive value.

On the other hand this makes it easier to determine the length (since the diff between a 128 vs 127 string would be much easier to distinguish if you exit early than if the attacker has to time the diff between 127/128 invocations), so you have a point.

I would still prefer exiting early, also because the performance of the neq case would be impacted less, leading to less overhead compared to vanilla OCaml String.equal in that case.

So, some news about previous benchmarks. I rewrote the benchmark to avoid dependency with the world^Wcore_bench and I have good results now about eqml. You can see the new implementation on #4 which follows mostly what core_bench do but the biggest change is about time where we use now a monotonic timer.

Then, I integrated snippets of @gasche and results on my computer (X1 Carbon, virtualization) is:

########## Random ##########
---------- eqaf ----------
min:       0.001152.
max:       0.002241.
mean:      0.001906.
median:    0.001954.
deviation: 0.015425.
deviation: 0.000000%.
---------- eqst ----------
min:       0.002506.
max:       0.027956.
mean:      0.025222.
median:    0.025612.
deviation: 0.058866.
deviation: 0.000015%.
---------- eqml ----------
min:       0.000080.
max:       0.000162.
mean:      0.000133.
median:    0.000136.
deviation: 0.004615.
deviation: 0.000000%.
---------- eqnt ----------
min:       0.000057.
max:       0.000176.
mean:      0.000109.
median:    0.000097.
deviation: 0.005999.
deviation: 0.000000%.
---------- eqbg ----------
min:       0.000736.
max:       0.001512.
mean:      0.001254.
median:    0.001305.
deviation: 0.014299.
deviation: 0.000000%.
########## Equal ##########
---------- eqaf ----------
min:       0.001005.
max:       0.002599.
mean:      0.001691.
median:    0.001458.
deviation: 0.022845.
deviation: 0.000000%.
---------- eqst ----------
min:       0.000904.
max:       0.002425.
mean:      0.001603.
median:    0.001576.
deviation: 0.020863.
deviation: 0.000000%.
---------- eqml ----------
min:       0.000069.
max:       0.000161.
mean:      0.000129.
median:    0.000138.
deviation: 0.005324.
deviation: 0.000000%.
---------- eqnt ----------
min:       0.000096.
max:       0.000167.
mean:      0.000135.
median:    0.000138.
deviation: 0.004099.
deviation: 0.000000%.
---------- eqbg ----------
min:       0.000020.
max:       0.001221.
mean:      0.000913.
median:    0.000998.
deviation: 0.016835.
deviation: 0.000000%.
########## Total ##########
---------- eqaf ----------
min:       -0.000838.
max:       0.001058.
mean:      0.000215.
median:    0.000343.
deviation: 0.023630.
deviation: 0.000000%.
---------- eqst ----------
min:       0.001392.
max:       0.026118.
mean:      0.023619.
median:    0.024057.
deviation: 0.058371.
deviation: 0.000014%.
---------- eqml ----------
min:       -0.000051.
max:       0.000075.
mean:      0.000004.
median:    0.000003.
deviation: 0.005738.
deviation: 0.000000%.
---------- eqnt ----------
min:       -0.000092.
max:       0.000067.
mean:      -0.000026.
median:    -0.000023.
deviation: 0.006447.
deviation: -0.000000%.
---------- eqbg ----------
min:       -0.000410.
max:       0.001049.
mean:      0.000341.
median:    0.000333.
deviation: 0.018763.
deviation: 0.000000%.

Your results are a bit hard to interpret for me. Correct me if I'm wrong:

  • the deviation is lower for eqml and eqnt than for eqaf on Random (and all are sensibly lower than eqst of course), so the OCaml implementations are even more constant-time than the C version, except eqbg which is about the same?

  • assuming we can use mean to draw performance conclusion, the numbers suggest that the OCaml-based solutions are sensibly faster (when called from an OCaml program) than the C version, doesn't that suggest just getting rid of the C code?

  • in the Equal case there is essentially no performance difference between eqml (the naive version) and eqnt (using native machine integers), so this suggests using the simpler/clearer eqml version from @hannesm.

  • The OCaml constant-time functions also seem much faster than eqst (using String.compare) on equal strings, which is somewhat strange. Does this suggest that String.equal should be reimplemented as OCaml code rather than a C primitive?

  • I don't understand why the mean value is different between Random and Equal for the same function, aren't the inputs of the same length in both cases, so shouldn't we get the same mean values? Are you testing enough rounds?

  • The mean of eqbg is ten times smaller in the Equal case than in the Random case, isn't that wrong?

  • deviation(eqml) = 0,004615 < deviation(eqnt) = 0,005999 < deviation(eqbg) = 0,014299 < deviation(eqaf) = 0,015425 < deviation(eqst) = 0,058866

From these results, as you said, yes, the OCaml implementation (eqml) is more stable than the C implementation (eqaf). Then, we get expected results about eqst, this implementation about random inputs is not stable.

  • It's not a surprise than the OCaml implementation (eqml) is faster than the C implementation (eqaf). I think, we can get this kind of results (instead previous version of the benchmark) only because I decided to use a monotonic clock to benchmark them (which is less subject to noise). And from these results, I completely agree to use the OCaml implementation now.

  • Again, I completely agree with this. The code of eqml is more readable than eqnt.

  • Indeed eqml is more faster than eqst when we compare 2 same string (but physically different). I need to figure out why and see the ASM code produced of both, I think.

  • mean(eqml) = 0,000133 on Random inputs is sensibly equal to mean(eqml) = 0,000129 on Equal inputs. It's the same for eqnt (mean(eqnt) = 0,000109 and mean(eqnt) = 0,000135). But, you are right, mean eqaf and eqbg differ between Random and Equal inputs. Mean of eqst differs too but it's an expected result.

  • For your last point, I need to figure out why we can have this situation.

Currently, size of inputs is 4096 and we produce 50 string. For each string we run the equal function until 1 second. Then, we see how many runs we did. I think, I will run a bigger benchmark.

Yes, it's not surprising that the OCaml version is faster because it can avoid the (small but noticeable) FFI costs at the boundary of the call, saving the registers to bridge the different calling conventions etc. This also explains why String.equal (which I think is C code) is slower. This is a per-C-call cost so the difference should be less noticeable on large strings. What is the size of the strings you use to benchmark?

I will provide a test on Windows and then make a new release of it with the OCaml implementation and all will happy ๐Ÿ‘ !

cfcs commented

I guess with the FFI version it also becomes harder to filter out jitter from the OCaml runtime (since we are looking to be constant time in the sense that execution time does not depend on the data [apart from the length]; not in the sense that each execution should take a constant amount of time to execute).
We can extrapolate that if something does take a constant time to execute for all inputs of a given length, the execution time is also constant-time (independent of the data), but we cannot go the other way.
So the FFI version, while it might be correct, is harder to measure correctly.

thanks to all who contributed, since this repository now uses a purely OCaml-based equal I'll close this issue.