paulfitz/daff

question: Is it possible to specify a tolerance value for floating point comparisons?

suyashb95 opened this issue · 5 comments

Is there a compare flag that can be used to specify a small tolerance value such that the diffing algorithm ignores changes where the delta is less than the tolerance?

Good question! There isn't. There was a discussion about this in #59. To summarize:

  1. Suppose daff is giving you diffs where rows are aligned correctly but some cells are shown as changed because of floating point issues. Fixing this is fairly easy.
  2. Suppose daff is giving you diffs where a row in the original table and a row in the current table are treated as different because of floating point issues. Fixing this is fairly hard.

What would you say if, instead of a tolerance, there were quantization, meaning rounding to a certain number of decimal places? In that case, this would be a fairly easy fix. The difference is whether hashing can be used to find matching rows, or whether an N-to-N comparison is needed.
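To illustrate the difference between the two approaches, here is a minimal Python sketch of matching rows between two tables. It is not daff's implementation; the function names and the list-of-tuples row representation are hypothetical. With quantization, each row reduces to a rounded tuple that can serve as a dictionary key, so matching needs roughly one lookup per row. With a tolerance, "close enough" is not transitive (a can be close to b and b close to c while a is not close to c), so there is no canonical key to hash, and every row of one table has to be checked against the rows of the other.

```python
import math


def quantize(row, places):
    """Round every float in a row so near-equal rows share one hashable key."""
    return tuple(round(v, places) if isinstance(v, float) else v for v in row)


def match_by_quantization(rows_a, rows_b, places):
    """Hash-based matching: one dictionary insert or lookup per row."""
    index = {}
    for j, row in enumerate(rows_b):
        index.setdefault(quantize(row, places), []).append(j)
    matches = []
    for i, row in enumerate(rows_a):
        candidates = index.get(quantize(row, places))
        if candidates:
            matches.append((i, candidates.pop(0)))  # consume the first unused match
    return matches


def rows_close(a, b, epsilon):
    """Cell-by-cell comparison with an absolute tolerance on float cells."""
    if len(a) != len(b):
        return False
    for x, y in zip(a, b):
        if isinstance(x, float) and isinstance(y, float):
            if not math.isclose(x, y, rel_tol=0.0, abs_tol=epsilon):
                return False
        elif x != y:
            return False
    return True


def match_by_tolerance(rows_a, rows_b, epsilon):
    """N-to-N matching: each row of a is compared against every unused row of b."""
    used = set()
    matches = []
    for i, a in enumerate(rows_a):
        for j, b in enumerate(rows_b):
            if j not in used and rows_close(a, b, epsilon):
                matches.append((i, j))
                used.add(j)
                break
    return matches


a = [("x", 1.001), ("y", 2.0)]
b = [("y", 2.0004), ("x", 1.0)]
print(match_by_quantization(a, b, 2))   # [(0, 1), (1, 0)]
print(match_by_tolerance(a, b, 0.01))   # [(0, 1), (1, 0)]
```

Both give the same answer here, but the cost of the tolerance version grows with the product of the two table sizes, which is why row matching under a tolerance is the hard case.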

@paulfitz thanks for summarizing!

For some more context, I'm using the Python bindings for daff, as shown in the example.

The problem I'm facing falls into the first category. There is a defined primary key and rows are aligned correctly, but tiny differences in numbers are highlighted as changes (like 123.45 -> 123.46). I'm guessing that in this case we could post-process the diff somehow to remove these?
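One possible way to post-process is sketched below. It assumes daff's highlighter diff format, where the first column holds the row action ("->" for a modified row) and a changed cell reads "old->new". The function names, and the choice to drop rows whose only changes were within the tolerance, are hypothetical; the sketch also assumes the plain "->" separator throughout and does not handle any alternative separator daff might use.

```python
ARROW = "->"  # assumption: separator used for modified rows and cells


def _within_epsilon(old, new, epsilon):
    """True if both sides parse as numbers that differ by at most epsilon."""
    try:
        return abs(float(old) - float(new)) <= epsilon
    except ValueError:
        return False  # non-numeric values are never treated as "close"


def drop_tiny_changes(diff_rows, epsilon):
    """Revert numeric cell changes within epsilon; drop rows left unchanged."""
    cleaned = []
    for row in diff_rows:
        if not row or row[0] != ARROW:
            cleaned.append(list(row))  # header, context, added, deleted rows kept as-is
            continue
        new_row = [row[0]]
        still_changed = False
        for cell in row[1:]:
            if isinstance(cell, str) and ARROW in cell:
                old, new = cell.split(ARROW, 1)
                if _within_epsilon(old, new, epsilon):
                    new_row.append(new)  # show the current value, hide the change
                    continue
                still_changed = True
            new_row.append(cell)
        if still_changed:
            cleaned.append(new_row)
    return cleaned


diff = [
    ["@@", "id", "price"],
    ["->", "1", "123.45->123.46"],
    ["->", "2", "10->20"],
]
print(drop_tiny_changes(diff, 0.05))
# [['@@', 'id', 'price'], ['->', '2', '10->20']]
```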

I've tried rounding the data to 2 decimal places before running daff, and the diff is significantly cleaner, but there are a few cases where one value is rounded up and another is rounded down because of floating point precision differences.
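The rounding problem described above happens because rounding maps values into fixed buckets, and two values that are arbitrarily close can fall on opposite sides of a bucket boundary. A minimal Python illustration, with made-up values chosen to straddle such a boundary:

```python
a = 1.004999999
b = 1.005000001

print(abs(a - b) < 1e-8)         # True: the values differ by about 2e-9
print(round(a, 2), round(b, 2))  # 1.0 1.01: yet they round to different values
print(abs(a - b) <= 0.01)        # True: a tolerance check still treats them as equal

# Binary representation adds a second source of surprises: 2.675 is stored
# as slightly less than 2.675, so it rounds down.
print(round(2.675, 2))           # 2.67
```

This is the trade-off in the thread: quantization is easy to hash but has these boundary cases, while a tolerance avoids them at the cost of pairwise comparison.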

Having quantization as a feature sounds good, but I'm not sure hashing is a good idea for numerical comparisons. What do you think?

@suyash458 I added a --ignore-epsilon flag (e.g. daff --ignore-epsilon 0.1) for ignoring floating point differences up to a threshold (for non-primary-key comparisons). Hope this helps.
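The behaviour described here, exact matching on primary-key columns and a tolerance only on the remaining cells, can be modelled with the sketch below. This is an illustration of the idea, not daff's code; the function name, the dict-based row representation, and treating the threshold as inclusive are all assumptions.

```python
def changed_columns(old_row, new_row, key_columns, epsilon):
    """List non-key columns whose values differ, ignoring numeric deltas <= epsilon.

    Rows are assumed to be aligned already; key columns must match exactly.
    """
    for col in key_columns:
        if old_row[col] != new_row[col]:
            raise ValueError(f"rows are not aligned on key column {col!r}")
    changed = []
    for col in old_row:
        if col in key_columns:
            continue  # keys are never compared with a tolerance
        old, new = old_row[col], new_row.get(col)
        if isinstance(old, (int, float)) and isinstance(new, (int, float)):
            if abs(old - new) <= epsilon:
                continue  # tiny numeric difference: not reported as a change
        elif old == new:
            continue
        changed.append(col)
    return changed


old = {"id": 7, "price": 123.45, "qty": 3, "name": "widget"}
new = {"id": 7, "price": 123.46, "qty": 4, "name": "widget"}
print(changed_columns(old, new, {"id"}, 0.1))  # ['qty']
print(changed_columns(old, new, {"id"}, 0.0))  # ['price', 'qty']
```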

@paulfitz whew, that was fast 🙂 Thanks a lot for adding this feature! I'm guessing that, w.r.t. the Python API, it's equivalent to the snippet below?

import daff

flags = daff.CompareFlags()
flags.ignore_epsilon = 0.01  # ignore numeric differences up to 0.01

I will try it out and let you know.

Apologies for being super late on this, but it works as expected! Tested with v1.3.46.