DiffMatchPatch can’t handle null terminators
Mr0grog opened this issue · 0 comments
This is from Sentry: https://sentry.io/environmental-data-governance-/diffing-server/issues/755613537/events/35761652755/
It turns out we have a reasonable amount of malformed content with null bytes in the middle of it. Unfortunately, our super-fast C implementation can’t handle that (not too big of a surprise, really). It throws a ValueError in differs.compute_dmp_diff():
web-monitoring-processing/web_monitoring/differs.py
Lines 72 to 79 in eab7e95
We should probably check for null terminators and replace them with something:
- The Unicode replacement character? � (what we use for decoding errors)
- The Unicode null symbol? ␀ (fun, but too cute/too indecipherable for many users?)
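A minimal sketch of what that check could look like, assuming the text is sanitized before it is handed to the diffing code. The helper name `replace_null_characters` and the two constants are hypothetical and not part of web_monitoring; either of the candidate characters above can be passed in as the replacement.

```python
# Hypothetical helper: none of these names exist in web_monitoring today.
REPLACEMENT_CHARACTER = "\ufffd"  # U+FFFD, the Unicode replacement character
NULL_SYMBOL = "\u2400"            # U+2400, the visible "symbol for null"


def replace_null_characters(text, replacement=REPLACEMENT_CHARACTER):
    """Return `text` with every NUL character swapped for `replacement`.

    The input string is not modified; text without any NUL characters
    comes back unchanged.
    """
    return text.replace("\x00", replacement)
```

Both inputs to the diff would need the same treatment, for example `replace_null_characters(a_text)` and `replace_null_characters(b_text)`, so that a null byte present in both versions does not show up as a change.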