datafold/data-diff

PostgreSQL and MySQL interpret string ordering differently

mrn3 opened this issue · 5 comments

mrn3 commented

Describe the bug
Data diff results are not correct when comparing PostgreSQL and MySQL. This is because of the difference in the sorting order of hypens.

alphanum is being used to convert a string to number with that alphanum as base, so if we have hyphen - in the alphanum it is used to generate a set of strings based on range and that range will change across databases as the hyphen is treated differently by the databases.

This is also very related to #513 and may be a duplicate.

Make sure to include the following (minus sensitive information):

  • This issue is caused when issuing data-diff comparison of a PostgreSQL and MySQL database table.

Describe the environment

Debian Linux with data-diff 0.9.7.

mrn3 commented

I am working on #763 to resolve this issue.

mrn3 commented

This fixes the issue, but isn't ideal yet because it breaks other cases. We plan to make it so either a user will choose which kinds of alphanums they want and pass it in as a parameter, or it will automatically detect it based on the type of database.

dlawin commented

Data diff results are not correct when comparing PostgreSQL and MySQL. This is because of the difference in the sorting order of hypens.

@mrn3, I wonder if we can set the collation to something equivalent?

Only tangentially related, but I was considering setting mysql collation in this PR: #712

https://dev.mysql.com/doc/refman/8.0/en/charset-mysql.html

mrn3 commented

Yeah, we considered that - so you are thinking that maybe just setting the collation will resolve our original issue?

I do know that the fix I have that PR for does fix the issue.

Hi @mrn3,

I'm sorry for the delay in following up on this. Thank you for opening this issue and looking into the potential solutions!

We made a hard decision to sunset the data-diff package and won't provide further development or support.

Over the past few months, we have completely rewritten the diffing engine in the cloud and have solved the fundamental issues with the original algorithm used in the data-diff package, as well as collation support, etc. Feel free to take it for a trial or contact us at support@datafold.com if you have any questions.

-Gleb