newlogic/newlogic-g2p

Implement dedup managers

Opened this issue · 5 comments

  • Dedup by Phone numbers
  • Dedup by ID document

It seems not finished:

  • It does not deduplicate individuals by ID or phone number; it does so only for group members.
  • Dedup on phone numbers should be done on the phone number as normalized by libphonenumber, not the raw one.
  • Dedup on ID numbers should take the ID type into account, i.e., if two persons have the same ID number but for different ID types, this is not a duplicate.
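To illustrate the ID point, here is a minimal sketch that groups on the pair (ID type, ID number) instead of the number alone. The record layout, the function name `find_duplicate_ids`, and the trim/uppercase cleanup are assumptions made for the example, not the module's actual models or code.

```python
from collections import defaultdict


def find_duplicate_ids(records):
    """Return {(id_type, id_number): [people]} for keys shared by 2+ people.

    `records` is an iterable of (person, id_type, id_number) tuples.
    This layout is hypothetical and only serves the illustration.
    """
    groups = defaultdict(set)
    for person, id_type, id_number in records:
        # Assumption: ID numbers are compared ignoring case and outer spaces.
        key = (id_type, id_number.strip().upper())
        groups[key].add(person)
    # Keep only keys that more than one distinct person shares.
    return {key: sorted(people) for key, people in groups.items() if len(people) > 1}


records = [
    ("alice", "passport", "A123"),
    ("bob", "national_id", "A123"),  # same number, different type: not a duplicate
    ("carol", "passport", "a123 "),  # same type and number as alice: duplicate
]
duplicates = find_duplicate_ids(records)
```

With this data only alice and carol are grouped; bob shares the number but under a different ID type, so he is left out.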

Can you link to where the code is? Do you have tests?

I'll fix this now.

I updated this issue.

  • It now deduplicates individuals by phone number and ID.
  • It now sanitizes phone_number before checking for duplicates. For some reason I can't get the phone_sanitized field, so the code sanitizes phone_number itself.
  • ID numbers are now checked by both ID type and ID number, i.e., two persons with different ID types but the same ID number are ignored.
  • I also tried to optimize some of the dedup code to minimize the load.
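A rough sketch of deduplicating on the sanitized phone number rather than the raw one is below. The `crude_normalize` helper is a hypothetical stand-in so the example runs without extra dependencies; it is not libphonenumber and the default country code "63" is an arbitrary assumption. With the `phonenumbers` Python port installed, the `normalize` argument could instead wrap `phonenumbers.parse` and `phonenumbers.format_number` with `PhoneNumberFormat.E164`.

```python
import re
from collections import defaultdict


def crude_normalize(raw, default_country_code="63"):
    """Hypothetical stand-in for a real normalizer; returns an E.164-like string."""
    raw = raw.strip()
    digits = re.sub(r"\D", "", raw)  # drop spaces, dashes, parentheses, "+"
    if raw.startswith("+"):
        return "+" + digits
    if digits.startswith("00"):
        # International prefix written as 00 instead of +.
        return "+" + digits[2:]
    # Assumption: anything else is a national number with a leading trunk 0.
    return "+" + default_country_code + digits.lstrip("0")


def find_duplicate_phones(records, normalize=crude_normalize):
    """Return {normalized_phone: [people]} for numbers shared by 2+ people."""
    groups = defaultdict(set)
    for person, raw_phone in records:
        groups[normalize(raw_phone)].add(person)
    return {phone: sorted(people) for phone, people in groups.items() if len(people) > 1}


records = [
    ("alice", "+63 917 123 4567"),
    ("bob", "0917-123-4567"),   # same number as alice, written differently
    ("carol", "0917 765 4321"),
]
phone_duplicates = find_duplicate_phones(records)
```

Comparing the raw strings would miss the alice/bob pair, which is why the grouping key is the normalized value.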

You can see the code in this commit.