duplicated values for a column
Opened this issue · 1 comment
Deleted user commented
Hi
Is there a way to check whether a column (like email address) contains duplicate values, especially when using the chunk_size option?
Thanks
tilo commented
Adding this behavior would use quite a bit of memory, since CSV files can be huge.
Probably better to do this kind of analysis before or after processing.
Or set up your processing so that "duplicates" are handled as versioned.
I'll consider adding a hook in 2.0.
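
For anyone who needs this in the meantime, here is a minimal sketch of doing the check in your own processing code rather than in the library. `DuplicateTracker` is a hypothetical helper, not part of smarter_csv, and the `:email` key and the sample rows are assumptions for illustration. It keeps every distinct value in a `Set`, so memory grows with the number of unique values, which is exactly the cost mentioned above.

```ruby
require 'set'

# Hypothetical helper (not part of smarter_csv): remembers the values of one
# column across chunks and records any value seen more than once.
class DuplicateTracker
  attr_reader :duplicates

  def initialize(key)
    @key = key
    @seen = Set.new
    @duplicates = Set.new
  end

  # chunk is an Array of Hashes, one Hash per CSV row.
  def check(chunk)
    chunk.each do |row|
      value = row[@key]
      next if value.nil?

      # Assumption: values that differ only in case or padding are duplicates.
      normalized = value.to_s.strip.downcase
      # Set#add? returns nil when the element is already in the set.
      @duplicates << normalized unless @seen.add?(normalized)
    end
    self
  end
end

# Sample chunks standing in for what a chunked CSV read would yield.
chunks = [
  [{ email: 'a@example.com' }, { email: 'b@example.com' }],
  [{ email: 'A@example.com' }, { email: 'c@example.com' }]
]

tracker = DuplicateTracker.new(:email)
chunks.each { |chunk| tracker.check(chunk) }
puts tracker.duplicates.to_a.inspect
```

With smarter_csv, the same tracker could be fed from the block form, for example `SmarterCSV.process(path, chunk_size: 1000) { |chunk| tracker.check(chunk) }`, since each chunk is an array of hashes. For very large files, sorting or de-duplicating outside Ruby before or after processing is likely the cheaper option.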