tilo/smarter_csv

duplicated values for a column


Hi

Is there a way to check whether a column (such as email address) contains duplicate values, especially when using the chunk_size option?

Thanks

tilo commented

Adding this behavior would use quite a bit of memory, since CSV files can be huge.

It's probably better to do this kind of analysis before or after processing.
Or set up your processing so that "duplicates" are handled as versions of the same record.
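
For illustration, here is a minimal sketch of checking for duplicates yourself while iterating over chunks. `duplicate_values` is a hypothetical helper, not part of smarter_csv. It assumes each chunk is an array of hashes keyed by symbols, which is what `SmarterCSV.process` yields to its block when `chunk_size` is set. Note that it keeps every distinct value in memory, which is exactly the cost mentioned above.

```ruby
require 'set'

# Hypothetical helper (not part of smarter_csv): returns the values of `key`
# that appear more than once across all chunks.
def duplicate_values(chunks, key)
  seen = Set.new   # every distinct value seen so far (grows with the file)
  dups = Set.new   # values seen at least twice
  chunks.each do |chunk|
    chunk.each do |row|
      value = row[key]
      next if value.nil?
      # Set#add? returns nil when the value is already in the set
      dups << value unless seen.add?(value)
    end
  end
  dups.to_a
end

# Stand-in data shaped like two chunks of parsed rows.
chunks = [
  [{ email: 'a@example.com' }, { email: 'b@example.com' }],
  [{ email: 'a@example.com' }, { email: 'c@example.com' }]
]
duplicates = duplicate_values(chunks, :email)
```

With a real file, the same `seen`/`dups` bookkeeping could live inside the block passed to `SmarterCSV.process(filename, chunk_size: 100)`, so rows are checked chunk by chunk as they are processed.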

I'll consider adding a hook for this in 2.0.