Show more diverse set of rows in dropped.txt
Deep1998 opened this issue · 0 comments
Deep1998 commented
Currently, we keep appending bad rows to conv until we hit the byte limit and then dump them to dropped.txt. With large tables, dropped.txt usually ends up containing rows from just one table, because a single issue occurs across many rows.
We could improve this by evicting some of the earlier rows to make room for bad rows from other tables, since additional rows that fail with the same error provide no new information. It is more informative to report a few samples of each type of bad row.
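
One possible eviction policy, sketched in Python purely for illustration (this is not the tool's actual code, and `BadRowSampler`, `add`, `counts`, and `dump` are hypothetical names). It assumes bad rows arrive as strings tagged with a table name and that the budget is measured in UTF-8 bytes. When the budget is full, it evicts the oldest row from whichever table currently holds the most samples, unless the incoming row's table is already at least as well represented, in which case the new row is dropped.

```python
from collections import defaultdict, deque


class BadRowSampler:
    """Keeps a byte-bounded sample of bad rows, spread across tables."""

    def __init__(self, byte_limit):
        self.byte_limit = byte_limit
        self.bytes_used = 0
        self.rows = defaultdict(deque)  # table name -> stored bad rows, oldest first

    @staticmethod
    def _size(row):
        return len(row.encode("utf-8"))

    def add(self, table, row):
        """Try to store a bad row; returns True if it was kept."""
        size = self._size(row)
        if size > self.byte_limit:
            return False  # this row can never fit in the budget
        current = self.rows[table]
        while self.bytes_used + size > self.byte_limit:
            # Table with the most stored samples (first one wins on ties).
            largest = max(self.rows, key=lambda t: len(self.rows[t]))
            if len(self.rows[largest]) <= len(current):
                # The incoming table is already as well represented as any
                # other, so another sample from it adds little: drop it.
                return False
            evicted = self.rows[largest].popleft()  # remove an earlier row
            self.bytes_used -= self._size(evicted)
        current.append(row)
        self.bytes_used += size
        return True

    def counts(self):
        """Number of stored samples per table (empty tables omitted)."""
        return {t: len(q) for t, q in self.rows.items() if q}

    def dump(self):
        """Lines that would be written to dropped.txt, grouped by table."""
        return [f"{t}: {r}" for t, q in self.rows.items() for r in q]
```

With a 40-byte budget and 10-byte rows, ten bad rows from table `a` followed by two from table `b` leave two samples of each instead of four from `a` alone. A real implementation would probably key the buckets on the error type as well as the table, so that distinct errors within one table are also sampled.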