johnlpage/MongoSyphon

Duplicate key errors are silently swallowed and drop subsequent records

Closed this issue · 2 comments

From MongoBulkWriter lines 90 and following:

catch (com.mongodb.MongoBulkWriteException err) {
  //  Duplicate inserts are not an error if retrying
  for (BulkWriteError bwerror : err.getWriteErrors()) {
    if (bwerror.getCategory() != ErrorCategory.DUPLICATE_KEY) {
      logger.error(bwerror.getMessage());
      fatalerror = true;
      break;
    }
  }
  if (!fatalerror) {
  }
}

This means that duplicate key errors are swallowed without even being logged, which can cause hard-to-track bugs in ETL processes.

This is made considerably worse because bulk writes are performed in ordered mode: when a duplicate key error occurs, MongoSyphon not only ignores the error but also skips all subsequent documents in the current batch, leading the user to believe the process succeeded when in fact only a small portion of the documents have been inserted.
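For context, here is a minimal sketch (not MongoSyphon code; the class and method names are made up) of how the ordered flag changes the outcome when a batch hits a duplicate, using the MongoDB Java driver's insertMany purely for illustration:

import com.mongodb.MongoBulkWriteException;
import com.mongodb.client.MongoCollection;
import com.mongodb.client.model.InsertManyOptions;
import org.bson.Document;

import java.util.List;

public class OrderedInsertDemo {

    // ordered = true (the driver default): the server stops at the first write
    // error, so every document after the first duplicate is silently skipped.
    // ordered = false: the server attempts every document and reports all the
    // errors together, so one duplicate does not drop the rest of the batch.
    static void insertBatch(MongoCollection<Document> coll,
                            List<Document> batch, boolean ordered) {
        try {
            coll.insertMany(batch, new InsertManyOptions().ordered(ordered));
        } catch (MongoBulkWriteException e) {
            System.out.printf("inserted %d of %d document(s), %d write error(s)%n",
                    e.getWriteResult().getInsertedCount(),
                    batch.size(),
                    e.getWriteErrors().size());
        }
    }
}

With ordered left at the default, getInsertedCount() stops at the first duplicate; with ordered set to false it reflects every non-duplicate document in the batch.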

That definitely needs a fix of some kind, even if just logging.

Changed the load to run unordered and to warn in the log about duplicates.
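For reference, a sketch of what the revised error handling could look like (the class and method names are hypothetical, and an slf4j-style logger is assumed here; MongoSyphon's actual logger may differ): duplicates are logged as warnings rather than dropped silently, and only non-duplicate write errors are treated as fatal.

import com.mongodb.ErrorCategory;
import com.mongodb.MongoBulkWriteException;
import com.mongodb.bulk.BulkWriteError;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class BulkErrorHandling {

    private static final Logger logger =
            LoggerFactory.getLogger(BulkErrorHandling.class);

    // Surface duplicates as warnings so they stay visible in the log;
    // treat any other write error as fatal.
    static boolean handleBulkWriteErrors(MongoBulkWriteException err) {
        boolean fatalerror = false;
        for (BulkWriteError bwerror : err.getWriteErrors()) {
            if (bwerror.getCategory() == ErrorCategory.DUPLICATE_KEY) {
                logger.warn("Duplicate key skipped: {}", bwerror.getMessage());
            } else {
                logger.error(bwerror.getMessage());
                fatalerror = true;
            }
        }
        return fatalerror;
    }
}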