trusteddomainproject/OpenDMARC

aggregate reports - why the complicated pipeline?

Keeper-of-the-Keys opened this issue · 1 comment

Hey,
I hope I am not reopening something that has been discussed ad nauseam already, but I didn't see any discussion in the bug tracker here.

What is the reason that the pipeline for generating aggregate reports is so long?
By long I mean:

  1. OpenDMARC writes a HistoryFile
  2. opendmarc-importstats imports said history file into a db
  3. opendmarc-reports generates a report based on the db and sends it

Superficially it would seem that OpenDMARC could write directly to the DB instead of to a file. I assume people smarter than me have thought about this a lot and concluded that the above pipeline is better, and I would like to understand their reasons.

The reasons I could think of are that writing to a file is "easier"/"cheaper in compute" and less prone to lockup or failure than writing to a DB, and that importstats may be very intensive for larger setups, so you may not want to run it on the same machine.

dgeo commented

Here we have 4 different machines, and one to import all files… Using a single DB would add a SPOF, and using a DB cluster would add complexity (to an already-not-that-simple setup)… And adding IO/locks per mail seems like a bad idea (there are sometimes many mails per second…)
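
To make the trade-off concrete, here is a rough Python sketch of the general pattern (append locally per mail, import in batches later). It is not OpenDMARC's actual code or file format: the function names, the tab-separated record layout, the `messages` table and the use of SQLite are all assumptions made purely for illustration.

```python
import os
import sqlite3
import tempfile


def log_message(history_path, from_domain, disposition):
    # Hot path: one cheap append per mail to a local file; no DB connection,
    # no DB locks, and mail keeps flowing even if the DB is down.
    # (Hypothetical tab-separated format, not the real HistoryFile layout.)
    with open(history_path, "a", encoding="utf-8") as fh:
        fh.write(f"{from_domain}\t{disposition}\n")


def import_history(history_path, db):
    # Batch path: rename the file first so the filter starts a fresh one,
    # then load every record inside a single transaction.
    batch_path = history_path + ".importing"
    os.replace(history_path, batch_path)
    with open(batch_path, encoding="utf-8") as fh:
        rows = [tuple(line.rstrip("\n").split("\t")) for line in fh if line.strip()]
    with db:  # commits once on success, rolls back on error
        db.executemany(
            "INSERT INTO messages (from_domain, disposition) VALUES (?, ?)", rows
        )
    os.remove(batch_path)
    return len(rows)


def demo():
    workdir = tempfile.mkdtemp()
    history = os.path.join(workdir, "history.dat")
    db = sqlite3.connect(":memory:")  # stand-in for the central reporting DB
    db.execute("CREATE TABLE messages (from_domain TEXT, disposition TEXT)")
    for i in range(1000):
        log_message(history, f"example{i % 3}.org", "none")
    imported = import_history(history, db)
    count = db.execute("SELECT COUNT(*) FROM messages").fetchone()[0]
    return imported, count
```

With this split, the per-mail cost is a single file append, and the DB only sees one transaction per import run, which can happen on a different machine after the files are copied over.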