aggregate reports - why the complicated pipeline?

Question

Keeper-of-the-Keys opened this issue a year ago · 1 comment

Hey,
I hope I am not reopening something that has been discussed ad nauseam already, but I didn't see any discussion in the bug tracker here.

What is the reason that the pipeline for generating aggregate reports is so long?
By long I mean:

OpenDMARC writes a HistoryFile
opendmarc-importstats imports said history file into a db
opendmarc-reports generates a report based on the db and sends it

Superficially, it would seem that OpenDMARC could write directly to the DB instead of a file. I assume people smarter than me have thought about this a lot and concluded that the above pipeline is better, and I would like to understand those reasons.

The reasons I could think of are that writing to a file is "easier"/"cheaper in compute" and less prone to lockup/failure than writing to a db, and that importstats may be very intensive for larger setups, so you may not want to run it on the same machine.

Answer 1 · 2024-02-19T15:28:56.000Z

Here we have 4 different machines, and one to import all files. Using a single DB would add a SPOF, and using a DB cluster would add complexity (to an already-not-that-simple setup). And adding IO/locks per mail seems like a bad idea (there are sometimes many mails per second).
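
To make the trade-off concrete, here is a minimal Python sketch of the pattern described above: each mail host appends one line per message to a local file, and a single importer later loads every file into the DB with one transaction per file. Everything in it is an assumption for illustration only. The tab-separated record layout, the `messages` table, and the names `log_message`, `import_history`, `mx1` and `mx2` are hypothetical and are not OpenDMARC's actual history file format or schema, and SQLite only stands in for whatever DB is really used.

```python
import os
import sqlite3
import tempfile


def log_message(history_path, from_domain, source_ip, disposition):
    """Append one record per message (hypothetical tab-separated layout).

    The mail-handling path only touches a local file: no DB connection,
    no DB lock, and no dependency on the DB being reachable.
    """
    with open(history_path, "a", encoding="utf-8") as fh:
        fh.write(f"{from_domain}\t{source_ip}\t{disposition}\n")


def import_history(conn, history_path):
    """Load a whole history file in one transaction; return the row count."""
    with open(history_path, encoding="utf-8") as fh:
        rows = [tuple(line.rstrip("\n").split("\t")) for line in fh if line.strip()]
    with conn:  # a single commit for the batch instead of one per mail
        conn.executemany(
            "INSERT INTO messages (from_domain, source_ip, disposition) "
            "VALUES (?, ?, ?)",
            rows,
        )
    return len(rows)


def run_demo():
    """Simulate two mail hosts writing files and one importer loading them."""
    conn = sqlite3.connect(":memory:")  # stand-in for the central DB
    conn.execute(
        "CREATE TABLE messages (from_domain TEXT, source_ip TEXT, disposition TEXT)"
    )
    imported = 0
    with tempfile.TemporaryDirectory() as tmp:
        for host in ("mx1", "mx2"):  # hypothetical host names
            path = os.path.join(tmp, f"{host}.dat")
            for i in range(3):
                log_message(path, "example.com", f"192.0.2.{i}", "none")
            imported += import_history(conn, path)
    stored = conn.execute("SELECT COUNT(*) FROM messages").fetchone()[0]
    conn.close()
    return imported, stored
```

In this sketch, if the DB is down the hosts keep appending to their files and the import can simply run later, which is the SPOF argument above. The DB also sees one batch per file rather than a write per mail.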