Make logging safe/fire-and-forget
At present, the following happens during write:
- Read the file from S3
- Append the new content to the end of the old content
- Write the file to S3
This means the write function can currently lose data: if two writes happen close enough together that one writes while the other is still reading/appending, one of the two writes will blow the other away.
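For concreteness, here's a minimal sketch of that cycle using the aws-sdk-s3 gem (the bucket/key names, the helper name, and the use of aws-sdk-s3 directly are assumptions for illustration, not necessarily what the code does today):

```ruby
require 'aws-sdk-s3'

# Sketch of the current behaviour: read the whole log, append the new
# message, write the whole log back. Nothing stops two callers from
# interleaving between get_object and put_object.
def write(s3, bucket, key, message)
  existing =
    begin
      s3.get_object(bucket: bucket, key: key).body.read
    rescue Aws::S3::Errors::NoSuchKey
      '' # first write: no log object yet
    end

  s3.put_object(bucket: bucket, key: key, body: existing + message + "\n")
end

s3 = Aws::S3::Client.new
write(s3, 'my-log-bucket', 'logs/app.log', 'hello')
```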
Thoughts on getting around this:
Locking
There's no intrinsic file locking for S3, so this would be implemented via a sentinel file. The read/append/write cycle doesn't change, but it gets bookended by checking for a lock file, creating a lock file, and removing a lock file (a rough sketch follows the cons below).
Pros:
- Relatively easy to implement
- A log file is still generally easy to read via the AWS console, s3fs, or what have you
Cons:
- Still not totally safe: checking for the lock file and creating it are two separate requests, so two writers can both see "no lock" and both proceed
- What is the behavior for .write() when there is a lock present?
- How do we handle stale locks?
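A rough sketch of the bookending, again assuming aws-sdk-s3 and a hypothetical `.lock` suffix convention; the wait loop, timeout behaviour, and stale-lock handling are exactly the open questions above:

```ruby
require 'aws-sdk-s3'

LOCK_SUFFIX = '.lock' # hypothetical sentinel-file naming convention

def object_exists?(s3, bucket, key)
  s3.head_object(bucket: bucket, key: key)
  true
rescue Aws::S3::Errors::NotFound
  false
end

# Bookend the existing read/append/write cycle with a sentinel object.
# Note the exists? check and the put_object that creates the lock are
# separate requests, so two writers can still both "acquire" the lock.
def write_with_lock(s3, bucket, key, message)
  lock_key = key + LOCK_SUFFIX

  # Naive wait-for-lock loop; real code would need a timeout and a
  # policy for stale locks.
  sleep(0.1) while object_exists?(s3, bucket, lock_key)

  s3.put_object(bucket: bucket, key: lock_key, body: Process.pid.to_s)
  begin
    existing =
      begin
        s3.get_object(bucket: bucket, key: key).body.read
      rescue Aws::S3::Errors::NoSuchKey
        ''
      end
    s3.put_object(bucket: bucket, key: key, body: existing + message + "\n")
  ensure
    s3.delete_object(bucket: bucket, key: lock_key)
  end
end
```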
Log == Collection of Events
Essentially, each time .write() is called, rather than reading and appending, it simply creates a new file ("#{Time.now.to_f}-#{machine id}-#{process id}" or what have you), treating the given path as a prefix (directory) rather than a single file. A sketch of both the write and a reading utility follows the cons below.
Pros:
- LOADS safer than the current model
- Relatively easy to implement
Cons:
- Logs are no longer easy to read via standard tools (console, s3fs, so on)
- Basically necessitates the creation of a utility for reading the logs
- Are there limits on the number of objects in S3? Even if there's no hard limit, does a large number of small objects increase cost?
- While much safer than current, still not totally safe: two writes from the same process at exactly the same timestamp (e.g. from separate threads) would still produce the same key and collide
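A sketch of what this could look like, plus the kind of reading utility the cons list mentions. Assumptions for illustration: aws-sdk-s3 as the client, `Socket.gethostname` standing in for "machine id", and the helper names are made up:

```ruby
require 'aws-sdk-s3'
require 'socket'

# One object per write: the configured path becomes a prefix, and each
# .write() creates a new object under it, keyed as
# "#{Time.now.to_f}-#{machine id}-#{process id}".
def write_event(s3, bucket, prefix, message)
  key = "#{prefix}/#{Time.now.to_f}-#{Socket.gethostname}-#{Process.pid}"
  s3.put_object(bucket: bucket, key: key, body: message)
end

# Reading utility: list every object under the prefix (keys sort
# roughly by timestamp) and concatenate them back into a single log.
def read_log(s3, bucket, prefix)
  chunks = []
  s3.list_objects_v2(bucket: bucket, prefix: "#{prefix}/").each do |page|
    page.contents.each do |obj|
      chunks << s3.get_object(bucket: bucket, key: obj.key).body.read
    end
  end
  chunks.join("\n")
end
```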