ess/s3_log

Make logging safe/fire-and-forget

Opened this issue · 0 comments

ess commented

At present, the following happens during write:

  • Read the file from S3
  • Append the new content to the end of the old content
  • Write the file to S3

This means the write function can currently lose data: if two writes happen close enough together that one is writing while the other is still reading/appending, one of the two writes will blow the other away.
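For reference, a rough sketch of that cycle using aws-sdk-s3 — the S3Log class, the bucket/key handling, and the method names here are made up for illustration, not the gem's actual internals:

```ruby
require 'aws-sdk-s3'

# Hypothetical sketch of the current read/append/write cycle.
class S3Log
  def initialize(bucket:, key:, client: Aws::S3::Client.new)
    @bucket = bucket
    @key    = key
    @s3     = client
  end

  def write(message)
    old = begin
      @s3.get_object(bucket: @bucket, key: @key).body.read
    rescue Aws::S3::Errors::NoSuchKey
      ''
    end

    # The race lives here: another writer can read the same "old" content
    # before this put_object lands, and one of the two appends is lost.
    @s3.put_object(bucket: @bucket, key: @key, body: old + message + "\n")
  end
end
```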

Thoughts on getting around this:

Locking

There's no intrinsic file locking for S3, so this would be implemented via a sentinel file. The read/append/write cycle doesn't change, but it does get bookended by checking for a lock file, creating a lock file, and removing a lock file (a rough sketch follows the cons below).

Pros:

  • Relatively easy to implement
  • A log file is still generally easy to read via the AWS console, s3fs, what have you

Cons:

  • Still not totally safe (checking for and creating the lock file isn't atomic on S3, so two writers can still both grab the lock)
  • What is the behavior for .write() when there is a lock present?
  • How do we handle stale locks?
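One possible shape for this, building on the hypothetical S3Log sketch above — the lock key name, the polling interval, and the "steal the lock after a timeout" answer to the stale-lock question are all assumptions, not anything the gem currently does:

```ruby
# Illustrative only: lock key, polling interval, and stale timeout are guesses.
def write_with_lock(message, lock_key: "#{@key}.lock", stale_after: 60)
  # Wait for any existing lock, stealing it if it looks stale.
  # NOTE: check-then-create is not atomic on S3, so two writers can still
  # both see "no lock" and both proceed -- hence "still not totally safe".
  loop do
    begin
      lock = @s3.head_object(bucket: @bucket, key: lock_key)
      break if Time.now - lock.last_modified > stale_after
      sleep 0.5
    rescue Aws::S3::Errors::NotFound
      break
    end
  end

  @s3.put_object(bucket: @bucket, key: lock_key, body: Process.pid.to_s)
  write(message) # same read/append/write cycle as before
ensure
  @s3.delete_object(bucket: @bucket, key: lock_key)
end
```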

Log == Collection of Events

Essentially, each time .write() is called, rather than reading and appending, it simply creates a new file ("#{Time.now.to_f}-#{machine id}-#{process id}" or what have you), treating the given path as a directory rather than a file.
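A write under that model could be as small as something like this (the hostname/pid naming and the prefix handling are just one possible choice, continuing the hypothetical sketch above):

```ruby
require 'socket'

# Sketch only: treats @key as a prefix and names each event object with the
# timestamp/host/pid scheme suggested above.
def write_event(message)
  key = "#{@key}/#{Time.now.to_f}-#{Socket.gethostname}-#{Process.pid}"
  @s3.put_object(bucket: @bucket, key: key, body: message + "\n")
end
```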

Pros:

  • LOADS safer than the current model
  • Relatively easy to implement

Cons:

  • Logs are no longer easy to read via standard tools (console, s3fs, so on)
  • Basically necessitates the creation of a utility for reading the logs (see the sketch after this list)
  • Are there limits on the number of objects in S3? Even if there's no hard limit, does storing lots of small objects increase cost?
  • While much safer than the current model, still not totally safe (two writes that end up with identical keys — same timestamp from the same machine and process — will collide)
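The read-side utility mentioned in the cons might look roughly like this — again a sketch against the hypothetical S3Log above, assuming the float-timestamp keys sort lexically into roughly chronological order:

```ruby
# Sketch of a log reader: list every event object under the prefix and
# stitch the contents back together in key order.
def read_log
  keys = []
  @s3.list_objects_v2(bucket: @bucket, prefix: "#{@key}/").each do |page|
    keys.concat(page.contents.map(&:key))
  end
  keys.sort.map { |k| @s3.get_object(bucket: @bucket, key: k).body.read }.join
end
```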