Make logging safe/fire-and-forget
At present, the following happens during write:
- Read the file from S3
- Append the new content to the end of the old content
- Write the file to S3
This means the write function can currently lose data: if two writes happen close enough together that one writes while the other is still reading/appending, one of the two writes will blow the other away.
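For concreteness, here's a minimal sketch of that cycle using the aws-sdk-s3 gem (the bucket/key names, the helper name, and the use of aws-sdk-s3 directly are assumptions for illustration, not necessarily what the code does today):

```ruby
require 'aws-sdk-s3'

# Sketch of the current behaviour: read the whole log, append the new
# message, write the whole log back. Nothing stops two callers from
# interleaving between get_object and put_object.
def write(s3, bucket, key, message)
  existing =
    begin
      s3.get_object(bucket: bucket, key: key).body.read
    rescue Aws::S3::Errors::NoSuchKey
      '' # first write: no log object yet
    end

  s3.put_object(bucket: bucket, key: key, body: existing + message + "\n")
end

s3 = Aws::S3::Client.new
write(s3, 'my-log-bucket', 'logs/app.log', 'hello')
```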
Thoughts on getting around this:
Locking
There's no intrinsic file locking for S3, so this would be implemented via a sentinel file. The read/append/write cycle doesn't change, but it gets bookended by checking for a lock file, creating a lock file, and removing a lock file (a rough sketch follows the cons below).
Pros:
- Relatively easy to implement
- A log file is still generally easy to read via the AWS console, s3fs, or what have you
Cons:
- Still not totally safe: checking for the lock file and creating it are two separate requests, so two writers can both see "no lock" and both proceed
- What is the behavior for .write() when there is a lock present?
- How do we handle stale locks?
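A rough sketch of the bookending, again assuming aws-sdk-s3 and a hypothetical `.lock` suffix convention; the wait loop, timeout behaviour, and stale-lock handling are exactly the open questions above:

```ruby
require 'aws-sdk-s3'

LOCK_SUFFIX = '.lock' # hypothetical sentinel-file naming convention

def object_exists?(s3, bucket, key)
  s3.head_object(bucket: bucket, key: key)
  true
rescue Aws::S3::Errors::NotFound
  false
end

# Bookend the existing read/append/write cycle with a sentinel object.
# Note the exists? check and the put_object that creates the lock are
# separate requests, so two writers can still both "acquire" the lock.
def write_with_lock(s3, bucket, key, message)
  lock_key = key + LOCK_SUFFIX

  # Naive wait-for-lock loop; real code would need a timeout and a
  # policy for stale locks.
  sleep(0.1) while object_exists?(s3, bucket, lock_key)

  s3.put_object(bucket: bucket, key: lock_key, body: Process.pid.to_s)
  begin
    existing =
      begin
        s3.get_object(bucket: bucket, key: key).body.read
      rescue Aws::S3::Errors::NoSuchKey
        ''
      end
    s3.put_object(bucket: bucket, key: key, body: existing + message + "\n")
  ensure
    s3.delete_object(bucket: bucket, key: lock_key)
  end
end
```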
Log == Collection of Events
Essentially, each time .write() is called, rather than reading and appending, it simply creates a new file ("#{Time.now.to_f}-#{machine id}-#{process id}" or what have you), treating the given path as a prefix (directory) rather than a single file. A sketch of both the write and a reading utility follows the cons below.
Pros:
- LOADS safer than the current model
- Relatively easy to implement
Cons:
- Logs are no longer easy to read via standard tools (console, s3fs, so on)
- Basically necessitates the creation of a utility for reading the logs
- Are there limits on the number of objects in S3? Even if there's no hard limit, does a large number of small objects increase cost?
- While much safer than current, still not totally safe: two writes from the same process at exactly the same timestamp (e.g. from separate threads) would still produce the same key and collide
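A sketch of what this could look like, plus the kind of reading utility the cons list mentions. Assumptions for illustration: aws-sdk-s3 as the client, `Socket.gethostname` standing in for "machine id", and the helper names are made up:

```ruby
require 'aws-sdk-s3'
require 'socket'

# One object per write: the configured path becomes a prefix, and each
# .write() creates a new object under it, keyed as
# "#{Time.now.to_f}-#{machine id}-#{process id}".
def write_event(s3, bucket, prefix, message)
  key = "#{prefix}/#{Time.now.to_f}-#{Socket.gethostname}-#{Process.pid}"
  s3.put_object(bucket: bucket, key: key, body: message)
end

# Reading utility: list every object under the prefix (keys sort
# roughly by timestamp) and concatenate them back into a single log.
def read_log(s3, bucket, prefix)
  chunks = []
  s3.list_objects_v2(bucket: bucket, prefix: "#{prefix}/").each do |page|
    page.contents.each do |obj|
      chunks << s3.get_object(bucket: bucket, key: obj.key).body.read
    end
  end
  chunks.join("\n")
end
```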