Inefficient Write+Copy+Delete pattern when writing to S3.
apghml opened this issue · 1 comment
apghml commented
When `tf.io` writes a file, it checks whether `HasAtomicMove` is true and, if so, simulates an atomic write by first writing to a temporary file and then renaming that file to its final name. This works well on a local filesystem, but on S3 it is undesirable for a few reasons:
- S3 writes are already atomic, so there is no need to simulate one.
- In the AWS S3 SDK, which `tf.io` uses, moving a file is implemented as a copy+delete, which increases the load on S3 compared to a direct write.
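The extra load can be illustrated with a toy model. This is a minimal sketch, not `tf.io`'s or the AWS SDK's code: `FakeObjectStore`, `write_via_temp_and_move`, `write_direct` and the `.tmp-` naming scheme are all hypothetical. The store only counts requests and bytes written, assuming (as described above) that a move on S3 is a copy followed by a delete.

```python
import uuid


class FakeObjectStore:
    """Hypothetical in-memory stand-in for an S3 bucket that counts requests.

    S3 has no native rename, so move() is modelled as copy + delete,
    mirroring the behaviour described in the issue.
    """

    def __init__(self):
        self.objects = {}
        self.requests = {"put": 0, "copy": 0, "delete": 0}
        self.bytes_written = 0  # bytes stored by PUT and COPY combined

    def put(self, key, data):
        # A single PUT stores the whole object in one step.
        self.requests["put"] += 1
        self.objects[key] = bytes(data)
        self.bytes_written += len(data)

    def copy(self, src, dst):
        # A copy stores a second full copy of the object under the new key.
        self.requests["copy"] += 1
        self.objects[dst] = self.objects[src]
        self.bytes_written += len(self.objects[src])

    def delete(self, key):
        self.requests["delete"] += 1
        del self.objects[key]

    def move(self, src, dst):
        # "Rename" on an object store: copy to the new key, then delete the old one.
        self.copy(src, dst)
        self.delete(src)


def write_via_temp_and_move(store, key, data):
    """Simulated atomic write: write a temporary key, then move it into place."""
    tmp_key = f"{key}.tmp-{uuid.uuid4().hex}"  # hypothetical temp-name scheme
    store.put(tmp_key, data)
    store.move(tmp_key, key)


def write_direct(store, key, data):
    """Direct write: a single PUT, relying on S3's own atomic object writes."""
    store.put(key, data)


def total_requests(store):
    return sum(store.requests.values())


if __name__ == "__main__":
    payload = b"x" * 1024
    via_move, direct = FakeObjectStore(), FakeObjectStore()
    write_via_temp_and_move(via_move, "ckpt/model.index", payload)
    write_direct(direct, "ckpt/model.index", payload)
    # temp + move: 3 requests (put, copy, delete), 2048 bytes written
    print("temp + move:", via_move.requests, total_requests(via_move), via_move.bytes_written)
    # direct: 1 request (put), 1024 bytes written
    print("direct:     ", direct.requests, total_requests(direct), direct.bytes_written)
```

In this model the temp-file-plus-move path issues three requests and stores the payload twice for every file, while the direct write issues one request; both end with the same single object under the final key.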
learning-to-play commented
Hi @yongtang, this issue is preventing some users from running their jobs on GPU because of the additional load it causes. Could you help?