S3 read throughput slows down after hitting prefix rate limit
shaowei-su opened this issue · 1 comment
shaowei-su commented
Environment:
tensorflow==2.8.0
tensorflow-io==0.25.0
S3 loading client: tf.data.TFRecordDataset.
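For reference, a minimal sketch of our loading setup (bucket name, prefix, and the exact num_parallel_reads value below are placeholders, not our real configuration):

```python
import tensorflow as tf
import tensorflow_io  # noqa: F401 -- registers the s3:// filesystem plugin

# Hypothetical bucket/prefix layout; every file listed here lives under
# the same partitioned prefix, so all readers share one request budget.
filenames = tf.io.gfile.glob("s3://my-bucket/tfrecords/*.tfrecord")

dataset = tf.data.TFRecordDataset(
    filenames,
    num_parallel_reads=64,  # many concurrent readers -> many GET/HEAD requests per second
)

for batch in dataset.batch(256):
    ...  # training step
```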
Issue
By default, S3 limits GET/HEAD operations to 5,500 requests per second per partitioned prefix; once this limit is exceeded, read operations start returning 503 (Slow Down) errors. What we noticed is that once the client starts seeing 503 errors, the overall data loading speed drops and stays degraded for the rest of the data loading process, even after the 503 errors have stopped.
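For context on why we hit the limit: the 5,500 req/s budget applies per partitioned prefix, so a flat layout where all TFRecord files share one prefix shares a single budget across all parallel readers. A hedged sketch of the hash-prefix layout we are considering as a mitigation (names are hypothetical, and this assumes S3 eventually partitions along these prefixes):

```python
import hashlib

def sharded_key(filename: str, num_shards: int = 16) -> str:
    """Spread objects across hashed prefixes so that, once S3 partitions
    along them, each prefix gets its own 5,500 req/s request budget."""
    shard = int(hashlib.md5(filename.encode()).hexdigest(), 16) % num_shards
    return f"shard={shard:02d}/{filename}"

# e.g. "part-00042.tfrecord" -> "shard=07/part-00042.tfrecord"
```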
Question
Does the S3 client have retry logic for 503 errors? If not, would a failed S3 GET/HEAD request block the entire loading thread defined by the num_parallel_reads field? Thanks
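To illustrate the kind of behavior we are asking about: a plain boto3 client can be configured with explicit retry and client-side rate-limiting behavior. This is boto3, not the tensorflow-io S3 client, and is shown only as a sketch of what "retry logic for 503s" would look like:

```python
import boto3
from botocore.config import Config

# 'adaptive' retry mode retries throttling errors (including 503 Slow Down)
# with backoff and adds client-side rate limiting to avoid re-triggering them.
s3 = boto3.client(
    "s3",
    config=Config(retries={"max_attempts": 10, "mode": "adaptive"}),
)

# Hypothetical object key, matching the placeholder layout above.
body = s3.get_object(
    Bucket="my-bucket",
    Key="tfrecords/part-00000.tfrecord",
)["Body"].read()
```

If the tensorflow-io client has no equivalent, that would explain the sustained slowdown we observe after the first burst of 503s.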