GoogleCloudStorageFileSystem#delete recursive does not page
mswintermeyer opened this issue · 0 comments
GoogleCloudStorageFileSystem#delete assumes that the full list of files it is deleting fits in memory. When deleting a very large directory recursively, it loads every entry into a single List rather than deleting one page at a time.
There is a listFileInfoForPrefixPage method that it could use instead, calling it iteratively until all files are deleted.
By contrast, S3's deletion code uses an iterator to delete directories recursively: https://github.com/apache/hadoop/blob/4bd873b816dbd889f410428d6e618586d4ff1780/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/impl/DeleteOperation.java#L244-L246.
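
To make the suggestion concrete, here is a rough sketch of what a page-at-a-time delete loop could look like. It is not the connector's code: `PageLister`, `BatchDeleter`, `Page`, and `InMemoryStore` are hypothetical stand-ins I made up for illustration, and I am assuming the real paged listing call returns one page of results plus a token for the next page. The sketch also assumes the page token stays valid after the listed objects are deleted (here the token is simply the last name returned), and it ignores any ordering requirements between directories and their children.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

public class PagedDeleteSketch {

  /** One page of listing results; nextPageToken is null on the last page. */
  public static final class Page {
    final List<String> names;
    final String nextPageToken;

    Page(List<String> names, String nextPageToken) {
      this.names = names;
      this.nextPageToken = nextPageToken;
    }
  }

  /** Hypothetical stand-in for a paged listing call. */
  public interface PageLister {
    Page listPage(String prefix, String pageToken);
  }

  /** Hypothetical stand-in for a batch delete call. */
  public interface BatchDeleter {
    void delete(List<String> names);
  }

  /** Lists and deletes one page at a time, so memory use is bounded by the page size. */
  public static long deleteRecursively(PageLister lister, BatchDeleter deleter, String prefix) {
    long deleted = 0;
    String pageToken = null;
    do {
      Page page = lister.listPage(prefix, pageToken);
      if (!page.names.isEmpty()) {
        deleter.delete(page.names);
        deleted += page.names.size();
      }
      pageToken = page.nextPageToken;
    } while (pageToken != null);
    return deleted;
  }

  /** Tiny in-memory fake used only to exercise the loop above. */
  public static final class InMemoryStore implements PageLister, BatchDeleter {
    private final TreeSet<String> objects = new TreeSet<>();
    private final int pageSize;
    int maxBatchSeen = 0;

    public InMemoryStore(int pageSize) {
      if (pageSize < 1) {
        throw new IllegalArgumentException("pageSize must be at least 1");
      }
      this.pageSize = pageSize;
    }

    public void put(String name) {
      objects.add(name);
    }

    public int size() {
      return objects.size();
    }

    @Override
    public Page listPage(String prefix, String pageToken) {
      // The token is the last name of the previous page; resume strictly after it.
      Iterable<String> candidates =
          pageToken == null ? objects.tailSet(prefix, true) : objects.tailSet(pageToken, false);
      List<String> names = new ArrayList<>();
      boolean more = false;
      for (String name : candidates) {
        if (!name.startsWith(prefix)) {
          break; // sorted order: nothing after this can match the prefix
        }
        if (names.size() == pageSize) {
          more = true; // at least one more matching object exists
          break;
        }
        names.add(name);
      }
      return new Page(names, more ? names.get(names.size() - 1) : null);
    }

    @Override
    public void delete(List<String> names) {
      maxBatchSeen = Math.max(maxBatchSeen, names.size());
      objects.removeAll(names);
    }
  }

  public static void main(String[] args) {
    InMemoryStore store = new InMemoryStore(3);
    for (int i = 0; i < 10; i++) {
      store.put("dir/file" + i);
    }
    store.put("other/x");
    long deleted = deleteRecursively(store, store, "dir/");
    System.out.println("deleted=" + deleted + " remaining=" + store.size()
        + " maxBatch=" + store.maxBatchSeen);
  }
}
```

The point of the sketch is only that the caller never holds more than one page of names at a time; how pages map onto the connector's actual batch delete path is something I have not checked.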