arkhi-digital/silverstripe-cloudflare

Recursive glob is slow and can be improved greatly

Closed this issue · 1 comments

If we implode the extensions array into a single brace pattern, we should be able to search for all extensions at once, like below.

Instead of this:

foreach ($extensions as $ext) {
    $files = array_merge($this->rglob(rtrim($rootDir, "/") . "/*.{$ext}"), $files);
}

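It should be:

// Strip any leading extension separator, as it could be added by an extension method by accident.
// array_map avoids the dangling &$ext reference that a by-reference foreach leaves behind.
$extensions = array_map(function ($ext) {
    return ltrim($ext, '.');
}, $extensions);
// Build one brace pattern, e.g. "/*.{css,js,png}", so a single pass matches every extension
$pattern = rtrim($rootDir, "/") . '/*.{' . implode(',', $extensions) . '}';
$files = array_merge($this->rglob($pattern), $files);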

And update rglob() to use GLOB_BRACE, as in this answer:

http://stackoverflow.com/a/23969253/2266583
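
As a rough sketch of that change (not the module's actual rglob(), whose body isn't shown in this issue), a brace-aware recursive glob could look something like the following. The function name `rglob_brace` is hypothetical and it is written as a standalone function for illustration. One caveat worth keeping in mind: the PHP manual notes that GLOB_BRACE is not available on some non-GNU systems, such as Solaris or Alpine Linux, so a fallback may be needed there.

```php
<?php
/**
 * Hypothetical standalone recursive glob that expands brace patterns.
 * Example pattern: "/var/www/*.{css,js}"
 */
function rglob_brace($pattern, $flags = 0)
{
    // GLOB_BRACE expands "{a,b,c}" so all extensions match in one call
    $files = glob($pattern, $flags | GLOB_BRACE);
    if ($files === false) {
        $files = array();
    }

    // Find the subdirectories of the pattern's directory
    $dirs = glob(dirname($pattern) . '/*', GLOB_ONLYDIR | GLOB_NOSORT);
    if ($dirs === false) {
        $dirs = array();
    }

    // Recurse into each subdirectory, reusing the same file pattern
    foreach ($dirs as $dir) {
        $files = array_merge(
            $files,
            rglob_brace($dir . '/' . basename($pattern), $flags)
        );
    }

    return $files;
}
```

With this shape, each directory is visited once no matter how many extensions are in the pattern.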

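For large file systems this would speed up the process considerably: currently it iterates the entire file system once for each extension, so the work grows with the number of extensions, whereas a single brace pattern needs only one pass. I'm going to put this down as a bug.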

This needs to be resolved for both 3.x compatibility and 4.x compatibility.

An upvoted comment from php.net/glob:

Don't use glob() if you try to list files in a directory where very much files are stored (>100.000). You get an "Allowed memory size of XYZ bytes exhausted ..." error.
You may try to increase the memory_limit variable in php.ini. Mine has 128MB set and the script will still reach this limit while glob()ing over 500.000 files.

The more stable way is to use readdir() on very large numbers of files:

if ($handle = opendir($path)) {
    while (false !== ($file = readdir($handle))) {
        // do something with the file
        // note that '.' and '..' is returned even
    }
    closedir($handle);
}

Maybe use a recursive glob with GLOB_ONLYDIR to collect the directories, then iterate each $path with the readdir() loop above.
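
A minimal sketch of that idea, under a few assumptions: the function name `find_files_by_extension` is made up for illustration, and instead of collecting directories with GLOB_ONLYDIR it finds them with is_dir() during the same readdir() pass, which is a slightly different technique from the one suggested above. Matching is case-sensitive, as it would be with glob(). Note that the returned list itself still grows with the number of matches.

```php
<?php
/**
 * Hypothetical helper: walk $rootDir with opendir()/readdir() and
 * return every file whose extension is in $extensions.
 */
function find_files_by_extension($rootDir, array $extensions)
{
    // Build a lookup table such as array('css' => true, 'js' => true)
    $wanted = array();
    foreach ($extensions as $ext) {
        $wanted[ltrim($ext, '.')] = true;
    }

    $files = array();
    $pending = array(rtrim($rootDir, '/')); // directories still to visit

    while ($pending) {
        $dir = array_pop($pending);
        $handle = @opendir($dir);
        if ($handle === false) {
            continue; // unreadable directory, skip it
        }
        while (false !== ($entry = readdir($handle))) {
            if ($entry === '.' || $entry === '..') {
                continue;
            }
            $path = $dir . '/' . $entry;
            if (is_dir($path)) {
                // Skip symlinked directories to avoid cycles
                if (!is_link($path)) {
                    $pending[] = $path;
                }
            } elseif (isset($wanted[pathinfo($entry, PATHINFO_EXTENSION)])) {
                $files[] = $path;
            }
        }
        closedir($handle);
    }

    return $files;
}
```

This visits each directory once and checks every extension in the same pass, so it does not depend on GLOB_BRACE being available.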