Iterate large directories efficiently with Python.
`python-getdents` is a simple wrapper around the Linux system call `getdents64` (see `man getdents` for details). More details on approach.
- TODO: verify that the implementation works on platforms other than `x86_64`.
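Install from PyPI:

```sh
pip install getdents
```

For development, install into a virtual environment:

```sh
python3 -m venv env
. env/bin/activate
pip install -e .[test]
```

Build wheels:

```sh
pip install cibuildwheel
cibuildwheel --platform linux --output-dir wheelhouse
```

Run the tests:

```sh
ulimit -v 33554432 && py.test tests/
```

Or

```sh
ulimit -v 33554432 && ./setup.py test
```

Basic usage:

```python
from getdents import getdents

# The second argument is the buffer size in bytes.
for inode, type, name in getdents('/tmp', 32768):
    print(name)
```

Advanced usage with a raw file descriptor:

```python
import os
from getdents import *

fd = os.open('/tmp', O_GETDENTS)

for inode, type, name in getdents_raw(fd, 2**20):
    print({
            DT_BLK:     'blockdev',
            DT_CHR:     'chardev ',
            DT_DIR:     'dir     ',
            DT_FIFO:    'pipe    ',
            DT_LNK:     'symlink ',
            DT_REG:     'file    ',
            DT_SOCK:    'socket  ',
            DT_UNKNOWN: 'unknown ',
        }[type], {
            True: 'd',   # flag entries whose inode number is 0
            False: ' ',
        }[inode == 0],
        name,
    )

os.close(fd)
```

Command-line usage:

```
python-getdents [-h] [-b N] [-o NAME] PATH
```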
| Option | Description |
|---|---|
| `-b N`, `--buffer-size N` | Buffer size (in bytes) to allocate when iterating over a directory. The default is 32768, the same value glibc uses; you probably want to increase it. Try starting with 16777216 (16 MiB). Best performance is achieved when the buffer size is a multiple of the file system block size. |
| `-o NAME`, `--output-format NAME` | Output format (for example, `csv`). |
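The advice on block-aligned buffer sizes can be sketched as follows. This is a minimal illustration, not part of the package: `round_up` and `suggest_buffer_size` are hypothetical helpers, and using `os.statvfs(path).f_bsize` as "the file system block size" is an assumption about what the option description means.

```python
import os


def round_up(size, block):
    """Round size up to the nearest multiple of block."""
    return -(-size // block) * block


def suggest_buffer_size(path, target=16 * 2**20):
    """Return target rounded up to a multiple of the block size reported for path.

    Assumption: f_bsize is the relevant block size for the file system.
    """
    block = os.statvfs(path).f_bsize
    return round_up(target, block)


# With a 4096-byte block, the suggested 16 MiB is already aligned.
print(round_up(16 * 2**20, 4096))
```

The result of `suggest_buffer_size('/path/to/large/dir')` could then be passed as the value of `-b`.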
Exit codes:

- 3 - Requested buffer is too large.
- 4 - `PATH` not found.
- 5 - `PATH` is not a directory.
- 6 - Not enough permissions to read the contents of `PATH`.
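A script that calls the CLI might translate these exit codes into errors roughly as shown below. This is a sketch under assumptions: `describe_exit` and `list_directory` are hypothetical helpers, exit code 0 is taken to mean success by convention, and the default output is assumed to contain one entry per line.

```python
import subprocess

# Exit codes listed above; any other non-zero code is reported generically.
EXIT_MESSAGES = {
    3: "requested buffer is too large",
    4: "PATH not found",
    5: "PATH is not a directory",
    6: "not enough permissions to read PATH",
}


def describe_exit(code):
    """Map an exit code of python-getdents to a short message."""
    if code == 0:
        return "ok"
    return EXIT_MESSAGES.get(code, f"unexpected exit code {code}")


def list_directory(path, buffer_size=16 * 2**20):
    """Run python-getdents on path and return its output lines.

    Assumption: the default output format prints one entry per line.
    """
    proc = subprocess.run(
        ["python-getdents", "-b", str(buffer_size), path],
        capture_output=True,
        text=True,
    )
    if proc.returncode != 0:
        raise RuntimeError(describe_exit(proc.returncode))
    return proc.stdout.splitlines()
```

Splitting on newlines will misread file names that themselves contain a newline, so a structured output format is likely a safer choice for untrusted directories.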
Examples:

```sh
python-getdents /path/to/large/dir
python -m getdents /path/to/large/dir
python-getdents /path/to/large/dir -o csv -b 16777216 > dir.csv
```
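Once a listing has been written to `dir.csv`, it can be post-processed with the standard `csv` module. The sketch below assumes, without confirmation from the text above, that each row has three columns in the same order as the tuples yielded by `getdents` (inode, type, name) and that there is no header row. `tally_types` is a hypothetical helper, and the sample data is synthetic.

```python
import csv
import io
from collections import Counter


def tally_types(lines):
    """Count rows per value of the second column (assumed to be the entry type)."""
    counts = Counter()
    for row in csv.reader(lines):
        if len(row) < 3:
            continue  # skip blank or malformed rows
        counts[row[1]] += 1
    return counts


# Synthetic sample standing in for dir.csv; the values are made up.
sample = io.StringIO("11,8,a.txt\n12,8,b.txt\n13,4,sub\n")
print(tally_types(sample))
```

For a real file, something like `with open('dir.csv', newline='') as f: tally_types(f)` would stream the rows without loading the whole listing into memory.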