isbadawi/badavi

Large file support


i.e. don't read the whole file into memory.

Should do some experiments to figure out the best way to do a regex search on a large file without reading it all into memory.

One idea would be mmap + madvise(MADV_SEQUENTIAL)

With mmap, pages are not resident until they are accessed, so naively calling regexec as we do now would use less memory when the match occurs early in the file. But if you're at the beginning of the file and the match is near the end, the search could still fault the whole file into memory. madvise(MADV_SEQUENTIAL) doesn't seem to have an effect on this (tested on macOS; the resident set size was the same).
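For reference, a minimal sketch of the naive mmap approach described above. The helper name `mmap_regex_search` and its return convention are made up for illustration, and it assumes the platform's regexec supports the non-POSIX `REG_STARTEND` flag (available on macOS/BSD and glibc), since a mapped file is not NUL-terminated. This is not what badavi currently does.

```c
#define _DEFAULT_SOURCE /* for madvise() on glibc under strict -std modes */
#include <fcntl.h>
#include <regex.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: returns the byte offset of the first match of
 * `pattern` in the file at `path`, -1 if there is no match, -2 on error. */
long mmap_regex_search(const char *path, const char *pattern) {
  int fd = open(path, O_RDONLY);
  if (fd < 0) return -2;

  struct stat st;
  if (fstat(fd, &st) < 0) { close(fd); return -2; }
  if (st.st_size == 0) { close(fd); return -1; } /* mmap rejects length 0 */

  size_t len = (size_t) st.st_size;
  /* regoff_t can be a 32-bit int (it is in glibc), so check that it fits. */
  regoff_t end = (regoff_t) len;
  if (end < 0 || (size_t) end != len) { close(fd); return -2; }

  char *data = mmap(NULL, len, PROT_READ, MAP_PRIVATE, fd, 0);
  close(fd); /* the mapping stays valid after the descriptor is closed */
  if (data == MAP_FAILED) return -2;

  /* Advisory only: the kernel is free to ignore the hint. */
  madvise(data, len, MADV_SEQUENTIAL);

  long result = -2;
  regex_t re;
  if (regcomp(&re, pattern, REG_EXTENDED) == 0) {
    /* With REG_STARTEND, pmatch[0] delimits the range to search. */
    regmatch_t m;
    m.rm_so = 0;
    m.rm_eo = end;
    int rc = regexec(&re, data, 1, &m, REG_STARTEND);
    if (rc == 0) result = (long) m.rm_so;
    else if (rc == REG_NOMATCH) result = -1;
    regfree(&re);
  }

  munmap(data, len);
  return result;
}
```

Pages are only faulted in as regexec walks the mapping, which is why an early match stays cheap and a late match can still end up touching the whole file.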

libpcre (PCRE2) supports partial matching and multi-segment matching: https://www.pcre.org/current/doc/html/pcre2partial.html#SEC4
This would allow us to search in chunks, and explicitly unmap chunks once we're done with them.

Side note: while experimenting with this, it looks like vim does read the entire file into memory (tested with a 1 GB file).