Proposal: Stream compatible RegExp implementation
jamestalmage opened this issue · 1 comments
I think it would be cool to take the parser and AST you have created, and generate a Node.js Stream compatible version.
API would go something like this:
var input = createInputStream('hello how are you');
var streamRegex = new StreamRegex('\\w+');
var arr = [];
input.pipe(streamRegex.match())
  .on('data', function(chunk) {
    arr.push(chunk.toString('utf8'));
  })
  .on('end', function() {
    // log only once the stream has emitted every match
    console.log(arr);
    // ['hello', 'how', 'are', 'you']
  });
Goals / Ideas:
- Equivalents for match, test, split, and replace
- Work with very large inputs (i.e. larger than available memory). This would be the key advantage of using a Stream based version over the default.
- no copying of buffer data, use Buffer.concat() and buf.slice()
- work with multiple encodings
- be fast
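One detail that bears on the "no copying" goal: in current Node.js, buf.slice() and buf.subarray() return views onto the same memory, while Buffer.concat() allocates a new Buffer and copies its inputs. The snippet below is only an illustration of that difference (the variable names are made up for the example); a design along these lines would probably want to reserve Buffer.concat() for matches that span a chunk boundary.

```javascript
// Views versus copies with Node.js Buffers.
const chunk = Buffer.from('hello how are you', 'utf8');

// subarray() (like the older slice()) shares memory with `chunk`.
const view = chunk.subarray(0, 5);

// Buffer.concat() copies its inputs into a newly allocated Buffer.
const joined = Buffer.concat([chunk.subarray(0, 3), chunk.subarray(3, 5)]);

// Overwrite the first byte of the original chunk with 'H'.
chunk[0] = 0x48;

console.log(view.toString('utf8'));   // Hello  (the view sees the write)
console.log(joined.toString('utf8')); // hello  (the copy does not)
```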
I've searched, but have not found anything that operates this way. I did find this, but it converts the buffers to strings and concatenates them (violating goals 2 and 3 above).
Obviously this would be a separate project from this one, but it could certainly share the parser and AST at a minimum (and likely more). I may try implementing it myself, but it would be nice to have buy-in and input from the contributors here, especially if I end up wanting to refactor some of the code here to facilitate reuse in my project. Help from experts on the problem domain would certainly be welcome too.
I think it could be pretty powerful. Thoughts?
> Obviously this would be a separate project from this one,
I agree - creating a stream based RegExp engine is out of the scope of this project. Therefore, I am going to close this issue.
> Work with very large inputs (i.e. larger than available memory)
This sounds like a good idea at first, but note that it is trivial to write a RegExp that might match the entire stream input, like /.+TheEnd$/. Therefore, designing a streaming RegExp might require restricting the expressiveness of the RegExp language.
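To make the buffering concern concrete, here is a rough sketch of a chunk-at-a-time matcher. ChunkMatcher and its push/flush methods are hypothetical names invented for this example, not part of any existing library. For brevity it works on strings with the built-in RegExp, so it does not meet the zero-copy goal from the proposal. It also assumes that a match which stops before the end of the data seen so far is final, which does not hold for patterns using ^, lookbehind, or similar context-sensitive constructs.

```javascript
// Hypothetical helper: matches a pattern across chunk boundaries by holding
// back any text that could still be part of an unfinished match.
class ChunkMatcher {
  constructor(source) {
    this.re = new RegExp(source, 'g');
    this.pending = ''; // text after the last match known to be complete
  }

  // Feed one chunk; returns the matches assumed to be complete.
  push(chunk) {
    const text = this.pending + chunk;
    const complete = [];
    let consumed = 0;
    let m;
    this.re.lastIndex = 0;
    while ((m = this.re.exec(text)) !== null) {
      const end = m.index + m[0].length;
      if (m[0].length === 0) { this.re.lastIndex += 1; continue; } // skip empty matches
      if (end === text.length) break; // touches the end: it might still grow
      complete.push(m[0]);
      consumed = end;
    }
    this.pending = text.slice(consumed);
    return complete;
  }

  // Call once the input has ended; whatever is pending is matched as-is.
  flush() {
    const rest = this.pending.match(this.re) || [];
    this.pending = '';
    return rest.filter((s) => s.length > 0);
  }
}

// With \w+ the held-back text never exceeds one word plus its separator.
const words = new ChunkMatcher('\\w+');
const found = [
  ...words.push('hello ho'),
  ...words.push('w are you'),
  ...words.flush(),
];
console.log(found); // [ 'hello', 'how', 'are', 'you' ]

// With .+TheEnd$ nothing can be released until the input ends,
// so the held-back text grows with every chunk.
const greedy = new ChunkMatcher('.+TheEnd$');
greedy.push('aaaa');
greedy.push('bbbb');
console.log(greedy.pending.length); // 8
```

In this sketch the amount of pending text is what decides whether streaming pays off: a pattern such as \w+ keeps it small, while /.+TheEnd$/ forces the whole input into memory, which is the restriction on expressiveness mentioned above.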