bitfield/script

Lines longer than 64k cause file truncation when Filtering

hobti01 opened this issue · 6 comments

Hi - first of all, thanks so much for this library! The API is really great for people coming from the command line.

We've observed that long lines (>64k) stop processing in FilterLine (and other functions using bufio.Scanner), and files get truncated. It seems to me that this is a result of the default maximum token size (64k) of bufio.Scanner, e.g. https://github.com/bitfield/script/blob/v0.21.4/script.go#L505

While bufio.Scanner allows increasing the buffer size (https://pkg.go.dev/bufio#Scanner.Buffer), unfortunately this is not possible with Pipe.

What do you think of being able to specify a larger buffer size when creating the Pipe?

Thanks, @hobti01! Great bug report.

I wonder if there's any good reason not to simply always set the max buffer size to math.MaxInt. That doesn't actually allocate any memory; it's merely an upper bound on how big Scan can automatically grow the buffer on demand. And you wouldn't be able to manually set it to a higher value anyway, since maxTokenSize is an int. What do you think?

I like your pragmatic approach to keeping the interface clean. Seeing as there is no negative side effect to increasing maxTokenSize, this sounds like the best approach.

Do you have the time to make this change? If not, I can get things set up on my side to submit a PR.

I'd be interested to see if the suggested solution fixes your use case—if so, I'll happily implement it. Could you try it out and let me know?

Sure, let me see what I can do!

How about #169?

Great, thank you! I'll get to work on this.