Same functionality as accum?
dmitrig01 opened this issue · 10 comments
What's the difference between this project and https://github.com/jeffbski/accum? Similar issue filed there at jeffbski/accum#1
looks like the same thing, probably just a case of calling the same thing two different names
If it's the same thing, would it make sense to try and merge the projects?
They are very similar. However one of accum's original use cases was to make sure that if multibyte chars were split across packets in a stream, that they were properly concatenated back together before converting to string otherwise they can be corrupted.
Since I didn't have that use case in a test, the code regressed over time, but after realizing that today I added a test and fixed the code back so it handles this situation properly.
See multibyte char split test -jeffbski/accum@858c22c
and the fix - jeffbski/accum@33d2961
To fix concat-stream similarly it involves doing Buffer.concat before the toString('utf8') or similar rather than just converting the individual packets.
@dmitrig01 there are plenty of modules with similar functionality on NPM, IMO it isnt worth the effort to try and merge projects in this scenario. if you want to send me a PR to the concat-stream readme that mentions similar modules in an effort to avoid confusion then I would merge it.
@jeffbski at the moment concat-stream doesn't call .toString() for streams of buffers, it just does Buffer.concat so multibyte chars shouldnt get messed up
@maxogden I may have misread the concat-stream code but I thought that if one specifies
concat({encoding: 'string'}, cb)
That this would force a conversion to string from the buffer packets by iterating over the packets and calling toString on each one individually which causes the problem if a multi byte char happen to get split over two packets in the stream.
Please ignore me if I am crazy, that's just what it looked like from my quick look at the code.
Ps. I plan to add concat-stream to my readme under related projects so others can find it. It is nice that you have made it browser compatible.
All the best,
Jeff
@jeffbski oh yea good point, so:
- if encoding is buffer, and data are buffers, it uses Buffer.concat() and returns a buffer
- if encoding is string, and data are buffers, it will convert buffers to strings before concat, returns a string
however, the latter case, while possible, isn't recommended since you can just do the first case instead and then to buffer.toString(). string encoding is really intended for when your incoming chunks are strings. but I think i can probably fix multibyte char preservation for this edge case... cheers!
Yes, it is one of those edge cases that would be tricky to understand after it happened, so if it can be made foolproof, all the better. I had hit this on a project and it had me confused for a bit.
Excellent!
Hahahahah now the two are even more similar! I'll submit a PR :-)