How to write to a stream without copying unnecessarily?
MendelMonteiro opened this issue · 7 comments
A common use case is writing into a stream (file/network/etc.), but with the current API the data is written once into a `char[]`, then again into a `byte*`, and one last time into the stream. Could this be improved by writing directly into a `byte[]` and then exposing a method that writes the byte buffer directly to a given stream? This would mean adding the encoding as a constructor parameter (though `Encoding.Default` could be used when none is specified).
You can already allocate a byte[] and then reinterpret the pointer to pass to CopyTo(). Once you have the byte[] you can write it to anything, including any Stream. It's probably worth having another CopyTo() overload that just does this behind the scenes. Assuming the buffer is small enough you can stackalloc the byte array and not have to pay any allocations at all (not counting whatever the underlying Stream will do of course).
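For reference, a rough sketch of the pattern described above. The `CopyTo` signature, the `Count` property, and the helper name are assumptions for illustration, not the library's confirmed API:

```csharp
using System;
using System.IO;

static class StreamWriteSketch
{
    // Copies a StringBuffer's UTF-16 contents into a byte[] by reinterpreting
    // the pinned byte* as a char*, then writes that buffer to the stream.
    static unsafe void WriteToStream(StringBuffer buffer, Stream stream)
    {
        int charCount = buffer.Count;             // hypothetical length property
        int byteCount = charCount * sizeof(char); // UTF-16: 2 bytes per char
        var bytes = new byte[byteCount];

        fixed (byte* b = bytes)
        {
            // Reinterpret the byte* as a char* so CopyTo can fill it directly.
            buffer.CopyTo((char*)b, 0, charCount);
        }

        // The byte[] now holds the raw UTF-16 payload and can go to any Stream.
        stream.Write(bytes, 0, byteCount);
    }
}
```

For small buffers the `new byte[]` could be replaced with `stackalloc`, avoiding the heap allocation entirely as described above.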
I was trying to think of ways to avoid at least one of these copies, so I thought that `StringBuffer` could use a `byte[]` instead of a `char[]` for its storage and encode each char directly into the `byte[]` as it formats it.
Ah, I see. That's not too hard. I've already thought about that just to make it easier to have utf8 support.
I'm going to have a bash at doing it and will submit a PR. I'll create a parallel implementation of StringBuffer because otherwise your performance benchmarks are sure to suffer as the encoding will be included.
So I think I still want to do this, but I don't want to use the Encoding class. The problem is that it's extremely heavyweight and designed for encoding a ton of text all at once, not piecemeal during a formatting operation.
Probably what I'll do is just have my own specialized encoding methods for UTF8 and UTF16, since those are the ones people are most likely to use.
Doesn't that go against the philosophy of supporting the BCL features either completely or not at all?
I agree with the approach, and I would probably start with ASCII: people using the library are performance-driven, and it will probably produce the fastest encoding.
I think the ASCII encoding and UTF8 encoding will essentially be the same codepath; you don't pay the extra cost for UTF8 unless you have actual characters that need it.
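For illustration, a minimal per-char UTF-8 encoder (a sketch, not the project's actual code) shows why the two share a codepath: ASCII characters hit the single-byte branch first and never touch the multi-byte logic.

```csharp
static class Utf8Sketch
{
    // Encodes one UTF-16 code unit into UTF-8, returning the byte count.
    // Characters below 0x80 take the one-byte ASCII fast path immediately,
    // so pure-ASCII input pays nothing for UTF-8 support.
    static int EncodeChar(char c, byte[] dest, int offset)
    {
        if (c < 0x80)
        {
            dest[offset] = (byte)c;                        // ASCII: one byte
            return 1;
        }
        if (c < 0x800)
        {
            dest[offset] = (byte)(0xC0 | (c >> 6));        // two-byte sequence
            dest[offset + 1] = (byte)(0x80 | (c & 0x3F));
            return 2;
        }
        // Surrogate-pair handling omitted for brevity; other BMP
        // characters encode as three bytes.
        dest[offset] = (byte)(0xE0 | (c >> 12));
        dest[offset + 1] = (byte)(0x80 | ((c >> 6) & 0x3F));
        dest[offset + 2] = (byte)(0x80 | (c & 0x3F));
        return 3;
    }
}
```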
It does go against that philosophy, which is unfortunate, but the #1 driving goal of the project is performance first; if you want 100% BCL compatibility, people can use the BCL. The Encoding classes are simply too heavyweight to be used here.