boostorg/iostreams

cannot create an unsigned char bzip2 compressor

Spongman opened this issue · 5 comments

https://godbolt.org/z/6Gq67Yd8x

	filtering_stream<output, uint8_t> out;
	out.push(basic_bzip2_compressor<std::allocator<uint8_t>>{});

This is due to bzip2_base::char_type being hard-coded:

class BOOST_IOSTREAMS_DECL bzip2_base {
public:
typedef char char_type;

I think it's hard-coded because much of the C compression API uses "char" so making it generic would cause lots of other issues.
Also this isn't just bzip2, isn't it? zlib is the same as far as I can tell.
This might be a kind of intentional design choice...

Sure, in the same way that any wrapper around a C API has hard-coded types. But surely this can be done in such a way that basic_bzip2_compressor<std::allocator<uint8_t>>::char_type returns the correct thing?

Note: I am not speaking with any particular authority and certainly not for the boost project.
Maybe I misunderstand, but I think it returns exactly the right thing: The bzip2 compressor always operates on "char" streams. Changing the allocator does not change the stream type.
Reading the code I suspect the easier way to make your example work would be to have some wrapper that just adds a couple of casts around the function calls. But that is rather ugly, and possibly not strictly safe to actually do in all cases/environments?
Certainly not for any type other than uint8_t, so not sure the result will be any less of a mess than just using a "char" filtering stream and do any casts necessary on the input and output side.

If that's the case what is the point of having a polymorphic basic_bzip2_compressor in the first place?

I mean, we could have all C++ API just take void * everywhere and expect the user to cast everywhere, but that makes for a pretty poor API. This holds in this case, too.

The polymorphism is to allow replacing the allocator, e.g. to allocate from a fixed memory pool on some embedded device. I.e. HOW the memory is allocated, not which TYPE.
And I don't disagree in principle that it would be nice if it could just accept any kind of type, but I think that's not actually easy to do, nor end up very useful in the end. It looks to me in general the only thing it is meant for when implemented is to support both char and wchar_t