PacketReader and PacketWriter use arrays instead of objects
Closed this issue · 12 comments
PacketWriter use byte[] to return result to its delegate, but sometimes u need
to parse incoming xml/json data.
I propose to replace the bytearrays to objects, in this case the result of the
PacketReader is XML object.
for example
---------------------------------------------------------------------------
public class XMLPacketReader implements PacketReader
{
XMLObject xml = null;
private XMLPacketReader()
{
}
public Object nextPacket(ByteBuffer byteBuffer) throws ProtocolViolationException
{
// parse xml here
boolean isDone = XMLParser.parse(byteBuffer, xml);
if(isDone)
return xml;
return null;
}
}
---------------------------------------------------------------------------
Original issue reported on code.google.com by abuhar...@gmail.com
on 16 Jan 2013 at 11:14
Don't use the packet reader in this manner. Instead, use a raw packet reader to
return a byte[], and assemble this XML in the SocketObserver's
packetReceived(NIOSocket socket, byte[] packet). The packet readers are
supposed to be very low level factories that simply group the byte stream into
packets.
Original comment by christof...@gmail.com
on 16 Jan 2013 at 11:21
- Changed state: Invalid
Well, it's double work: first from the buffer into a byte array, then the array
of bytes in XML
or I am wrong?
Original comment by abuhar...@gmail.com
on 16 Jan 2013 at 11:27
Implementation-wise, you only need to implement the XML-conversion in the
listener, since the RawPacketReader is default.
If you're worried about the extra copy of bytes, then usually you'll still need
to take the following steps:
bytebuffer -> byte[] -> string
If it's the XML parser that does bytebuffer -> byte[] or if it's done by the
RawPacketReader should not be important.
Sure, it's possible for an XML reader to -as an optimization - reuse a byte
array for the conversion to string, but the same amount of copying will still
be necessary. So this is 1 memory allocation we potentially could save.
Compared to the XML parsing, this is a very small cost.
In many cases it is more optimal to hand over heavy parsing to a different
thread and leave the IO thread to do the next read. This is quite easy if you
get raw bytes to the socket observer. It can then dispatch these bytes for
parsing in the same or to a different thread. There is no such flexibility
available for the packet reader.
Similarly it might be tempting to do decryption / encryption in the packet
reader. Again this is not recommended for the same reasons.
Keep the readers / writers simple.
Original comment by christof...@gmail.com
on 16 Jan 2013 at 11:42
Thank you for your response.
I wrote simple event-driven parsers for xml and json. Its reads byte by byte
and, when when tag(for xml) and object(for json) reached parser send
notification to it delegate. It works like SAX or maybe XMLStreamReader, but
for byte array as source.
I really worried about the extra copy of bytes in this case.
I think, that the best way - it minimize buffer size for PacketReader.
Original comment by abuhar...@gmail.com
on 16 Jan 2013 at 11:52
I understand. What you can do is a simple 1 byte raw reader, which extracts 1
byte from the byte buffer at the time. Since you immediately use this byte, you
can basically do:
- in the constructor, create a byte array of length 1
- read one byte from the byte buffer
- put that in the byte array
- return the byte array
- feed the single byte in the byte array to the parser in the SocketObserver
callback.
Original comment by christof...@gmail.com
on 16 Jan 2013 at 12:04
I understand, but with byte buffers I can use CharsetDecoders from java.nio
package. ByteBuffers only available inside PacketReader. That is the problem.
Without that I must read chars from byte arrays in SocketObserver with my own
mechanisms or use multiple wrappers and extra copy of bytes to use
CharsetDecoders in SocketObserver.
Original comment by abuhar...@gmail.com
on 16 Jan 2013 at 12:13
I believe that if you feel you'll encounter performance issues from the byte
copy, then the XML parse would be way too slow in itself to be allowed on the
IO thread - which again would suggest that the you should simply copy the bytes.
In addition, doing this change would make it less obvious how the reader should
be used. I would not want people to think that in general packets should parsed
in the reader.
Since the code is free to use without attribution, why not copy it and use a
custom implementation that works the way you want it?
Original comment by christof...@gmail.com
on 16 Jan 2013 at 12:45
I understand all this and I am not worried about performance issues related to
copying the byte arrays, it's just not right and not rational.
I can do everything myself, but I respect your opinion and it is important to
me.
You developed this great library is a very simple and functional, so I could
not consult with you.
Original comment by abuhar...@gmail.com
on 16 Jan 2013 at 12:51
Consider the two possible implementations:
a) SocketObserver receives an Object, which may be a byte[] or something else.
In this case, we have designed for the eventuality that we want to return
something other than a byte[]. As I've mentioned, this is unlikely to be
desirable in most cases, due to scalability concerns.
But in order to offer this flexibility, we now ask every practical
implementation of SocketObserver to first do:
byte[] bytePacket = (byte[]) packet;
It might also send the wrong signal, causing people to deserialize data in the
Reader, which it's unsuitable for (not only for performance concerns, but in
order to propagate error handling for invalid data, this reader might need to
propagate things like xml parse errors to the rest of the system).
On the other hand, if the reader only handles the lowest level of the packet
reading - as it is intended to -, any error detected will be in the basic
transfer protocol, and not have anything to do with data validation and actual
parsing so all practical error handling can take place at the same level.
b) SocketObserver uses generics.
In this case, we don't have to cast to byte[], but lots of the issues in (a)
still remains. In addition, The code uses a sentinel value in form of an empty
byte[]. That solution would be problematic in this case.
-
Think of the Readers as simple settings. I could have made some predefined ones
and have it look like this.
socket.setReadFormat(ASCII_LINE_FORMAT);
By making them objects I allow for them to be easily extended and even have
some internal state, but that shouldn't be misunderstood to mean that they are
suitable for any heavy lifting. Again, this is a threading issue, and just not
in the performance sense.
For instance, let's say a malicious user sends you "<xml><a><a><a>…" what
happens when it runs out of memory? Hopefully it doesn't scramble the whole
thread running IO, but it's hard to make guarantees. On the other hand, if you
did the parsing on a thread, any problem will first take out that thread and
the rest of the server might survive.
Yet another concern is how and when the reader is accessed. There is no
guarantee that it is safe to close a connection while the reader is running, or
manipulate anything else related to NIO. You don't know if it will run many
times in a single run, or just once every pass etc. This is the old framework
problem, where you must understand how and when the classes you plug into the
framework will be used.
For SocketObserver it's simple. Whenever you get a callback you're safe to do
anything to the sockets or the NIOService. With the Reader, there are no
guarantees.
I know it might seem pigheaded not to allow Object, but I feel this version is
much more obvious in how it should work. It is easy for the way it's supposed
to work (the Reader chopping data into byte[]), and hard the way it's not
supposed to work (the Reader deserializes the data).
If I would create a library where you plug in a packet deserializers, the
design would necessarily have been different.
As it is, Naga serves two purposes:
1) As example code on how to handle the usual ByteBuffer/SelectionKey juggling
one is always forced to do.
2) To be used as a building block for customized asynchronous socket
communication
Original comment by christof...@gmail.com
on 16 Jan 2013 at 4:07
I understand, thank you for such a detailed response.
Original comment by abuhar...@gmail.com
on 18 Jan 2013 at 7:36
I would like to thank you for pressing the issue, because it is very worthwhile
to revisit design decisions.
I realize it would be nice to get hold of the original bytebuffer in some
cases. Especially when you look at implementing things like SSL.
However, it turns out that in those cases the general design of Naga is a bad
fit.
For Naga, the following assumptions are made:
1. You don't want to think about how to properly prepare your bytebuffer before
every read.
2. You don't want to manually control the selection keys on the selector.
If those are invalid, then obviously the library will be less than ideal.
I personally tried hard to create an SSL implementation using java's SSL engine
- which violates both (1) and (2) - but I was so intensely dissatisfied with it
that I ended up deciding not to use it.
The more flexible you want to be the more difficult the library is to use.
I think the reasonable solution (as an alternative to increased flexibility) is
to create a different library that deliberately avoids making these
assumptions. By exposing some complexity, we might be able to simplify other
parts of the system.
For instance, if we entirely delegate the bytebuffer transfer to the user, then
the SocketObserver method could look something like: dataReceived(ByteBuffer)
instead of packetReceived(byte[]). We can then remove the "Reader" classes
entirely. This would be trading increased complexity in reading (you need to
handle converting the stream to bytes yourself, know how to flip the byte
buffer correctly etc) for simplified configuration (no need to figure out what
reader to use)
Original comment by christof...@gmail.com
on 18 Jan 2013 at 10:37
I am happy, if a little could be helpful to you.
>> For Naga, the following assumptions are made:
>> 1. You don't want to think about how to properly prepare your bytebuffer
before every read.
>> 2. You don't want to manually control the selection keys on the selector.
This simplicity is great.
Library, like the simplicity of naga must be in java.nio.
>> The more flexible you want to be the more difficult the library is to use.
I agree.
I do not like java for the fact that many things are very low-level and contain
a lot of required settings. It is difficult and some errors occur later.
On the other hand, a reasonable compromise can always be found.
>> For instance, if we entirely delegate the bytebuffer transfer to the user,
then the SocketObserver method could look something like: dataReceived
(ByteBuffer) instead of packetReceived (byte []). We can then remove the
"Reader" classes entirely. This would be trading increased complexity in
reading (you need to handle converting the stream to bytes yourself, know how
to flip the byte buffer correctly etc) for simplified configuration (no need to
figure out what reader to use)
I often use the naga, and in all cases I had to use the received Packet
byteArray as ByteBuffer to read byte by byte and parse xml and json.
Original comment by abuhar...@gmail.com
on 18 Jan 2013 at 1:08