My own implementation of Google's Protocol Buffers.
encoding
module becameprotobuf
module.- Performance tests.
Bool.dump
2.2 times faster.Varint
14% faster.add_field
chaining.__hash__
17% faster.
- README techniques added.
- Hashes of message types.
- Fixed: loading of missing required field doesn't raise
ValueError
. - Message
load
doesn't useStringIO
for reading embedded messages and packed repeated fields anymore. TypeMetadata
! (read below)- Removed
MarshalableCode
(it's notprotobuf
's business). - Fixed: reading of
Int32
values raisesTypeError: 'str' object is not callable
- Fixed
Int32
type name (wasInt32Type
). - Added validation of message type.
- Unicode type.
- Python code object type.
- Fixed casting values to bool and from bool.
Fow now, there is full protobuf encoding implementation, so you can use the encoding
module with full compatibility with the standard implementation.
The encoding
module is covered with tests, but you should understand that there are may be some unknown bugs. Thus, use this software at your own risk.
Do from encoding import *
and you're ready to go.
Note: all names of message types are similar to described there. ;-)
Assume you have the following definition:
message Test2 {
string b = 2;
}
First, you should create the message type:
Test2 = MessageType()
Test2.add_field(2, 'b', String)
Then, create a message and fill it with the appropriate data:
msg = Test2()
msg.b = 'testing'
You can dump this now!
print msg.dumps() # This will dump into a string.
msg.dump(open('/tmp/message', 'wb')) # And this will dump into any write-like object.
You also can load this message with:
msg = Test2.load(open('/tmp/message', 'rb'))
or with:
msg = load(open('/tmp/message', 'rb'), Test2)
Simple enough. :)
To add a missing field you should pass an additional flags
parameter to add_field
like this:
Test2 = MessageType()
Test2.add_field(2, 'b', String, flags=Flags.REQUIRED)
If you'll not fill a required field, then ValueError will be raised during serialization.
Do like this:
Test2 = MessageType()
Test2.add_field(1, 'b', UVarint, flags=Flags.REPEATED)
msg = Test2()
msg.b = (1, 2, 3)
A value of repeated field can be any iterable object. The loaded value will always be list
.
Test4 = MessageType()
Test4.add_field(4, 'd', UVarint, flags=Flags.PACKED_REPEATED)
msg = Test4()
msg.d = (3, 270, 86942)
Consider the following definitions:
message Test1 {
int32 a = 1;
}
and
message Test3 {
required Test1 c = 3;
}
To create an embedded field, pass EmbeddedMessage as the type of field and fill it like this:
# Create the type.
Test1 = MessageType()
Test1.add_field(1, 'a', UVarint)
Test3 = MessageType()
Test3.add_field(3, 'c', EmbeddedMessage(Test1))
# Fill the message.
msg = Test3()
msg.c = Test1()
msg.c.a = 150
There are the following data types supported for now:
UVarint # Unsigned integer.
Varint # Signed integer.
Bool # Boolean.
Fixed64 # 8-byte string.
UInt64 # C++'s 64-bit `unsigned long long`
Int64 # C++'s 64-bit `long long`
Float64 # C++'s `double`.
Fixed32 # 4-byte string.
UInt32 # C++'s 32-bit `unsigned int`.
Int32 # C++'s 32-bit `int`.
Float32 # C++'s `float`.
Bytes # Pure bytes string.
Unicode # Unicode string.
TypeMetadata # Type that describes another type.
The Protocol Buffer format is not self delimiting. But you can wrap you message type in EmbeddedMessage
class and write/read it sequentially.
The other option is to use protobuf.EofWrapper
that has a limit
parameter in its constructor. The EofWrapper
raises EOFError
when the specified number of bytes is read.
There is no any description of the message type in a message itself. Therefore, if you want to send a self-described messages, you should send the a description of the message too.
I've implemented a tool for this... Look:
A, B, C = MessageType(), MessageType(), MessageType()
A.add_field(1, 'a', UVarint)
A.add_field(2, 'b', TypeMetadata, flags=Flags.REPEATED) # <- Look here!
A.add_field(3, 'c', Bytes)
B.add_field(4, 'ololo', Float32)
B.add_field(5, 'c', TypeMetadata, flags=Flags.REPEATED) # <- And here!
B.add_field(6, 'd', Bool, flags=Flags.PACKED_REPEATED)
C.add_field(7, 'ghjhdf', UVarint)
msg = A()
msg.a = 1
msg.b = [B, C] # Assigning of types.
msg.c = 'ololo'
bytes = msg.dumps()
...
msg = A.loads(bytes)
msg2 = msg.b[0]() # Creating a message of the loaded type.
You can send your bytes
anywhere and you'll got your message type on the other side!
add_field
return the message type itself, thus you can do so:
MessageType().add_field(1, 'a', EmbeddedMessage(MessageType().add_field(1, 'a', UVarint)))
See protobuf
to see the API and run-tests
modules to see more usage samples.