/protobuf_decoder

Decode protobuf without proto file

Primary LanguagePythonMIT LicenseMIT

Test Coverage

Protobuf Decoder

Simple protobuf decoder for python

Motivation

The goal of this project is decode protobuf binary without proto files

Installation

Install using pip

pip install protobuf-decoder

Simple Examples

"""
# proto
message Test1 {
  string a = 1;
}

# message
{
  "a": "테스트"
}

# binary
0A 09 ED 85 8C EC 8A A4 ED 8A B8
"""
from protobuf_decoder.protobuf_decoder import Parser

test_target = "0A 09 ED 85 8C EC 8A A4 ED 8A B8"
parsed_data = Parser().parse(test_target)
assert parsed_data == ParsedResults([ParsedResult(field=1, wire_type="string", data='테스트')])
assert parsed_data.to_dict() == {'results': [{'field': 1, 'wire_type': 'string', 'data': '테스트'}]}
"""
# proto
message Test1 {
  int32 a = 1;
}

message Test2 {
  Test1 b = 3;
}

# message
{
  "a": {
        "b": 150
        }
}

# binary
1a 03 08 96 01
"""
from protobuf_decoder.protobuf_decoder import Parser

test_target = "1a 03 08 96 01"
parsed_data = Parser().parse(test_target)
assert parsed_data == ParsedResults(
    [
        ParsedResult(field=3, wire_type="length_delimited", data=ParsedResults([
            ParsedResult(field=1, wire_type="varint", data=150)
        ]))
    ])
assert parsed_data.to_dict() == {'results': [{'field': 3, 'wire_type': 'length_delimited', 'data': {
    'results': [{'field': 1, 'wire_type': 'varint', 'data': 150}]}}]}
"""
# proto
message Test1 {
  required string a = 1;
}

# message
{
  "a": "✊"
}

# binary
0A 03 E2 9C 8A

"""
from protobuf_decoder.protobuf_decoder import Parser

test_target = "0A 03 E2 9C 8A"
parsed_data = Parser().parse(test_target)
assert parsed_data == ParsedResults([ParsedResult(field=1, wire_type="string", data='✊')])
assert parsed_data.to_dict() == {'results': [{'field': 1, 'wire_type': 'string', 'data': '✊'}]}

Nested Protobuf Detection Logic

Our project implements a distinct method to determine whether a given input is possibly a nested protobuf. The core of this logic is the is_maybe_nested_protobuf function. We recently enhanced this function to provide a more accurate distinction and handle nested protobufs effectively.

Current Logic

The is_maybe_nested_protobuf function works by:

  • Attempting to convert the given hex string to UTF-8.
  • Checking the ordinal values of the first four characters of the converted data.
  • Returning True if the data might be a nested protobuf based on certain conditions, otherwise returning False.

Extensibility

You can extend or modify the is_maybe_nested_protobuf function based on your specific requirements or use-cases. If you find a scenario where the current logic can be further improved, feel free to adapt the function accordingly.

(A big shoutout to @fuzzyrichie for their significant contributions to this update!)

Remain Bytes

If there are remaining bytes after parsing, the parser will return the remaining bytes as a string.

from protobuf_decoder.protobuf_decoder import Parser

test_target = "ed 85 8c ec 8a a4 ed 8a b8"
parsed_data = Parser().parse(test_target)

assert parsed_data.has_results is False
assert parsed_data.has_remain_data is True

assert parsed_data.remain_data == "ed 85 8c ec 8a a4 ed 8a b8"
assert parsed_data.to_dict() == {'results': [], 'remain_data': 'ed 85 8c ec 8a a4 ed 8a b8', }

Reference