awslabs/kinesis-aggregation

kpl_pb2.py should be regenerated to avoid problems with upcoming protobuf release

Opened this issue · 11 comments

Due to upcoming changes in python's protobuf, which can be previewed with pip install protobuf==4.21.0rc2, the bundled kpl_pb2.py should be regenerated with a newer version of protoc.

import aws_kinesis_agg.kpl_pb2
../../.venv/kpl/lib/python3.8/site-packages/aws_kinesis_agg/kpl_pb2.py:51: in <module>
    _descriptor.FieldDescriptor(
../../.venv/kpl/lib/python3.8/site-packages/google/protobuf/descriptor.py:560: in __new__
    _message.Message._CheckCalledFromGeneratedFile()
E   TypeError: Descriptors cannot not be created directly.
E   If this call came from a _pb2.py file, your generated code is out of date and must be regenerated with protoc >= 3.19.0.
E   If you cannot immediately regenerate your protos, some other possible workarounds are:
E    1. Downgrade the protobuf package to 3.20.x or lower.
E    2. Set PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=python (but this will use pure-Python parsing and will be much slower).
E   
E   More information: https://developers.google.com/protocol-buffers/docs/news/2022-05-06#python-updates

https://developers.google.com/protocol-buffers/docs/news/2022-05-06#python-updates

This is now causing the error shown above, due to protobuf 4.21 having been released.

Since the protobuf version is not pinned to 3.x in aws-kinesis-agg, Pip is installing 4.21 for packages that have aws-kinesis-agg as a requirement.

While a version is set in requirements.txt for the aws-kinesis-agg build itself (https://github.com/awslabs/kinesis-aggregation/blob/ae6250befb5b06dc86af8decef3a7e4f09a5d2c7/python/requirements.txt), it's not set as a package dependency for other packages that require it (https://github.com/awslabs/kinesis-aggregation/blob/398fbd4b430d4bf590431b301d03cbbc94279cef/python/aws_kinesis_agg.egg-info/requires.txt) so Pip just fetches the newest version.

kbl_pb2.py should be regenerated, but it would also be good practice to pin the protobuf version to at least a major version in the dependencies.

I'm not sure I understand entirely what the issue is. The protobuf version is pinned to 3.15.0 in requirements.txt. I tsounds like the issue is that if you do a system wide install of protobuf, you will get 4.21, which then when you try to run this version will not work. This would be expected, and you would need to downgrade protobuf to a version before 4.21 to get compatibility. Is that correct?

Given this, I could regenerate the kpl_pb2 file, but the structure within later versions of kinesis producer works differently, and from my first glance are incompatible with the existing types and structures. Therefore a regenerate isn't a simple fix, and so looking to see if we can get you unblocked faster by downgrading protobuf.

I understand the issue now, and have pushed version 1.2.0 in 3db23db that addresses this issue. Please let me know if this works for you.

This is not working for me because of this line:

import config_pb2 as config__pb2

This seems to be the only import which refers to an internal module without referring to the aws_kinesis_agg package. If I patch this in my installation:

sed -i -e 's/import config_pb2 as config__pb2/import aws_kinesis_agg.config_pb2 as config__pb2/' /pkg/aws_kinesis_agg/messages_pb2.py

I'm then able to import. Admittedly though, this seems to be because I'm packaging this up in a lambda and I'm going out of my way to strip the zipfile to be pyc-only to save space. I'm not sure why running with pyc-only would cause an import error here, while using the *.py files seems to work without a problem, but I see there were additional changes to add paths to sys.path so curious if @IanMeyers already bumped into some import/path issues while making this update?

Yes, sort of. This issue appears to be related to protocolbuffers/protobuf#1491. I am working on a fix in version 1.2.1 for you that should also be compliant with compiling lambda packages, but will require including the project bast directory to include the protobuf generated libs.

Pushed 1.2.1 - can you please try it?

@IanMeyers I'm seeing a "No module named 'messages_pb2'" with 1.2.1

Can you share a stack trace please?

Think I see the issue - standby please.

Issue should be fixed in version 1.2.2. Package builder was ignoring the root path. Sorry folks.

Thanks for the incredibly speedy response!