signalfx/splunk-otel-collector

Migrate checkpoint produces malformed results

omrozowicz-splunk opened this issue · 4 comments

Recently I was working on this bug in the splunk-otel-collector-chart repo:
signalfx/splunk-otel-collector-chart#745

The checkpoint file was migrated from SCK to SCK-Otel, and on one node the migrated file produced an "illegal base64 data at input byte 2281" error. I spent some time analysing this, and the full error message appears to be:

Invalid base64-encoded string: number of data characters (2281) cannot be 1 more than a multiple of 4

Basically, one record has one character fewer than it should. After dropping the last character and decoding, I got the following "text":

normal log until a certain point: reason: 'LeaderElectkΌ��H�]��[\��[���X]Z\H��XY�\��X\H��[�Y�\��[�Y�[]�ܚX۝���\L�KLMN��
LL̊�����\��L�LMN��

L������H��XY�\[�X�[ۋΌN�H�X\ٝ[��H�X]Z\Y���X\H��[�Y�\��[�Y�[]�ܚX۝���\L�KLMN���L
�����\��L�LMN��
L�������H�][�Ό
WH�][
�KؚX�Y\[^[�ۙYX\���[Y\�XN�[�Y�\���[YN�[�Y�[]�ܚX۝���\��RQ�XYL�L؍M�L��X
LLMXMMNY���T�U\[ێH��\\U\[ێ

L�H��Y[���]��JN��\�N	ӛܛX[	�X\ێ	�XY�\[�X�[ۉ�]YXKX\Y��[ۜ��[M����[X\�\LH�X[YH��XY�\L�KLMN��LLML�����\��L�LMN��LLM�������H�X\�\΍NWH�[]�X[�^[���X\�\L�KLMN��͌N
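The check and the workaround described above can be sketched in Python, whose binascii module produces the "cannot be 1 more than a multiple of 4" message quoted earlier. This is only a minimal illustration, not the migration code: the helper names diagnose_base64 and decode_ignoring_dangling_char are hypothetical, and the sample payload is made up rather than taken from the malformed checkpoint.

```python
import base64
import binascii


def diagnose_base64(payload: str) -> str:
    """Describe why a base64 payload cannot be decoded, if it cannot."""
    # Count data characters only; trailing '=' is padding.
    data_chars = len(payload.rstrip("="))
    if data_chars % 4 == 1:
        # A single leftover character carries only 6 bits, so it can
        # never form a whole byte: the payload lost or gained a character.
        return f"truncated or corrupted: {data_chars} data characters"
    try:
        base64.b64decode(payload, validate=True)
    except binascii.Error as exc:
        return f"invalid: {exc}"
    return "ok"


def decode_ignoring_dangling_char(payload: str) -> bytes:
    """Drop a dangling final character, re-pad, and decode the rest."""
    data = payload.rstrip("=")
    if len(data) % 4 == 1:
        data = data[:-1]
    # Restore padding so the length is a multiple of 4 again.
    data += "=" * (-len(data) % 4)
    return base64.b64decode(data)


if __name__ == "__main__":
    good = base64.b64encode(b"normal log line").decode()
    bad = good + "A"  # hypothetical corruption: one stray character
    print(diagnose_base64(good))
    print(diagnose_base64(bad))
    print(decode_ignoring_dangling_char(bad))
```

Base64 is decoded in groups of four characters, so a character lost or added in the middle of a record shifts every later group. That would be consistent with output that reads normally up to a point and turns into garbage afterwards, although this issue does not confirm where the character went missing.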

Additionally, I've seen a line like this:

{"Finger4MC1vcGVuc2hpZnQtbmV0d29yay5jb25mIGFzIGEgc291cmNlIHRvIGdlbmVyYQ=="},"Offset":1775}

This is not connected with the illegal base64 data error, but is also potentially an issue.
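A line like the one above can be caught with a simple structural check. The sketch below assumes the checkpoint data can be inspected as one JSON object per line, which this issue does not confirm. The helper name find_malformed_lines is hypothetical, and the first sample line is invented for contrast.

```python
import json


def find_malformed_lines(lines):
    """Return (line_number, error_message) pairs for lines that are not valid JSON."""
    problems = []
    for number, line in enumerate(lines, start=1):
        text = line.strip()
        if not text:
            continue  # skip blank lines
        try:
            json.loads(text)
        except json.JSONDecodeError as exc:
            problems.append((number, exc.msg))
    return problems


if __name__ == "__main__":
    sample = [
        '{"Offset": 10}',  # invented well-formed line, for contrast
        '{"Finger4MC1vcGVuc2hpZnQtbmV0d29yay5jb25mIGFzIGEgc291cmNlIHRvIGdlbmVyYQ=="},"Offset":1775}',
    ]
    for number, message in find_malformed_lines(sample):
        print(f"line {number}: {message}")
```

Running a check like this over the migrated file before the collector starts would flag damaged records, though it would not explain how they were produced.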

This is probably worth opening as an issue on https://github.com/open-telemetry/opentelemetry-collector-contrib, referencing this issue.

@omrozowicz-splunk did you open a bug upstream? Did you manage to reproduce the issue?

I don't think this is an upstream issue; I think something is wrong with our migrate checkpoint.
I didn't reproduce it, but I have a malformed file, and all the logs above come from my instance. It looks like converting checkpoints sometimes produces malformed results, though I haven't investigated this in the code.

Closing as this is not reproducible as of now. If you see further errors, please report internally.