elastic/ecs-logging-python

StructlogFormatter throws TypeError with duplicated dotted keys

psontag opened this issue · 5 comments

If the event_dict contains a key that is also part of a dotted key the StructlogFormatter throws a TypeError.

Reproducible example:

from ecs_logging import StructlogFormatter

logs = {'event': 'RuntimeError', 'event.dataset': 'dev_setup.log'}

matter = StructlogFormatter()
matter("", "name", logs)

Results in a TypeError

TypeError: 'str' object does not support item assignment

I think the problem is in this line.

def merge_dicts(from_, into):
    """Merge deeply nested dictionary structures.
    When called has side-effects within 'destination'.
    """
    for key, value in from_.items():
        if isinstance(value, dict):
            merge_dicts(value, into.setdefault(key, {}))   <------
        else:
            into[key] = value
    return into

into.setdefault() returns the value of the key if the key is already present in the dictionary. In the example above the value is a string which causes the TypeError.

This is fixed in #46 -- however, I should note that because event is used as the log message for the structlog formatter, if you were to use your example above the event would end up being overwritten by your event.dataset, so "message"'s value would end up as "{'dataset':'dev_setup.log'}".

Not sure if we should be "merging" these strings or not, but we currently don't.

This is fixed in #46 -- however, I should note that because event is used as the log message for the structlog formatter, if you were to use your example above the event would end up being overwritten by your event.dataset, so "message"'s value would end up as "{'dataset':'dev_setup.log'}".

What is the intended way to use the elastic apm structlog processor

https://github.com/elastic/apm-agent-python/blob/ee75cb8f238303b79d0f697a7f2eca547a1dfe8c/elasticapm/handlers/structlog.py#L35

with the ecs structlog formatter then? With elastic/apm-agent-python#1061 it is now injecting the ˋevent.datasetˋ key which would now overwrite the message as you mentioned.

Yeah I realized that shortly after I commented. I'll be working on fixing that today, it's definitely a problem.

Alright, #46 is now "complete". The behavior you were seeing was the "correct" behavior -- if we have conflicts, we want to error on those conflicts rather than throwing away values.

However, event and event.dataset are a special case. Because we use event for message in the final log, and the python APM agent defines event.dataset, we just needed to handle event -> message prior to dictionary processing.

The end result is that your reproduction case above should no longer error, but if you did the same conflict with something besides event, it would error, and I think that's good behavior.

Sounds good and thanks for adding a helpful error message in case of conflicts 👍