airbnb/streamalert

[Bug][Classifier] TypeError: unhashable type: 'dict' when processing raw event encapsulated in a string

chunyong-lin opened this issue · 1 comments

Background

PR #1077 surfaced a bug in our parser: StreamAlert throws TypeError: unhashable type: 'dict' when parsing events against the TrendMicro schema, because those events have an unusual shape.

The root cause is that TrendMicro events are a list of dicts encapsulated in a string. The parser for this type of event is json, with json_path configured in the schema conf file. We hit the bug when TrendMicro events go to the same data source as other events whose schemas do not require json_path: such a schema is then tried against the list of dicts, and the key check calls set() on it.

TL;DR: the issue can be reproduced in two ways.

  1. Add a unit test
def test_parse_record_not_dict_mismatch(self):
    """JSONParser - Parse record not in dict type and doesn't match schema"""
    options = {
        'schema': {
            'key': 'string'
        },
        'parser': 'json'
    }
    record_data = "[{\"key\": \"value\"}]"
    parser = JSONParser(options)
    assert_equal(parser.parse(record_data), False)
  2. Verify via the Python interpreter
python
>>> set({'a': 1})
{'a'}
>>> set([{'a': 1}])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unhashable type: 'dict'
>>>
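
The two reproductions can be tied together in one standalone script. This is a minimal sketch, assuming the record string is decoded with json.loads before the key check runs. The helper naive_key_check is hypothetical: it only mirrors the set(record) expression visible in the traceback and is not StreamAlert code.

```python
import json


def naive_key_check(record, schema):
    """Hypothetical stand-in that mirrors the set(record) call from the traceback."""
    # Works for a dict (iterates its keys), but fails for a list of dicts,
    # because each dict element would have to be hashed to go into the set.
    keys = set(record)
    return keys == set(schema)


# Same payload as the unit test: a list of dicts encapsulated in a string.
record_data = '[{"key": "value"}]'
record = json.loads(record_data)  # decodes to a list, not a dict

try:
    naive_key_check(record, {'key': 'string'})
    error = None
except TypeError as err:
    error = str(err)

# Prints the decoded type and the TypeError message, if one was raised.
print(type(record).__name__, error)
```

The exact wording of the TypeError message may vary between Python versions, so the script prints it rather than comparing against a fixed string.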

The full traceback looks similar to the following:

Traceback (most recent call last):
  File "manage.py", line 116, in <module>
    main()
  File "manage.py", line 112, in main
    sys.exit(not cli_runner(options))
  File "/Users/SECRET_DIR_PATH_BALABALA/streamalert_cli/runner.py", line 70, in cli_runner
    result = cmds[args.command](args)
  File "/Users/SECRET_DIR_PATH_BALABALA/streamalert_cli/runner.py", line 126, in <lambda>
    command: lambda opts, cmd=cli_command: cmd.handler(opts, config)
  File "/Users/SECRET_DIR_PATH_BALABALA/streamalert_cli/test/handler.py", line 198, in handler
    result = result and TestRunner(options, config).run()
  File "/Users/SECRET_DIR_PATH_BALABALA/streamalert_cli/test/handler.py", line 372, in run
    classifier_result = self._run_classification(event)
  File "/Users/SECRET_DIR_PATH_BALABALA/streamalert_cli/test/handler.py", line 251, in _run_classification
    return _classifier.run(records=[record])
  File "/Users/SECRET_DIR_PATH_BALABALA/streamalert/classifier/classifier.py", line 250, in run
    self._classify_payload(payload)
  File "/Users/SECRET_DIR_PATH_BALABALA/streamalert/classifier/classifier.py", line 170, in _classify_payload
    self._process_log_schemas(record, logs_config)
  File "/Users/SECRET_DIR_PATH_BALABALA/streamalert/classifier/classifier.py", line 135, in _process_log_schemas
    parsed = parser.parse(payload_record.data)
  File "/Users/SECRET_DIR_PATH_BALABALA/streamalert/classifier/parsers.py", line 490, in parse
    valid = valid and self._key_check(record, self._schema, self._optional_top_level_keys)
  File "/Users/SECRET_DIR_PATH_BALABALA/streamalert/classifier/parsers.py", line 246, in _key_check
    keys = set(record) if not optionals else set(record).union(optionals)
TypeError: unhashable type: 'dict'

Steps to Reproduce

See the background description.

Desired Change

Handle the case where the record is a list of dicts.
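
One possible shape for such handling is a type guard ahead of the set() call. The sketch below is only an illustration under assumptions: safe_key_check is a hypothetical, simplified function, its key comparison is a guess at the intent, and it is not necessarily how PR #1085 fixes the problem.

```python
def safe_key_check(record, schema, optionals=None):
    """Hypothetical guarded key check; not StreamAlert's actual implementation."""
    # A record that is not a dict (for example a list of dicts) cannot match
    # a dict schema, so report a mismatch instead of raising TypeError.
    if not isinstance(record, dict):
        return False

    # Same expression as in the traceback, now only reached for dict records.
    keys = set(record) if not optionals else set(record).union(optionals)

    # Assumed, simplified comparison: record keys must equal schema keys.
    return keys == set(schema)


# A list of dicts is rejected cleanly; a matching dict still passes.
print(safe_key_check([{'key': 'value'}], {'key': 'string'}))
print(safe_key_check({'key': 'value'}, {'key': 'string'}))
```

With a guard like this, the unit test from the background section would get False back from the parser rather than an exception.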

Fixed in PR #1085