airbnb/streamalert

[Improvement] Add streamalert_normalization as a top optional key automatically

chunyong-lin opened this issue · 0 comments

Background

Based on #1250, a cross-join search between the artifacts table and the original tables by record_id requires the original record to have a searchable streamalert_normalization field, which contains the record_id. The streamalert_normalization field has the following format:

{
    "record": {
        "region": "us-east-1",
        "detail": {
            "awsRegion": "us-west-2"
        }
    },
    "streamalert_normalization": {
        "streamalert_record_id": "abcdef0123456789",
        "region": [
            {
                "values": ["region_name"],
                "function": "AWS region"
            },
            {
                "values": ["region_name"],
                "function": "AWS region"
            }
        ]
    }
}
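To make the role of the nested ID concrete, here is a minimal Python sketch that pulls streamalert_record_id out of a record shaped like the one above. The helper name extract_record_id is hypothetical and is not part of StreamAlert; it only mirrors what the json_extract call in the query does.

```python
import json


def extract_record_id(raw_record):
    """Return the streamalert_record_id of a JSON-encoded record, or None.

    Hypothetical helper for illustration only; not a StreamAlert API.
    """
    record = json.loads(raw_record)
    # The field may be missing on logs without normalization configured.
    normalization = record.get("streamalert_normalization") or {}
    return normalization.get("streamalert_record_id")


example = json.dumps({
    "record": {"region": "us-east-1", "detail": {"awsRegion": "us-west-2"}},
    "streamalert_normalization": {
        "streamalert_record_id": "abcdef0123456789",
        "region": [{"values": ["region_name"], "function": "AWS region"}],
    },
})

record_id = extract_record_id(example)
```

A record without the streamalert_normalization field yields None, which is why the field has to be present and searchable for the join to match anything.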

The cross-join search can then be written as:

SELECT artifacts.*,
         events.detail
FROM 
    (SELECT streamalert_record_id AS record_id,
         type,
         value
    FROM "PREFIX_streamalert"."artifacts"
    WHERE dt='2020-04-30-01'
            AND value='Root') AS artifacts
LEFT JOIN 
    (SELECT CAST(json_extract(streamalert_normalization,
         '$.streamalert_record_id') AS varchar) AS record_id, detail
    FROM "PREFIX_streamalert"."cloudwatch_events"
    WHERE dt='2020-04-30-01') AS events
    ON artifacts.record_id = events.record_id
LIMIT 10 

Desired Change

Right now, users have to add streamalert_normalization themselves as an optional top-level key to every schema that has normalization configured. We think it is a good idea to add this key to conf/schemas/*.json automatically at normalization build time.
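A rough sketch of what the automatic step could look like, not a proposed implementation. It assumes each log definition in conf/schemas/*.json is a dict with a "schema" mapping and a "configuration" mapping, that normalization is declared under configuration["normalization"], and that optional keys are listed under configuration["optional_top_level_keys"]. The function name add_normalization_key is hypothetical.

```python
import copy

NORMALIZATION_KEY = "streamalert_normalization"


def add_normalization_key(log_schema):
    """Return a copy of one log definition with the normalization key added.

    Hypothetical helper; the layout of log_schema is an assumption
    (see the lead-in), not a confirmed StreamAlert internal.
    """
    updated = copy.deepcopy(log_schema)
    configuration = updated.get("configuration", {})

    # Only touch logs that actually have normalization configured.
    if "normalization" not in configuration:
        return updated

    # Declare the key in the schema as a loosely typed JSON object.
    updated.setdefault("schema", {}).setdefault(NORMALIZATION_KEY, {})

    # Mark it optional so records without it still match the schema.
    optional_keys = configuration.setdefault("optional_top_level_keys", [])
    if NORMALIZATION_KEY not in optional_keys:
        optional_keys.append(NORMALIZATION_KEY)
    return updated


cloudwatch_events = {
    "schema": {"detail": {}, "region": "string"},
    "parser": "json",
    "configuration": {"normalization": {"region": ["region"]}},
}
patched = add_normalization_key(cloudwatch_events)
```

The function is idempotent and leaves logs without normalization untouched, so it could be run on every build without rewriting schemas unnecessarily.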

However, the build command still has to be run manually to update the tables:

python manage.py build --target "kinesis_firehose_*"