fox-it/flow.record

rdump timestamp field


The runkeys plugin and various other plugins have the field ts. This ts field can be used as the timestamp field for all of these plugins, so their records can be analyzed in chronological order with Kibana.

target-query -f runkeys -q -t targets/MSEDGEWIN10_20220708124036.tar | rdump -w "elastic://http://localhost:9200?index=timelines"


But when using a plugin without the field ts, these records cannot be shown in Kibana. For example, amcache does not have the field ts because it uses other datetime fields like created.

target-query -f amcache -q -t targets/MSEDGEWIN10_20220708124036.tar | rdump -w "elastic://http://localhost:9200?index=timelines"

I don't know if it is supported, but would it be possible for dissect.target plugins that output records to use a ts field that corresponds to the event creation timestamp of the parsed artefact? I am guessing Fox-IT uses Splunk, whose timestamp assignment automatically assigns a timestamp based on what it finds in the raw event data. But this feels like magic compared to using records with other adapters.

When using the mft plugin, I would expect a single MFT entry to create four events, one per timestamp, instead of one event with four different timestamps.

References

https://docs.splunk.com/Documentation/Splunk/latest/Data/HowSplunkextractstimestamps
https://github.com/fox-it/dissect.target/blob/main/dissect/target/plugins/os/windows/log/amcache.py#L17
https://www.elastic.co/guide/en/elasticsearch/reference/current/anonymous-access.html
https://www.elastic.co/guide/en/kibana/current/data-views.html

I'm not exactly sure what solution is needed to make rdump compatible with Kibana and, later on, Timesketch. This is also not really an issue in this project, but I'm curious whether any thought has been given to how Dissect can be used without Splunk.

We have incorporated the ts field in as many plugins as possible where we thought the mapping was logical, for exactly this purpose. However, for a small number of plugins (amcache included) we decided that such a mapping was not logical. From a cursory Google search, it looks like Elastic's Painless scripting language in combination with ingest pipelines could be a nice way to mimic Splunk's automatic timestamp discovery at ingestion time by looking at _raw. Links to what I'm referring to:
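For illustration, here is an untested sketch of that idea using the official elasticsearch Python client (the pipeline id "assign-ts" and the crude ISO-8601 check are mine, not something rdump or Elastic ships). It registers an ingest pipeline whose Painless script copies the first ISO-8601-looking string it finds into ts:

# Untested sketch: register an ingest pipeline that assigns `ts` at ingest
# time. Assumes the elasticsearch 8.x Python client and a local cluster;
# the pipeline id "assign-ts" is made up.
from elasticsearch import Elasticsearch

PAINLESS_SOURCE = """
if (ctx.ts == null) {
    String found = null;
    for (def entry : ctx.entrySet()) {
        def v = entry.getValue();
        // crude ISO-8601 check, e.g. "2022-07-08T12:40:36"
        if (v instanceof String && v.length() >= 19
                && v.substring(4, 5) == '-' && v.substring(10, 11) == 'T') {
            found = v;
            break;
        }
    }
    if (found != null) {
        ctx.ts = found;
    }
}
"""

es = Elasticsearch("http://localhost:9200")
es.ingest.put_pipeline(
    id="assign-ts",
    description="Copy the first ISO-8601-looking string field into ts",
    processors=[{"script": {"lang": "painless", "source": PAINLESS_SOURCE}}],
)

Setting this pipeline as the index's index.default_pipeline should then apply it to every record rdump writes to that index.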

Alternatively, you could use an intermediary script that acts as a proxy and adds a ts field to every record by searching for a timestamp value, mimicking that Splunk magic. If you go this route, have a look at https://github.com/fox-it/flow.record/blob/main/flow/record/tools/geoip.py for some inspiration :).
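A minimal, untested sketch of what such a proxy could look like (only the stdin/stdout reader/writer pattern is borrowed from geoip.py; the rest is illustrative): it copies the first datetime field of each record into a new ts field, and would sit between target-query and rdump in the pipe.

from flow.record import RecordDescriptor, RecordReader, RecordWriter, extend_record

# descriptor for the ts field we prepend to every record
TsRecord = RecordDescriptor("record/ts", [("datetime", "ts")])

# RecordReader()/RecordWriter() without arguments use stdin/stdout
with RecordReader() as reader, RecordWriter() as writer:
    for record in reader:
        dt_fields = record._desc.getfields("datetime")
        if dt_fields:
            # naive "Splunk magic": take the first datetime field we find
            ts_record = TsRecord(getattr(record, dt_fields[0].name))
            record = extend_record(ts_record, [record], name=record._desc.name)
        writer.write(record)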

You could also do some magic using rdump itself! This is not really scalable, but it would resolve this specific issue.

Taking your command, we can use the -E argument in rdump to add a ts field based on the regf_mtime field if it exists in the record. If it is not defined, we just assign None, but you can substitute any value you like:
target-query -f amcache -q -t targets/MSEDGEWIN10_20220708124036.tar | rdump -E 'ts=regf_mtime if "regf_mtime" in globals() else None' -w "elastic://http://localhost:9200?index=timelines"

Okay, thanks for the explanation. I am currently working on an intermediary script that acts as a proxy between the JSONL files created by rdump and Logstash. Is this something that could be added to the jsonl and/or elastic adapter?
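Roughly, the shape I have in mind (the ISO-8601 detection is naive and just for illustration, and I'm assuming each input line is a flat JSON object):

# Rough sketch: read rdump's JSONL output on stdin, copy the first
# ISO-8601-looking string value into `ts`, and emit JSONL for Logstash.
import json
import re
import sys

ISO_8601 = re.compile(r"^\d{4}-\d{2}-\d{2}[T ]\d{2}:\d{2}:\d{2}")

for line in sys.stdin:
    doc = json.loads(line)
    if "ts" not in doc:
        for value in doc.values():
            if isinstance(value, str) and ISO_8601.match(value):
                doc["ts"] = value
                break
    print(json.dumps(doc))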

Not related to flow.record, but also timestamp related: when using a plugin with multiple timestamp fields, I would have expected multiple records/events to be created. For example, for the mft plugin https://github.com/fox-it/dissect.target/blob/main/dissect/target/plugins/filesystem/ntfs/mft.py#L23.

Currently, one record:

creation_time: X, last_modification_time: X, last_change_time: X, last_access_time: X, [...]

Instead, I would expect four records/events:

ts:creation_time, ts_description:"creation_time", [...]
ts:last_modification_time, ts_description:"last_modification_time", [...]
ts:last_change_time, ts_description:"last_change_time", [...]
ts:last_access_time, ts_description:"last_access_time", [...]

You could use the following flow.record script to output multiple ts-enriched records based on the datetime fields of the original record. Save it as flow-multi-timestamp.py:

from flow.record import RecordReader, RecordWriter, RecordDescriptor, extend_record

TimestampRecord = RecordDescriptor(
    "record/timestamp",
    [
        ("datetime", "ts"),
        ("string", "ts_description"),
    ],
)

with RecordReader() as reader, RecordWriter() as writer:
    for record in reader:
        # get all `datetime` fields (excluding the reserved `_generated` field).
        dt_fields = record._desc.getfields("datetime")

        # no `datetime` fields found, just output original record
        if not dt_fields:
            writer.write(record)
            continue

        # output a new record for each `datetime` field, assigned as `ts`.
        record_name = record._desc.name
        for field in dt_fields:
            ts_record = TimestampRecord(getattr(record, field.name), field.name)
            # extend `ts_record` with the original `record`, so the ts fields go first.
            # assign to a new variable so every iteration extends the original record.
            new_record = extend_record(ts_record, [record], name=record_name)
            writer.write(new_record)

Example usage with examples/filesystem.py:

$ python3 examples/filesystem.py /tmp
<filesystem/unix/entry path='/tmp' inode=1152921500312397027 dev=16777220 mode=0o120755 size=11.0  B uid=0 gid=80 ctime=2020-01-22 10:19:25.084778 mtime=2020-01-22 10:18:42.899209 atime=2020-01-22 10:18:42.899209 link='private/tmp'>

$ python3 examples/filesystem.py /tmp | python3 flow-multi-timestamp.py
<filesystem/unix/entry ts=2020-01-22 10:19:25.084778 ts_description='ctime' path='/tmp' inode=1152921500312397027 dev=16777220 mode=0o120755 size=11.0  B uid=0 gid=80 ctime=2020-01-22 10:19:25.084778 mtime=2020-01-22 10:18:42.899209 atime=2020-01-22 10:18:42.899209 link='private/tmp'>
<filesystem/unix/entry ts=2020-01-22 10:18:42.899209 ts_description='mtime' path='/tmp' inode=1152921500312397027 dev=16777220 mode=0o120755 size=11.0  B uid=0 gid=80 ctime=2020-01-22 10:19:25.084778 mtime=2020-01-22 10:18:42.899209 atime=2020-01-22 10:18:42.899209 link='private/tmp'>
<filesystem/unix/entry ts=2020-01-22 10:18:42.899209 ts_description='atime' path='/tmp' inode=1152921500312397027 dev=16777220 mode=0o120755 size=11.0  B uid=0 gid=80 ctime=2020-01-22 10:19:25.084778 mtime=2020-01-22 10:18:42.899209 atime=2020-01-22 10:18:42.899209 link='private/tmp'>

If you want more control over the record output of flow-multi-timestamp.py, just pipe it into rdump again:

$ python3 examples/filesystem.py /tmp/* | python3 flow-multi-timestamp.py | rdump -L

Thanks for the explanation and the script, this helps a lot!