rdump timestamp field
Closed this issue · 5 comments
The `runkeys` plugin and other plugins have the field `ts`. This field can be used as the timestamp field for all these plugins to analyze records with Kibana in chronological order.
target-query -f runkeys -q -t targets/MSEDGEWIN10_20220708124036.tar | rdump -w elastic://http://localhost:9200?index=timelines
But when using a plugin without the field `ts`, these records cannot be shown in Kibana. An example is `amcache`, which does not have the field `ts` because it uses other `datetime` fields like `created`.
target-query -f amcache -q -t targets/MSEDGEWIN10_20220708124036.tar | rdump -w elastic://http://localhost:9200?index=timelines
I don't know if it is supported, but is it possible that dissect.target plugins that output records use the field `ts` corresponding to the event creation timestamp of the parsed artefact? I am guessing Fox-IT uses Splunk, which performs timestamp assignment by automatically assigning a timestamp based on what it finds in the raw event data. But this feels like magic compared to using records with other adapters.
When using the `mft` plugin I would expect that a single MFT entry creates four events, each with a different timestamp, instead of one event with four different timestamps.
References
- https://docs.splunk.com/Documentation/Splunk/latest/Data/HowSplunkextractstimestamps
- https://github.com/fox-it/dissect.target/blob/main/dissect/target/plugins/os/windows/log/amcache.py#L17
- https://www.elastic.co/guide/en/elasticsearch/reference/current/anonymous-access.html
- https://www.elastic.co/guide/en/kibana/current/data-views.html
I'm not exactly sure what solution is needed to make rdump compatible with Kibana and, later on, Timesketch. This is also not really an issue with this project, but I'm curious whether any thought has been given to how Dissect can be used without Splunk.
We have incorporated the `ts` field in as many plugins as possible where we thought the mapping was logical, for exactly this purpose. However, for a small number of plugins (amcache included) we decided that such a mapping was not logical. From a cursory Google search, it looks like Elastic's Painless scripting language in combination with ingest pipelines could be a nice solution to mimic Splunk's automatic timestamp discovery at ingestion time by looking at `_raw`. Links to what I'm referring to:
- https://www.elastic.co/guide/en/elasticsearch/painless/current/painless-ingest-processor-context.html
- https://www.elastic.co/guide/en/elasticsearch/reference/current/ingest.html
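As a rough illustration of that ingest-pipeline idea, the sketch below builds a pipeline definition with a Painless `script` processor that copies a plugin-specific datetime field into `ts` when `ts` is missing, and PUTs it to the `_ingest/pipeline` API. The pipeline name, host, and `mtime_regf` field are example assumptions, not anything rdump or dissect.target provides:

```python
import json
import urllib.request


def make_pipeline(source_field: str) -> dict:
    # Ingest pipeline with a Painless `script` processor: the ingested
    # document is available as the `ctx` map. If it has no `ts` but does
    # have `source_field`, copy that value into `ts`.
    return {
        "description": f"copy {source_field} into ts when ts is missing",
        "processors": [
            {
                "script": {
                    "lang": "painless",
                    "source": (
                        f"if (ctx.ts == null && ctx.{source_field} != null) "
                        f"{{ ctx.ts = ctx.{source_field}; }}"
                    ),
                }
            }
        ],
    }


def put_pipeline(host: str, name: str, body: dict) -> None:
    # Register the pipeline, e.g. put_pipeline("http://localhost:9200", "copy-ts", ...)
    req = urllib.request.Request(
        f"{host}/_ingest/pipeline/{name}",
        data=json.dumps(body).encode(),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )
    urllib.request.urlopen(req)
```

You would then reference the pipeline at index time so every ingested record gets a `ts` without touching the rdump side.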
Alternatively, you could think about using an intermediary script which acts as a proxy and adds a `ts` field to every record by searching for a timestamp value, to also mimic that Splunk magic. If you go this route, have a look at https://github.com/fox-it/flow.record/blob/main/flow/record/tools/geoip.py for some inspiration :).
You could also do some magic using `rdump` itself! This is not really scalable, but it would resolve this specific issue.
Taking your command, we could use the `-E` argument of `rdump` to add the `ts` field if the `mtime_regf` field exists in the record. If it's not defined, we just assign it `None`, but you can assign your desired value instead.
target-query -f amcache -q -t targets/MSEDGEWIN10_20220708124036.tar | rdump -E 'ts=mtime_regf if "mtime_regf" in globals() else None' -w elastic://http://localhost:9200?index=timelines
Okay, thanks for the explanation. I am currently working on an intermediary script which acts as a proxy between JSONL files created by rdump and Logstash. Is this something that could be added to the `jsonl` and/or `elastic` adapter?
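A minimal sketch of such a JSONL proxy, in pure Python: it reads rdump's JSONL on stdin, and for every record without a `ts` key copies the first value that looks like an ISO 8601 timestamp. The regex heuristic and script shape are my own assumptions, not part of flow.record:

```python
import json
import re
import sys

# crude ISO 8601 detector, e.g. "2022-07-08T12:40:36" or with a space separator
ISO_TS = re.compile(r"^\d{4}-\d{2}-\d{2}[T ]\d{2}:\d{2}:\d{2}")


def add_ts(record: dict) -> dict:
    """Return the record with a `ts` field, mimicking Splunk's timestamp magic.

    If `ts` is missing, copy the first value that looks like a timestamp;
    records with no timestamp-like value pass through unchanged.
    """
    if "ts" not in record:
        for value in record.values():
            if isinstance(value, str) and ISO_TS.match(value):
                record["ts"] = value
                break
    return record


def main() -> None:
    # read JSONL on stdin, write `ts`-enriched JSONL on stdout
    for line in sys.stdin:
        print(json.dumps(add_ts(json.loads(line))))


if __name__ == "__main__":
    main()
```

Usage would be along the lines of piping rdump's JSONL output through the script before handing it to Logstash.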
Not related to flow.record, but also timestamp related: when using a plugin with multiple timestamp fields, I would have expected that multiple records/events are created. For example, for the `mft` plugin https://github.com/fox-it/dissect.target/blob/main/dissect/target/plugins/filesystem/ntfs/mft.py#L23.
One record:
creation_time: X, last_modification_time: X, last_change_time: X, last_access_time: X, [...]
Instead, I would expect four records/events:
ts:creation_time, ts_description:"creation_time", [...]
ts:last_modification_time, ts_description:"last_modification_time", [...]
ts:last_change_time, ts_description:"last_change_time", [...]
ts:last_access_time, ts_description:"last_access_time", [...]
You could use the following flow.record script to output multiple `ts`-enriched records based on the `datetime` fields of the original record. Save it as `flow-multi-timestamp.py`:
from flow.record import RecordReader, RecordWriter, RecordDescriptor, extend_record

TimestampRecord = RecordDescriptor(
    "record/timestamp",
    [
        ("datetime", "ts"),
        ("string", "ts_description"),
    ],
)

with RecordReader() as reader, RecordWriter() as writer:
    for record in reader:
        # get all `datetime` fields (excluding `_generated`)
        dt_fields = record._desc.getfields("datetime")

        # no `datetime` fields found, just output the original record
        if not dt_fields:
            writer.write(record)
            continue

        # output a new record for each `datetime` field, assigned as `ts`
        record_name = record._desc.name
        for field in dt_fields:
            ts_record = TimestampRecord(getattr(record, field.name), field.name)
            # extend `ts_record` with the original `record`, so the timestamp fields come first
            writer.write(extend_record(ts_record, [record], name=record_name))
Example usage with `examples/filesystem.py` from the flow.record repository:
$ python3 examples/filesystem.py /tmp
<filesystem/unix/entry path='/tmp' inode=1152921500312397027 dev=16777220 mode=0o120755 size=11.0 B uid=0 gid=80 ctime=2020-01-22 10:19:25.084778 mtime=2020-01-22 10:18:42.899209 atime=2020-01-22 10:18:42.899209 link='private/tmp'>
$ python3 examples/filesystem.py /tmp | python3 flow-multi-timestamp.py
<filesystem/unix/entry ts=2020-01-22 10:19:25.084778 ts_description='ctime' path='/tmp' inode=1152921500312397027 dev=16777220 mode=0o120755 size=11.0 B uid=0 gid=80 ctime=2020-01-22 10:19:25.084778 mtime=2020-01-22 10:18:42.899209 atime=2020-01-22 10:18:42.899209 link='private/tmp'>
<filesystem/unix/entry ts=2020-01-22 10:18:42.899209 ts_description='mtime' path='/tmp' inode=1152921500312397027 dev=16777220 mode=0o120755 size=11.0 B uid=0 gid=80 ctime=2020-01-22 10:19:25.084778 mtime=2020-01-22 10:18:42.899209 atime=2020-01-22 10:18:42.899209 link='private/tmp'>
<filesystem/unix/entry ts=2020-01-22 10:18:42.899209 ts_description='atime' path='/tmp' inode=1152921500312397027 dev=16777220 mode=0o120755 size=11.0 B uid=0 gid=80 ctime=2020-01-22 10:19:25.084778 mtime=2020-01-22 10:18:42.899209 atime=2020-01-22 10:18:42.899209 link='private/tmp'>
If you want more control over the record output of `flow-multi-timestamp.py`, just pipe it to `rdump` again:
$ python3 examples/filesystem.py /tmp/* | python3 flow-multi-timestamp.py | rdump -L
Thanks for the explanation and the script, this helps a lot!