🐻‍❄️ 🏹 Threat hunting with Polars and flaws.cloud AWS CloudTrail datasets.

Threat Hunting with Polars

Threat hunting with Polars and flaws.cloud AWS CloudTrail datasets. Check out the threat hunting notebook in nbviewer, or rerun the hunt yourself in JupyterLab.

Normalized datasets and alerts can be found as parquet files in the results directory. You can load these for further exploration using your OLAP database of choice.

Motivation

Polars is an OLAP query engine written in Rust. It is highly memory efficient, uses Apache Arrow as its memory model, and consistently tops database speed benchmarks, including against distributed OLAP engines such as PySpark and Snowflake.

At Tracecat, we use Polars as an alternative to jq or grep for quick-and-dirty threat hunting.

Why Polars for log analysis?

If your logs fit in memory and you already use Python or Jupyter notebooks in your threat hunting process, Polars should be your go-to tool.

Note: for every 1GB of gzipped JSON logs on disk, you can expect the Polars in-memory data model to take up approximately 500MB of RAM.

Getting Started

Prerequisites

Requires Python >3.9, pip, and Git LFS to be installed.

First, clone the repository and download the datasets from Git LFS (Large File Storage).

git clone git@github.com:TracecatHQ/hunts.git
cd hunts
git lfs fetch
git lfs pull

Optionally, create a new Python environment (for example with venv or conda), then install the required dependencies via pip install -r requirements.txt.

Finally, start JupyterLab by running jupyter lab, then open the aws_flaws.ipynb and aws_flaws_2.ipynb notebooks inside the notebooks directory.

Contact Us

Interested in our work bringing low-cost but powerful data engineering tools to cybersecurity? We'd love to hear your thoughts by email at founders@tracecat.com, or find us in the Tracecat Discord community!

License

MIT License