A project that creates an automated pipeline with Docker and Docker Compose to investigate data from ransomware leaks.
The project started after a friend asked for help investigating whether a data leak contained the friend's personal information.
This tool can be used to automate all of (or a selection of) the steps below.
- Download files from a leak site via Tor
- Extract files from .rar, .zip, .tgz and more with the help of 7-zip
- Ingest into Elasticsearch with the Tika pipeline
- Search via Kibana and JupyterLab notebooks
Download files automatically from leak sites via Tor. I use a forked version of aria2-onion-downloader.
Automated extraction of compressed files with a simple container running 7-zip.
After unpacking the downloaded files, a couple of optional (but enabled by default) steps are executed:
- Run 7-zip once more on the extracted files.
- Run readpst on files with the extension .pst (Outlook Data File).
Files are ingested into Elasticsearch with the attachment processor enabled. The processor uses Apache Tika to extract text from the files.
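As a sketch, the attachment processor is enabled by creating an ingest pipeline that runs Tika on a base64-encoded field. The pipeline name and the field name `data` are illustrative, not necessarily what DEIS uses:

```python
import base64
import json

# Ingest pipeline definition for the Elasticsearch attachment processor.
# The processor runs Apache Tika on the base64 content in the "data" field.
PIPELINE = {
    "description": "Extract text from attachments with Apache Tika",
    "processors": [
        {
            "attachment": {
                "field": "data",        # field holding base64 file content
                "remove_binary": True,  # drop the raw bytes after extraction
            }
        }
    ],
}

def document_for(file_bytes: bytes) -> dict:
    """Build a document body whose 'data' field the pipeline will parse."""
    return {"data": base64.b64encode(file_bytes).decode("ascii")}

# Create the pipeline and index through it, e.g.:
#   PUT _ingest/pipeline/attachment                  body=PIPELINE
#   PUT leakdata-index/_doc/1?pipeline=attachment    body=document_for(...)
print(json.dumps(PIPELINE, indent=2))
```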
I've incorporated the setup from the docker-elk repository to run Elasticsearch and Kibana, but have removed Logstash.
Search can be done with Kibana and a JupyterLab notebook. The notebook is my reuteras/container-notebook.
You must increase the RAM available to Docker to 18 GB or more; otherwise Elasticsearch will not start, unless you lower the memory settings in docker-compose.yml.
Download the repository from GitHub and change to the new directory.
git clone https://github.com/reuteras/DEIS.git
cd DEIS
Configure DEIS by changing three files:
- Modify passwords in .env. If the downloaded files are password protected you must set the ZIP_PASSWORD.
- Add a list of URLs (one per line) for files to download to a file in the urls directory.
- Copy deis.cfg.default to deis.cfg and update the settings described in the file.
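A urls file is plain text with one URL per line, for example (the onion address below is a made-up placeholder, not a real leak site):

```
http://exampleonionaddressxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx.onion/leak/archive.rar
```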
Set up Elasticsearch and Kibana by running the command below, which starts a configuration container and its dependent containers.
docker compose --profile setup up -d
Wait for the deis-setup-1 container to exit, which takes about 45 seconds. Tailing the container logs will exit when the container is done.
docker logs deis-setup-1 -f
To run all steps in DEIS, run:
docker compose --profile deis up -d
Monitor progress by first running:
make venv
And then run the bin/progress.py Python script with:
make progress
Press CTRL-C to exit the progress display.
The following web services are available:
- http://127.0.0.1:3000/ - Gotenberg server
- http://127.0.0.1:5601/ - Elastic/Kibana
- http://127.0.0.1:8080/ - AriaNg
- http://127.0.0.1:8081/file/ - Download file based on sha256
- http://127.0.0.1:8081/convert/ - Convert file to PDF (if possible) and download file based on sha256
- http://127.0.0.1:8888/ - JupyterLab
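The two file endpoints can be used from scripts as well. The helper below only assembles the URLs; the assumption that the sha256 hash goes directly in the URL path is mine, so check against your running instance:

```python
from urllib.request import urlretrieve  # for the actual download

BASE = "http://127.0.0.1:8081"

def file_url(sha256: str) -> str:
    """URL that serves the raw file for a given sha256 hash."""
    return f"{BASE}/file/{sha256}"

def convert_url(sha256: str) -> str:
    """URL that converts the file to PDF (when possible) before download."""
    return f"{BASE}/convert/{sha256}"

# With the stack running, download a file by hash, e.g.:
#   urlretrieve(file_url("e3b0c442..."), "out.bin")
```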
If you already have the files available you can skip the download and extraction steps and only ingest the files into Elasticsearch. The files must be in the extracted directory, or you have to update deis.cfg accordingly.
make ingest
./bin/ingest.sh
Disable usage data collection by Elastic by opening http://127.0.0.1:5601/app/management/kibana/settings, clicking Global Settings, scrolling down, and turning off "Share usage with Elastic".
Files are added to Elasticsearch with timestamps taken from the filesystem. Search in Discover with an absolute time range from Jan 1, 1970 @ 00:00:00.000 to now.
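From a notebook, that absolute range corresponds to a query like the one below against the `leakdata-index-*` pattern. The field name `@timestamp` is an assumption about the ingested documents:

```python
import json

# Search body matching Kibana's absolute range: Jan 1, 1970 -> now.
# The "@timestamp" field name is an assumption, not confirmed by DEIS.
QUERY = {
    "query": {
        "range": {
            "@timestamp": {
                "gte": "1970-01-01T00:00:00.000Z",
                "lte": "now",
            }
        }
    }
}

# e.g. in the Dev Tools console:
#   GET leakdata-index-*/_search   body=QUERY
print(json.dumps(QUERY, indent=2))
```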
A quick overview of the data is available in the dashboard named Leaked data.
To only search data already in Elasticsearch, you can use docker compose up -d as the start command.
Stop all services with docker compose --profile deis down.
If you get an error message about max_analyzed_offset, open the developer console at http://127.0.0.1:5601/app/dev_tools#/console and execute the following command:
PUT /leakdata-index-*/_settings
{
  "index": {
    "highlight.max_analyzed_offset": 2000000000
  }
}
Also run the following command:
PUT _cluster/settings
{
"persistent": {
"search.max_async_search_response_size": "50mb"
}
}
This project combines several open source tools. They are listed below; please submit an issue if I have missed any:
- docker-elk
- aria2-onion-downloader which uses AriaNg
- Apache Tika
- readpst
- Tor
- The whole ELK stack by Elastic.co
- JupyterLab
Lots of things :)
- Monitor mayswind/AriaNg for new releases.