an onionsite scraping framework with the primary intent of tracking ransomware groups
running within github actions, groups are visited & posts are indexed within this repository at a regular cadence
missing a group? try the issue template
curl -sL ransomwhat.telemetry.ltd/posts | jq
curl -sL ransomwhat.telemetry.ltd/groups | jq
looking for historical data? check ransomwatch-history - from hourly uptime records to git-tracked source HTML for groups over the period of May 2021 to May 2022 - there are insights to be had
content within ransomwatch.telemetry.ltd, posts.json, groups.json and the docs/ & source/ directories is dynamically generated based on infrastructure of real-world threat actors in near-real-time.
whilst sanitisation efforts have been made, by viewing or accessing ransomwatch generated material you acknowledge you are doing so at your own risk.
the torproxy from joshhighet/gotham registry is introduced into the github actions workflow as a service container to allow onion routing within ransomwatch.yml
where possible psf/requests is used to fetch source html. if a javascript engine is required to render the dom, mozilla/geckodriver and seleniumhq/selenium are invoked.
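a minimal sketch of fetching html through the tor proxy with requests. this is not the project's actual fetch code: the function names, the 127.0.0.1 address and the timeout are assumptions for illustration. the `socks5h` scheme asks the proxy to resolve hostnames, which .onion addresses need, and it requires the `requests[socks]` extra to be installed.

```python
# sketch only: names and defaults here are hypothetical, not taken from ransomwatch
TOR_SOCKS = "socks5h://127.0.0.1:9050"  # assumed local tor proxy; socks5h resolves names via the proxy


def tor_proxies(socks_url=TOR_SOCKS):
    """build the proxies mapping requests expects, covering both url schemes"""
    return {"http": socks_url, "https": socks_url}


def fetch_html(url, timeout=60):
    """fetch a page over tor and return its html, or None if the request fails"""
    import requests  # imported lazily; socks support needs the requests[socks] extra

    try:
        response = requests.get(url, proxies=tor_proxies(), timeout=timeout)
        response.raise_for_status()
        return response.text
    except requests.exceptions.RequestException:
        # unreachable hosts, timeouts and http errors all land here
        return None
```

returning None on failure keeps a single dead host from stopping a run across many groups; a real scraper would likely log the error as well.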
the frontend is ultimately markdown, generated with markdown.py and served with docsifyjs/docsify thanks to pages.github.com
any graphs or visualisations are generated with plotting.py with the help of matplotlib/matplotlib
post indexing is done with a mix of grep, awk and sed
within parsers.py - it's brittle and like any ̴̭́H̶̤̓T̸̙̅M̶͇̾L̷͑ͅ ̴̙̏p̸̡͆a̷̛̦r̵̬̿s̴̙͛ĩ̴̺n̸̔͜g̸̘̈, has a limited lifetime.
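to illustrate why this style of parsing is brittle, here is a rough python equivalent of a grep/sed pipeline using regular expressions. it is not code from parsers.py: the `post-title` class and the function name are hypothetical, and any change to the site's markup would silently break it.

```python
import re
from html import unescape

# hypothetical markup: assumes each post title sits in <h2 class="post-title">...</h2>
TITLE_PATTERN = re.compile(r'<h2 class="post-title">(.*?)</h2>', re.IGNORECASE | re.DOTALL)
TAG_PATTERN = re.compile(r"<[^>]+>")


def extract_titles(html):
    """return the text of every matching title element, skipping empty ones"""
    titles = []
    for raw in TITLE_PATTERN.findall(html):
        # strip nested tags (e.g. links), decode entities, trim whitespace
        text = unescape(TAG_PATTERN.sub("", raw)).strip()
        if text:
            titles.append(text)
    return titles
```

each tracked group would need its own pattern, which is where the maintenance cost comes from.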
groups.json
contains hosts, nodes, relays and mirrors for a tracked group or actor
posts.json
contains parsed posts, noted by their discovery time and accountable group
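a small sketch of working with posts.json once downloaded. the field names `group_name`, `post_title` and `discovered` are assumptions about the file's schema and should be checked against the real data before relying on them.

```python
import json
from collections import Counter


def load_posts(path="posts.json"):
    """read the posts file; assumes it holds a json list of post objects"""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def posts_per_group(posts):
    """count posts by their accountable group (assumed key: group_name)"""
    return Counter(post["group_name"] for post in posts)
```

for example, `posts_per_group(load_posts()).most_common(10)` would list the ten groups with the most indexed posts.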
all rendered source HTML is stored within ransomwatch/tree/main/source - change tracking and revision history of these blogs is made possible with git
a script to generate high-resolution screenshots of all online hosts within groups.json
a beautifulsoup script to fetch emails, internal and external links from HTML within source/
fetching sites requires a local tor circuit on tcp://9050 - establish one with:
docker run -p9050:9050 ghcr.io/joshhighet/gotham/torproxy:latest
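before scraping it can help to confirm that something is listening on the proxy port. a minimal sketch, assuming the container above is published on 127.0.0.1; the function name is hypothetical and this only checks that the tcp port accepts connections, not that a tor circuit has actually been built.

```python
import socket


def tor_proxy_reachable(host="127.0.0.1", port=9050, timeout=3.0):
    """return True if a tcp connection to the proxy port succeeds"""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # connection refused, timed out, or host unreachable
        return False
```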
manage the groups within groups.json
./ransomwatch.py add --name acmecorp --location abcdefg.onion
./ransomwatch.py scrape
iterate files within the source/ directory and contribute findings to posts.json
for a crude health-check across all parsers, use assets/parsers.sh
./ransomwatch.py parse
ransomwatch is licensed under unlicense.org