scrapedown

A website scraper that extracts text and converts it to Markdown (.md) files. Each file is placed in a directory named after the directory it was found under, so the output replicates the site's file structure.
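As a rough illustration of how a page URL could map onto a mirrored output path, here is a minimal sketch. It is not taken from scrapedown's source: the function name `url_to_markdown_path`, the `output` root directory, and the choice to use `index.md` for the site root are all assumptions made for the example.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse


def url_to_markdown_path(url: str, output_root: str = "output") -> PurePosixPath:
    """Map a page URL to a Markdown path that mirrors the site's layout.

    Hypothetical helper for illustration; scrapedown may do this differently.
    """
    parsed = urlparse(url)
    # Drop empty and relative segments so stray slashes or ".." cannot
    # move the result outside the output directory.
    segments = [s for s in parsed.path.split("/") if s and s not in (".", "..")]
    if not segments:
        # The site root has no path segment, so use a placeholder name (assumption).
        segments = ["index"]
    name = segments[-1]
    # Strip a trailing .html/.htm extension before adding .md.
    for ext in (".html", ".htm"):
        if name.endswith(ext):
            name = name[: -len(ext)]
            break
    return PurePosixPath(output_root, parsed.netloc, *segments[:-1], name + ".md")


print(url_to_markdown_path("https://example.com/docs/guide/intro.html"))
```

Under these assumptions, a page found under `/docs/guide/` ends up in `output/example.com/docs/guide/`, which is the "directory named after the directory it was found under" behaviour described above.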

Use with caution

To install:

git clone https://github.com/johnconnor-sec/scrapedown

cd scrapedown

pip install poetry

poetry shell

poetry install

Run it with:

python3 main.py

The output now includes the links gathered from the site, and the generated Markdown text is cleaner.
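For readers curious how link gathering can work in general, the sketch below collects the links on a page using only the Python standard library. It is an illustration under assumptions, not scrapedown's implementation: the `LinkCollector` class and `collect_links` function are hypothetical names, and the real tool may rely on a different parser.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag, resolved against a base URL.

    Hypothetical helper for illustration only.
    """

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # urljoin turns relative links into absolute ones.
                self.links.append(urljoin(self.base_url, value))


def collect_links(html: str, base_url: str) -> list:
    """Return all links found in an HTML string, in document order."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    return collector.links


page = '<a href="/about">About</a> <a href="https://other.example/x">X</a>'
print(collect_links(page, "https://example.com/docs/"))
```

Resolving each link against the page's own URL keeps relative and absolute links in one consistent form, which makes it easier to list them alongside the Markdown output.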

This is completely free to anyone who thinks it's cool. If anything, I think it could work for gathering data for LLMs, note-taking, or finding interesting endpoints.

Just clone it, install the dependencies, run python3 main.py, and watch it work.

If you'd like to make this project better, please show me what you have made!