screaming-frog-shingling

Takes a Screaming Frog Internal HTML export that includes extracted page text and applies a shingling algorithm to measure content duplication across the pages of a crawled site.
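As a rough illustration of the idea, shingling breaks each page's text into overlapping word windows and compares the resulting sets. The sketch below is not taken from sf_shingling.py: the function names, the window size of 5 words, and the use of Jaccard similarity as the score are all assumptions made for the example.

```python
def shingles(text, k=5):
    """Return the set of k-word shingles (overlapping word windows) in text."""
    words = text.lower().split()
    if not words:
        return set()
    if len(words) < k:
        # Text shorter than one window: treat the whole text as one shingle.
        return {" ".join(words)}
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets: size of intersection over size of union."""
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


page_a = "the quick brown fox jumps over the lazy dog"
page_b = "the quick brown fox jumps over the sleepy cat"
score = jaccard(shingles(page_a, k=3), shingles(page_b, k=3))
print(f"similarity: {score:.2f}")
```

A score near 1 suggests two pages share most of their text; a score near 0 suggests little overlap. Smaller windows are more forgiving of small edits, larger ones are stricter.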

Example Usage

1. Install the dependencies: pip install -r requirements.txt
2. Run Screaming Frog and use Custom Extraction to pull the content out of a specific DOM element.
3. Export the Internal HTML report to a CSV file.
4. Run the script with the following arguments.

 Arguments:
    -i : Input filename
    -o : Output filename
    -c : Column from Screaming Frog that contains your extracted content.
    Example invocation:
    python sf_shingling.py -i internal_html_ap.csv -o output_html_ap.csv -c "BodyContent 1"
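To show how the three arguments relate to the export, here is a minimal sketch of a pairwise comparison over the CSV. It is an illustration under stated assumptions, not the script's actual code: `compare_pages` and `word_shingles` are hypothetical names, the URL column is assumed to be called "Address", and the output layout (url_a, url_b, similarity) is invented for the example.

```python
import csv
from itertools import combinations


def word_shingles(text, k=5):
    """Set of k-word tuples; texts shorter than k words yield one shingle."""
    words = text.lower().split()
    if not words:
        return set()
    return {tuple(words[i:i + k]) for i in range(max(len(words) - k + 1, 1))}


def compare_pages(in_file, out_file, column, url_column="Address", k=5):
    """Write one row per page pair with the Jaccard similarity of their text.

    in_file and out_file are open text file objects; column is the name of
    the extraction column (the value passed as -c), url_column is assumed.
    """
    rows = list(csv.DictReader(in_file))
    pages = [(row[url_column], word_shingles(row.get(column) or "", k))
             for row in rows]
    writer = csv.writer(out_file)
    writer.writerow(["url_a", "url_b", "similarity"])
    for (url_a, a), (url_b, b) in combinations(pages, 2):
        union = a | b
        score = len(a & b) / len(union) if union else 0.0
        writer.writerow([url_a, url_b, f"{score:.3f}"])


# Usage with the filenames from the invocation above:
# with open("internal_html_ap.csv", newline="", encoding="utf-8-sig") as src, \
#      open("output_html_ap.csv", "w", newline="", encoding="utf-8") as dst:
#     compare_pages(src, dst, column="BodyContent 1")
```

Comparing every pair grows quadratically with the number of pages, so a sketch like this is only practical for small to mid-sized crawls.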

danipolo/screaming-frog-shingling
