SpyderHound

https://github.com/MicMetz/SpyderHound

Usage

Run

Start the app with python App.py.
Enter the target URL in the input terminal and press Enter to start scraping the domain.
The output terminal displays the progress of the scrape:
1. Links
2. Tokens (Words)
3. Paragraphs
4. Images

Setup

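The GUI is built with tkinter and customtkinter. tkinter is part of the Python standard library and is not installed through pip (some Linux distributions package it separately, for example as python3-tk). Install customtkinter with: pip install customtkinter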
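
Before launching the app, it can help to confirm that both GUI modules are importable. The following is a minimal sketch; missing_gui_dependencies is a hypothetical helper written for this example and is not part of the SpyderHound code base.

```python
import importlib.util


def missing_gui_dependencies(modules=("tkinter", "customtkinter")):
    """Return the names of the given top-level modules that cannot be found."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


missing = missing_gui_dependencies()
if missing:
    print("Missing GUI modules:", ", ".join(missing))
else:
    print("All GUI modules are available.")
```

The check only looks the modules up; it does not import them or open a window.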
To produce a standalone .exe file, also install pyinstaller (pip install pyinstaller), then run the build batch script tools/make_exe.bat.

Project Structure

The project is structured as follows:

.
├── core (Core Application Logic)
│   ├── Controller.py
│   ├── Target.py
│   ├── Domain.py
│   ├── Database.py
├── resources (Resources)
│   ├── hate_speech (Hate Speech Data)
│   ├── neg_words (Negative Words)
│   ├── pos_words (Positive Words)
│   ├── stop_words (Stop Words)
│   ├── toxicity (Toxicity Data)
│   ├── ui (User Interface Tkinter Design)
│   │   ├── OutputTerminal.py
│   │   ├── InputTerminal.py
│   │   ├── SidePanel.py
│   │   ├── MessageTerminal.py
├── data (Web Scraping Results)
│   ├── [Target Name]
│   │   ├── [Domain Name]
│   │   │   ├── [Date]
│   │   │   │   ├── [Time]
│   │   │   │   │   ├── [Data]
│   │   │   │   │   │   ├── [Data Type]
├── documentation (Documentation)
│   ├── images (Images)
├── tools (Not Yet Implemented)
│   
├── App.py (Main Application Entry Point)
├── setup.py (Directory Linking)
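
The nesting of the data folder shown above (target, domain, date, time, data, data type) can be sketched with pathlib. This is an illustration only: the result_dir function, the literal "data" folder standing in for [Data], and the date and time formats are all assumptions made for this example, not details confirmed by the project.

```python
from datetime import datetime
from pathlib import Path


def result_dir(root, target, domain, data_type, when=None):
    """Build data/[Target]/[Domain]/[Date]/[Time]/[Data]/[Data Type]."""
    when = when or datetime.now()
    return (
        Path(root)
        / target
        / domain
        / when.strftime("%Y-%m-%d")   # assumed date format
        / when.strftime("%H-%M-%S")   # assumed time format
        / "data"                      # assumed name for the [Data] level
        / data_type
    )


path = result_dir("data", "acme", "example.com", "links",
                  datetime(2024, 1, 2, 3, 4, 5))
print(path.as_posix())
```

Calling path.mkdir(parents=True, exist_ok=True) would then create the whole chain of folders in one step.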

Development

Updates and Progress

Better error handling for scraping

Improved token stripping
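
One plausible form of token stripping is sketched below: trim surrounding punctuation, lowercase the word, and drop stop words. The strip_tokens function is hypothetical and only illustrates the idea; it is not the project's actual implementation, and the stop-word set here stands in for whatever the stop_words resource contains.

```python
import string


def strip_tokens(words, stop_words=frozenset()):
    """Trim punctuation and whitespace, lowercase, and drop stop words."""
    cleaned = []
    for word in words:
        # Only leading and trailing characters are removed, so an
        # apostrophe inside a word (as in "it's") is preserved.
        token = word.strip(string.punctuation + string.whitespace).lower()
        if token and token not in stop_words:
            cleaned.append(token)
    return cleaned


print(strip_tokens(["Hello,", "the", "(World)!", "--", "it's"], {"the"}))
```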

GUI

The GUI is built with tkinter and customtkinter.

A. The first draft of the GUI can be found in the documentation folder. It will be updated as the project progresses.

B. The second draft of the GUI can be found in the documentation folder. It shows how the application is designed around a user running multiple scrapes simultaneously.

Final GUI:

AMetznger/SpyderHound