https://github.com/MicMetz/SpyderHound
- Start the app with `python App.py`.
- Place the target URL in the input terminal and press Enter to start scraping a domain.
- The output terminal will display the progress of the scrape. The scraped data includes (see the sketch below):
  - Links
  - Tokens (words)
  - Paragraphs
  - Images
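A minimal sketch of this kind of extraction, assuming `requests` and BeautifulSoup (the project's actual scraper in `core/` may work differently):

```python
# Hypothetical extraction sketch; SpyderHound's actual scraper may differ.
import requests
from bs4 import BeautifulSoup

def scrape(url: str) -> dict:
    response = requests.get(url, timeout=10)
    response.raise_for_status()  # basic error handling
    soup = BeautifulSoup(response.text, "html.parser")
    return {
        "links": [a["href"] for a in soup.find_all("a", href=True)],
        "tokens": soup.get_text(separator=" ").split(),
        "paragraphs": [p.get_text(strip=True) for p in soup.find_all("p")],
        "images": [img["src"] for img in soup.find_all("img", src=True)],
    }
```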
The GUI is built with tkinter and customtkinter. tkinter ships with most Python distributions; install customtkinter with pip:

```
pip install customtkinter
```
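A minimal customtkinter window sketch, with a hypothetical layout of one URL entry and one output textbox (not the project's actual UI code, which lives in `resources/ui/`):

```python
# Hypothetical window sketch; the project's actual UI lives in resources/ui/.
import customtkinter as ctk

ctk.set_appearance_mode("dark")
app = ctk.CTk()                # CTk replaces tkinter's Tk root window
app.title("SpyderHound")
app.geometry("800x600")

# One entry for the target URL and one textbox for scrape output,
# mirroring the input/output terminal split described above.
url_entry = ctk.CTkEntry(app, placeholder_text="https://example.com")
url_entry.pack(fill="x", padx=10, pady=10)

output_box = ctk.CTkTextbox(app)
output_box.pack(fill="both", expand=True, padx=10, pady=10)

app.mainloop()
```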
To produce a standalone `.exe` file, also install PyInstaller:

```
pip install pyinstaller
```

Then run the build batch script `tools/make_exe.bat`.
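If you prefer to call PyInstaller directly, a typical one-file windowed build looks like this (the exact flags used by `tools/make_exe.bat` may differ):

```
pyinstaller --onefile --windowed App.py
```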
The project is structured as follows:

```
.
├── core (Core Application Logic)
│   ├── Controller.py
│   ├── Target.py
│   ├── Domain.py
│   └── Database.py
├── resources (Resources)
│   ├── hate_speech (Hate Speech Data)
│   ├── neg_words (Negative Words)
│   ├── pos_words (Positive Words)
│   ├── stop_words (Stop Words)
│   ├── toxicity (Toxicity Data)
│   └── ui (User Interface Tkinter Design)
│       ├── OutputTerminal.py
│       ├── InputTerminal.py
│       ├── SidePanel.py
│       └── MessageTerminal.py
├── data (Web Scraping Results)
│   └── [Target Name]
│       └── [Domain Name]
│           └── [Date]
│               └── [Time]
│                   └── [Data]
│                       └── [Data Type]
├── documentation (Documentation)
│   └── images (Images)
├── tools (Not Yet Implemented)
├── App.py (Main Application Entry Point)
└── setup.py (Directory Linking)
```
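A sketch of how a result path under `data/` might be assembled from that hierarchy (a hypothetical helper; the actual naming logic in `core/` may differ):

```python
# Hypothetical helper; names and the actual path-building code in core/ may differ.
from datetime import datetime
from pathlib import Path

def result_path(target: str, domain: str, data: str, data_type: str) -> Path:
    now = datetime.now()
    return (
        Path("data") / target / domain
        / now.strftime("%Y-%m-%d")    # [Date]
        / now.strftime("%H-%M-%S")    # [Time]
        / data                        # [Data]
        / data_type                   # [Data Type], e.g. "links"
    )

path = result_path("example-target", "example.com", "scrape", "links")
path.mkdir(parents=True, exist_ok=True)
```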
- Layout
  - Splash Screen
  - Main Window
  - Side Panel
  - Input Terminal
  - Output Terminal
- Frames
  - Splash Frame
  - Main Frame
  - Data Frame
  - Database Frame
- Core
  - Web Scraping
    - Scraping
    - Improved scraping with error handling
    - Improved token stripping
    - Parsing
    - Storing
  - Database (see the sketch after this list)
    - Database Connection
    - Database Creation
    - Database Insertion
    - Database Querying
  - Data Analysis
  - Data Visualisation
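A hedged sqlite3 sketch of the connection, creation, insertion, and querying steps listed above (the schema is hypothetical; `core/Database.py` may use different tables or a different engine):

```python
# Hypothetical schema; core/Database.py may differ.
import sqlite3

conn = sqlite3.connect("spyderhound.db")  # database connection
conn.execute(
    "CREATE TABLE IF NOT EXISTS pages ("
    "url TEXT PRIMARY KEY, scraped_at TEXT, token_count INTEGER)"
)  # database creation
conn.execute(
    "INSERT OR REPLACE INTO pages VALUES (?, ?, ?)",
    ("https://example.com", "2024-01-01T00:00:00", 1234),
)  # database insertion
conn.commit()
for url, tokens in conn.execute("SELECT url, token_count FROM pages"):  # querying
    print(url, tokens)
conn.close()
```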
The GUI is built with tkinter and customtkinter.
A. The first draft of the GUI can be found in the documentation folder. It will be updated as the project progresses.
B. The second draft of the GUI, also in the documentation folder, shows the application redesigned around the user conducting multiple scrapes simultaneously.
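Running scrapes simultaneously suggests moving network work off the UI thread. A hedged thread-pool sketch (hypothetical; the project's actual concurrency model may differ):

```python
# Hypothetical concurrency sketch; the app's real design may differ.
from concurrent.futures import ThreadPoolExecutor, as_completed
import requests

def fetch(url: str) -> str:
    response = requests.get(url, timeout=10)
    response.raise_for_status()
    return response.text

urls = ["https://example.com", "https://example.org"]
with ThreadPoolExecutor(max_workers=4) as pool:
    # Submit every scrape at once; results print as each one finishes,
    # so a slow domain never blocks the others.
    futures = {pool.submit(fetch, u): u for u in urls}
    for future in as_completed(futures):
        print(futures[future], len(future.result()))
```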