ArchiveTeam/grab-site
The archivist's web crawler: WARC output, dashboard for all crawls, dynamic ignore patterns
Python, NOASSERTION
Pinned issues
Issues
- Python 3.8 deprecation (#245 opened by jacob-willden, 6 comments)
- Cannot install grab-site on WSL2 (Ubuntu) (#246 opened by jacob-willden, 4 comments)
- Dockerfile? (#182 opened by 818S, 5 comments)
- Srcset images not being archived (#243 opened by JubilantJerry, 0 comments)
- Can Grab-site be used in W7 with pip? (#242 opened by Snippet24816, 2 comments)
- Getting 502 Bad Gateway Errors (#241 opened by syberphunk, 2 comments)
- How do you add custom hooks now? (#207 opened by TheTechRobo, 3 comments)
- grab-site not displaying any content on Port 29000, but installed and running (#227 opened by DominicBilke, 2 comments)
- is it possible to output regular files instead of warc? (#228 opened by ftc2, 6 comments)
- Should we add an anti-porn igset? (#213 opened by TheTechRobo, 4 comments)
- Dubious quickmod2 SMF forum ignore (#212 opened by TheTechRobo, 0 comments)
- Failed building wheel for fb-re2 (#239 opened by 10kmotorola, 2 comments)
- xFormers Support? (#237 opened by Astra060, 1 comment)
- Grab-site gets only a single page (#188 opened by mathuryash5, 4 comments)
- No module named 'autobahn' (#200 opened by vitacell, 1 comment)
- Grab site is not actually compatible with python 3.8 (#229 opened by cenodis, 2 comments)
- Fix ludios_wpull to support SQLAlchemy 1.4 (#198 opened by ivan, 7 comments)
- Project Evolution (#192 opened by acrois, 4 comments)
- Cookies not staying (#187 opened by TheTechRobo, 5 comments)
- Add upload option (#226 opened by upintheairsheep, 6 comments)
- Can't grab Wikimedia thumbnails, even when global is removed from igset file (#223 opened by BrinBellway, 2 comments)
- Add a --no-global-igset option (#224 opened by ivan, 1 comment)
- RuntimeError: To use txaio, you must first select a framework with .use_twisted() or .use_asyncio() (#220 opened by PadraigEire, 3 comments)
- No messege on Dashboard (#219 opened by CircleCrop, 3 comments)
- install error in macOS Catalina (#217 opened by LeeBinder, 10 comments)
- Syntax Error on run (#214 opened by trentwiles, 2 comments)
- Resuming a WARC after hard "No space left on device" error message? (#210 opened by Preservation-Quest, 1 comment)
- infinite recursion on offsite links? (#194 opened by TheTechRobo, 3 comments)
- Add some Tumblr ignores to global igset (#204 opened by TheTechRobo, 0 comments)
- Backslash to Forward slash correction (#199 opened by acrois, 2 comments)
- Add SimpleMachineForums igsets (#201 opened by TheTechRobo, 0 comments)
- Dupe spotter user-defined list of expressions / separation of default dupe spotter expressions (#197 opened by acrois, 1 comment)
- Ignore errors and keep crawling (#193 opened by TowardMyth, 8 comments)
- What does the ID do? (#191 opened by TheTechRobo, 3 comments)
- Change settings mid-crawl (#189 opened by TheTechRobo, 0 comments)
- --no-offsite-links doesn't work (#183 opened by tripleo1, 4 comments)
- Ignore local/lan-only hosts (and invalid domains). (#184 opened by jtagcat, 0 comments)
- Can't evaluate Select (#181 opened by TheTechRobo, 4 comments)
- Consider an option to generate WACZ files after a crawl is done for better replay with ReplayWeb.page (#179 opened by ikreymer, 1 comment)
- del (#177 opened by nekto-nekto, 0 comments)