webrecorder/browsertrix-crawler
Run a high-fidelity browser-based web archiving crawler in a single Docker container
TypeScript · AGPL-3.0
Issues
- 0
some links on page not crawled
#723 opened by robert-1043 - 1
WARC Record Write Failed
#722 opened by benoit74 - 1
- 10
Browser disconnected (crashed?)
#706 opened by rgaudin - 2
Youtube video is not crawled when `loading="lazy"`
#699 opened by benoit74
- 1
Issue creating profile for intranet
#705 opened by MRLeflei
- 0
Document crawl collection layout
#675 opened by tw4l - 4
[Bug]: Crawl Configuration Inconsistency: Max Depth and Include Any Linked Page
#693 opened by mona-ul - 0
Include depth in pages jsonl files
#690 opened by tw4l - 1
Brotli decompression error
#687 opened by rgaudin - 1
Browser Crash & Docker Exit Code
#683 opened by gitreich - 3
Issue crawling a web property with big PDFs
#676 opened by benoit74 - 4
A suggestion for making WACZ and WARC-requests
#663 opened by hamoudak - 1
SSH Socks5 Tunnel Proxy Support
#670 opened by ikreymer - 5
Execution context was destroyed
#655 opened by rgaudin - 2
ETA computation
#660 opened by wsdookadr - 1
- 0
Use in-place streaming to generate WACZ files
#674 opened by tw4l - 1
- 5
Crawl button with javascript navigation
#665 opened by hamzamac - 1
- 3
[BUG] invalid gzipped WARC
#662 opened by wsdookadr - 1
Youtube Video Quality
#648 opened by fservida - 0
Remove invalid crc32 calculation
#653 opened by ikreymer - 2
Should invalid URL halt the scraping process?
#654 opened by rgaudin - 0
Behavior run partially failed - Protocol error
#652 opened by zlodejpapiru - 2
der-postillon.com: crawler considers that scrolling is not necessary while it seems mandatory
#647 opened by benoit74 - 0
Fix skipping of 206 responses
#645 opened by ikreymer - 2
Can an AWS alternative to Access Keys be added?
#644 opened by jblukach - 1
Vimeo Playback: Retrieve full stream
#632 opened by kila58 - 3
Skipping URL from unknown frame
#643 opened by zlodejpapiru - 2
Crawler keeps signing out(?)
#642 opened by Azmodeszer - 10
"Login form could not be found"
#637 opened by Azmodeszer - 0
- 0
Better handling of redirect chains to same page
#634 opened by ikreymer - 5
- 4
Revisit WARC-Resource-Type or add a new header
#630 opened by benoit74 - 1
- 5
Could there be a way to create warcs with certain size after one RUN (combinewarc / rolloversize...)
#617 opened by ssairanen - 2
Misleading error message
#598 opened by rgaudin
- 0
Make timeout logging messages warns, not errors
#599 opened by tw4l