edgi-govdata-archiving/web-monitoring-processing
Tools for access, "diff"-ing, and analyzing archived web pages
HTML · GPL-3.0
Issues
Don't require credentials in DB client
#844 opened by Mr0grog
Use a more robust PDF library
#655 opened by Mr0grog
Lint with flake8
#705 opened by Mr0grog
Handle too many connections to Wayback automatically
#525 opened by Mr0grog
Move docs-building to GH Actions
#671 opened by danielballan
Import script should upload bodies directly to S3
#663 opened by Mr0grog
Import script should include client-side redirects
#670 opened by Mr0grog
Import script should follow more of a pipeline style
#669 opened by Mr0grog
DB API should have built-in retry functionality
#659 opened by Mr0grog
Update import script to use Wayback v0.3
#641 opened by Mr0grog
Parameter Names and Types are Displayed Wrong in Docs
#468 opened by Mr0grog
Integrate scripts/annotations_import and scripts/ia_healthcheck into scripts/wm
#529 opened by Mr0grog
Use dotenv for environment configuration
#499 opened by Mr0grog
Diffing server does not release ports on SIGTERM
#576 opened by Mr0grog
☂️ Cron jobs should run as Kubernetes Cron Jobs
#376 opened by ibuys
Put wayback API in a separate package.
#477 opened by danielballan
DB client should set the `Accept` header on requests
#495 opened by Mr0grog
Use `dateutil` for Date Parsing Everywhere
#360 opened by Mr0grog
If fetched content does not match hash in diffing server, don’t return 500 error
#491 opened by Mr0grog
Use Python native async syntax in diff server
#432 opened by Mr0grog
Switch MAINTAINER to LABEL directive in Docker
#467 opened by Mr0grog
Remove Pagefreezer support
#379 opened by Mr0grog
Compare URLs with any HTTP status
#404 opened by vbanos
Use Sentry’s Tornado integration for diff server
#342 opened by Mr0grog
Handle Upcoming Tornado HTTPClient Errors
#282 opened by Mr0grog
Upgrade to Python 3.7
#341 opened by Mr0grog
Support ETag headers
#264 opened by danielballan
Page with an unbelievable number of dropdown options fails to parse in the middle
#288 opened by Mr0grog
IA healthcheck script should only check active pages
#328 opened by Mr0grog
Investigate why diffing server does not seem to be handling concurrent jobs well
#303 opened by Mr0grog
Support invalid encoding `iso-8559-1`
#310 opened by Mr0grog
_decode_body fails if body is empty
#311 opened by Mr0grog
DiffMatchPatch can’t handle null terminators
#312 opened by Mr0grog
Set up Sentry.io for the diffing server
#306 opened by Mr0grog
Protect `html_text_dmp` and `html_source_dmp` with content-type checking and sniffing
#287 opened by Mr0grog