edgi-govdata-archiving/web-monitoring-processing
Tools for accessing, "diff"-ing, and analyzing archived web pages
HTML
GPL-3.0
Issues
Tests should use VCR or a fixture server and avoid connecting to third-party services
#236 opened by Mr0grog
Extract title from PDF content when importing
#156 opened by Mr0grog
Add resource limits for diffing server
#154 opened by Mr0grog
Infer local file type from extension
#231 opened by danielballan
Would we want an XML differ?
#183 opened by lightandluck
Content is getting cut off in the HTML diff
#259 opened by Mr0grog
Add support for Sentry releases
#194 opened by Mr0grog
Refactor on top of html5-parser
#138 opened by danielballan
Choose better colors for highlighting changes
#158 opened by Mr0grog
Package all the dependencies for conda.
#220 opened by danielballan
Capture seed generation script
#204 opened by danielballan
Upload web-monitoring-processing to PyPI
#168 opened by weatherpattern
Add health check endpoint to diff server
#262 opened by Mr0grog
Add version endpoint to diff server
#256 opened by Mr0grog
Create health check for Internet Archive data
#125 opened by Mr0grog
Fix issues with zombie processes on diffing server
#207 opened by Mr0grog
Add contributors section to readme
#235 opened by Mr0grog
Can/should we be more lenient about decoding errors?
#203 opened by Mr0grog
Investigate bug with beautifulsoup4 v4.6.2
#213 opened by danielballan
Update to Circle 2.0.
#221 opened by danielballan
Make tests runnable without credentials.
#222 opened by danielballan
Do we need version information for dependencies?
#195 opened by Mr0grog
Text of link diff can include non-visible content (e.g. the source of a <script>)
#198 opened by Mr0grog
New links diff allows columns to get too wide/narrow
#199 opened by Mr0grog
Discerning order of additions and removals of links next to each other is hard.
#151 opened by weatherpattern
Upgrade CircleCI to v2
#165 opened by Mr0grog
Add issue template to processing repo
#163 opened by weatherpattern
Change `site`/`agency` to `tags`/`maintainers`
#166 opened by Mr0grog
Diffing a version with no content results in an error
#157 opened by Mr0grog
html_token: SVGs that have not changed look like they have and it’s a real problem
#146 opened by Mr0grog
Add a `change_count` field in diff output indicating how many changes there were
#130 opened by Mr0grog
Clean up differ names and responses
#132 opened by danielballan
Ensure diffing server errors are JSON
#121 opened by Mr0grog