Issues
example: version format is invalid
#236 opened by mingfang - 1
Update docs for `aleph_emit_document` operation
#232 opened by tillprochaska - 0
Is it possible to prerender pages before parsing?
#218 opened by ninoppp - 0
sqlalchemy version
#214 opened by rewiaca - 1
Create an image which supports python 3.10
#208 opened by Rosencrantz - 0
Rename master branch to main
#211 opened by Rosencrantz - 0
nothing
#210 opened by Rosencrantz - 1
Memorious session information expiration
#203 opened by monneyboi - 3
Example won't run `no module named 'example'`
#201 opened by monneyboi - 1
Do some test stuff
#195 opened by Rosencrantz - 0
Possible improvements to how we test Memorious
#199 opened by sunu - 1
Ingesting multiple files from a single page into Aleph and creating ftm entities
#192 opened by uhhhuh - 0
`aleph_emit` fails with data validation error
#194 opened by uhhhuh - 1
Using the standard parse function for creating entities does not generate an entity_id
#187 opened by Rosencrantz - 1
Support for media monitoring
#153 opened by pudo - 0
documentcloud operation should parse `publisher` document metadata and `aleph_emit` should be able to push it to Aleph
#182 opened by sunu - 2
Reference documents from structured data scrapes
#65 opened by pudo - 2
Make sure crawler status actively updates periodically instead of relying on page reload
#75 opened by sunu - 4
Cannot find crawlers in list
#124 opened - 1
Include mapping inside memorious crawler
#60 opened by uhhhuh - 1
Implement search/filtering for scrapers UI
#64 opened by uhhhuh - 1
Scheduled crawlers don't run.
#142 opened by hsnamIT - 1
Retry to establish database and redis connection a few times before raising an error
#181 opened by sunu - 1
Make memorious run a crawler based on a yaml file
#150 opened by sunu - 4
documentcloud integration may need to be reviewed
#168 opened by Rosencrantz - 0
Add file path information to dav_index
#152 opened by uhhhuh - 3
Indexing Atlassian Confluence
#154 opened by pudo - 0
Dependabot couldn't authenticate with https://pypi.python.org/simple/
#139 opened by dependabot-preview - 5
proxy handler for memorious
#125 opened - 3
fetch should ignore mailto links
#128 opened by rhiaro - 3
Normalized URLs and non-reproducibility
#94 opened by moreymat - 7
HelpWanted
#89 opened by nvelumani - 1
Frequent database deadlock errors
#77 opened by sunu - 2
`memorious run` command never finishes
#74 opened by alexmojaki - 5
Running a scraper in the example fails with an error when calling context.set_tag(tag, None)
#72 opened by alexmojaki - 0
Use the OCR service through the ServiceLayer
#63 opened by uhhhuh - 4
Why is `cleanup` removed?
#70 opened by pohnean - 0
Move aleph_emit operation into alephclient
#66 opened by sunu - 0
Document the `nested db` operation
#62 opened by uhhhuh