xirtah/gopa-spider
A spider written in Golang that uses Elasticsearch as the indexing mechanism. You can see the spider in use at http://xirtah.com.
Language: Go
License: NOASSERTION
Issues
Admin UI not showing redirected tasks
#22 opened by LzrBear - 2
Memory leak in the parse_pdf joint
#18 opened by LzrBear - 2
Diskqueues don't allow for horizontal scalability
#17 opened by LzrBear - 0
Separate the admin ui from the spider
#16 opened by LzrBear - 0
Add Joint to parse and index pdfs
#9 opened by LzrBear - 2
Spider gets stuck in a never-ending loop when a page links to itself with auto-generated URL params
#11 opened by LzrBear - 0
Draw out pipeline end to end
#13 opened by LzrBear - 0
Adding new joints doesn't update existing parsed urls due to the snapshot not changing
#12 opened by LzrBear - 0
Update to use a Jenkinsfile for the build server
#10 opened by LzrBear - 1
split up framework, spider, and UI
#2 opened by LzrBear - 2
snapshots currently not working
#3 opened by LzrBear - 1
Named entity recognition joint configurable parameter URL_coreNLP not being adhered to
#6 opened by LzrBear - 0
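
A note on #11 (self-links with auto-generated URL params): one common way to avoid this kind of loop is to reduce every discovered URL to a canonical key before checking whether it has already been queued. The sketch below is illustrative only and is not taken from the gopa-spider code. The function name `canonicalKey` and the `volatile` parameter set (for example a session id named `sid`) are hypothetical; which parameters are actually auto-generated depends on the site being crawled.

```go
package main

import (
	"fmt"
	"net/url"
)

// canonicalKey returns a deduplication key for rawURL.
// It drops the fragment and every query parameter listed in volatile
// (a hypothetical, caller-supplied set of auto-generated parameter names),
// then re-encodes the remaining parameters in sorted key order.
func canonicalKey(rawURL string, volatile map[string]bool) (string, error) {
	u, err := url.Parse(rawURL)
	if err != nil {
		return "", err
	}
	// Fragments never change the fetched resource, so ignore them.
	u.Fragment = ""
	u.RawFragment = ""

	q := u.Query()
	for name := range q {
		if volatile[name] {
			q.Del(name)
		}
	}
	// url.Values.Encode sorts by key, so parameter order does not matter.
	u.RawQuery = q.Encode()
	return u.String(), nil
}

func main() {
	// Assumption: "sid" is regenerated on every page load.
	volatile := map[string]bool{"sid": true}
	visited := map[string]bool{}

	links := []string{
		"http://example.com/page?sid=abc&id=7#top",
		"http://example.com/page?id=7&sid=xyz",
	}
	for _, link := range links {
		key, err := canonicalKey(link, volatile)
		if err != nil {
			fmt.Println("skip unparsable link:", link)
			continue
		}
		if visited[key] {
			fmt.Println("already seen, not queued:", link)
			continue
		}
		visited[key] = true
		fmt.Println("queued:", link)
	}
}
```

With this approach the second link maps to the same key as the first, so it would not be queued again. A crawl depth limit or a per-host page cap could serve as an additional safeguard when the volatile parameter names are not known in advance.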