tokenmill/crawling-framework
Easily crawl news portals or blog sites using Storm Crawler.
Java · NOASSERTION
Issues
Relative date should be the date of parsing
#19 opened by dainiusjocas - 0
Autocomplete on source field
#21 opened by dainiusjocas - 0
Topology Worker Increase Not Doing Crawling.
#45 opened by rishrockstar - 0
Setup a proper CI/CD
#40 opened by dainiusjocas - 0
Separate integration tests from unit tests
#39 opened by dainiusjocas - 0
tag cloud in kibana on text field
#32 opened by dainiusjocas - 0
Export some data by query
#31 opened by dainiusjocas - 0
Article fingerprint for deduplication
#26 opened by dainiusjocas - 0
Use canonical url for duplicate detection
#12 opened by zmedelis - 0
Search for tests
#24 opened by dainiusjocas - 1
Management UI should have paging for tests
#13 opened by dainiusjocas - 0
Evaluate http sources from CSV
#20 opened by dainiusjocas - 0
Bulk update updates docs with wrong url
#17 opened by dainiusjocas - 0
BulkProcessor and the RestHighLevelClient treat URL-encoded URLs differently
#16 opened by dainiusjocas - 0
Management UI null pointer exception
#15 opened by dainiusjocas - 0
Management UI paging of HTTP sources
#9 opened by dainiusjocas - 1
Management UI search as you type
#8 opened by dainiusjocas - 0
Log actual LD+JSON on parsing error
#6 opened by dremeika - 0