invana/crawlerflow
Web crawler orchestration framework that lets you create datasets from multiple web sources using YAML configurations.
Python
Issues
- 0
make `context` in spider to `job_context`
#13 opened by rrmerugu - 0
Writing a dry run test case mandatory.
#12 opened by rrmerugu - 0
allowed_domains for traversal
#10 opened by rrmerugu - 0
use regex for traversals
#11 opened by rrmerugu - 1
add downloaders to support websites built on frameworks like angular/reactjs/vue js
#6 opened by rrmerugu - 0
add metatags extractor in the extractors
#8 opened by rrmerugu - 0
integrate travis
#5 opened by rrmerugu - 0
documentation
#4 opened by rrmerugu - 0
add expiry of the cache to the feed crawling
#3 opened by rrmerugu - 0
determine the url type ?
#2 opened by rrmerugu - 0
write unit tests
#1 opened by rrmerugu