invana/crawlerflow

Web crawler orchestration framework that lets you create datasets from multiple web sources using YAML configurations.

Python

Issues

make `context` in spider to `job_context`
#13 opened 5 years ago by rrmerugu
0
Writing a dry run test case mandatory.
#12 opened 5 years ago by rrmerugu
0
allowed_domains for traversal
#10 opened 5 years ago by rrmerugu
0
use regex for traversals
#11 opened 5 years ago by rrmerugu
0
add downloaders to support websites built on frameworks like angular/reactjs/vue js
#6 opened 6 years ago by rrmerugu
1
make settings specific to individual crawlers instead of global settings
#9 opened 6 years ago by rrmerugu
0
add metatags extractor in the extractors
#8 opened 6 years ago by rrmerugu
0
specify data type to the extracted data in the data_selectors config
#7 opened 6 years ago by rrmerugu
0
integrate travis
#5 opened 7 years ago by rrmerugu
0
documentation
#4 opened 7 years ago by rrmerugu
0
add expiry of the cache to the feed crawling
#3 opened 7 years ago by rrmerugu
0
determine the url type ?
#2 opened 7 years ago by rrmerugu
0
write unit tests
#1 opened 7 years ago by rrmerugu
0