A faster approach for the fast crawler!
black-fractal opened this issue · 0 comments
black-fractal commented
Is your feature request related to a problem? Please describe.
1- Every time the fast crawler runs, it opens all the JSON files to gather historical information about repetitive crawling!
2- Many JSON files should be merged! Only the longest chain should be kept and the others should be eliminated (a merging sketch follows this list).
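The merge could be a one-off pass over the history directory. Below is a minimal sketch, under the assumption that each history file is a JSON object mapping a link to the chain (list of links) that reached it; the directory layout, file schema, and the `merge_history` name are illustrative, not the crawler's actual format.

```python
# A minimal sketch of the merging idea. Assumptions (not the project's
# actual format): each history file is a JSON object mapping a link to
# the chain (list of links) that reached it, and all files live in one
# history directory.
import json
from pathlib import Path

def merge_history(history_dir, merged_file="merged_history.json"):
    """Merge all JSON history files, keeping only the longest chain per link."""
    longest = {}  # link -> longest chain seen so far
    for json_path in Path(history_dir).glob("*.json"):
        with open(json_path, encoding="utf-8") as f:
            history = json.load(f)  # assumed shape: {link: [chain of links]}
        for link, chain in history.items():
            # For a repetitive link, keep whichever chain is longer.
            if link not in longest or len(chain) > len(longest[link]):
                longest[link] = chain
    with open(merged_file, "w", encoding="utf-8") as f:
        json.dump(longest, f, indent=2)
    return longest
```

With the histories collapsed into one file, the crawler could load a single merged structure at startup instead of re-opening every JSON file on each run.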
Describe the solution you'd like
1- A new Python script, or a new function in the fast crawler, should be written so that repeated runs are handled more efficiently!
2- In `traverse_link`, `continue_crawl`, or `search_in_files_history`, the new path should be stored in a new data structure only if:
- The newly fetched link is repetitive, and
- The new path has a longer chain (see the sketch after this list)!
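One way to express this rule is a small helper that those three functions consult. This is only a sketch, assuming the new data structure is an in-memory dict mapping each fetched link to its longest known chain; `store_if_longer` and its signature are hypothetical, not part of the existing code.

```python
# A sketch of the proposed storing rule. Assumption: the new data structure
# is an in-memory dict mapping each fetched link to its longest known chain;
# store_if_longer is a hypothetical helper, not an existing function.
def store_if_longer(paths, link, new_chain):
    """Store new_chain for link when the link is repetitive and
    the new path has a longer chain."""
    known = paths.get(link)
    if known is None:
        paths[link] = new_chain   # first time the link is fetched
        return True
    if len(new_chain) > len(known):
        paths[link] = new_chain   # repetitive link with a longer chain
        return True
    return False                  # repetitive link, shorter or equal chain
```

Keeping this dict as the single source of truth means the repetitive-link check becomes one dictionary lookup rather than a search through every history file.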