A faster approach for fast crawler!

Question

black-fractal opened this issue 4 years ago · 0 comments

Is your feature request related to a problem? Please describe.
1- Every time fast crawler runs, it opens all the JSON files to gather historical information about repeated crawls.
2- Many of the JSON files could be merged: only the longest chain needs to be kept, and the others can be removed.
Describe the solution you'd like
1- A new Python script, or a new function in fast crawler, should be written to handle multiple runs.
2- In the function traverse_link, continue_crawl, or search_in_files_history, the new path should be stored in a new data structure if:
- The newly fetched link is a repeat (it has already been crawled)
- The new path has a longer chain
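
To make the proposal above more concrete, here is a rough sketch of what such a data structure might look like. Everything in it is an assumption for illustration: the `ChainIndex` class, its methods, and the `merge_files` helper are hypothetical names, not part of fast crawler. It also assumes that a chain is a list of links, that chains are keyed by their last link, and that each history JSON file holds one chain as a JSON list.

```python
import json
from pathlib import Path


class ChainIndex:
    """Hypothetical index mapping a link to the longest known chain that ends at it."""

    def __init__(self):
        self.longest = {}  # link -> list of links (the chain)

    def record(self, chain):
        """Keep `chain` if its last link is new or the chain is longer than the stored one.

        Returns True when the last link was already known (a repeated link).
        """
        link = chain[-1]
        known = self.longest.get(link)
        if known is None or len(chain) > len(known):
            self.longest[link] = list(chain)
        return known is not None

    def save(self, path):
        """Write the whole index to a single JSON file."""
        Path(path).write_text(json.dumps(self.longest, indent=2), encoding="utf-8")

    @classmethod
    def load(cls, path):
        """Read the index back; a missing file yields an empty index."""
        index = cls()
        file = Path(path)
        if file.exists():
            index.longest = json.loads(file.read_text(encoding="utf-8"))
        return index


def merge_files(paths):
    """Merge history files into one index (assumes each file is a JSON list of links)."""
    index = ChainIndex()
    for path in paths:
        chain = json.loads(Path(path).read_text(encoding="utf-8"))
        if chain:  # skip empty chains
            index.record(chain)
    return index
```

With something like this, a run would only need to load one index file at startup instead of opening every JSON file, and a function such as traverse_link could call `record` on each new path to decide whether it replaces the stored chain.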