black-fractal/wikipedia-philosophy-game

A faster approach for the fast crawler

black-fractal opened this issue · 0 comments

  • Is your feature request related to a problem? Please describe.
    1- Every time the fast crawler runs, it opens all the JSON files to gather historical information about repeated crawls.
    2- Many of the JSON files could be merged: only the longest chain should be kept, and the others should be eliminated.
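
    To illustrate point 2, here is a minimal sketch of the pruning step. It assumes each JSON file holds one chain as a plain list of page titles, which may not match the repository's real file layout. The names load_chains, is_suffix and keep_longest_chains are hypothetical and do not exist in the project.

```python
import json
from pathlib import Path


def load_chains(directory):
    """Read every *.json file in `directory` as one chain.

    Assumption: each file contains a JSON list of page titles.
    """
    return [json.loads(p.read_text(encoding="utf-8"))
            for p in sorted(Path(directory).glob("*.json"))]


def is_suffix(short, long):
    """Return True if `short` is the tail end of `long`."""
    return len(short) <= len(long) and long[len(long) - len(short):] == short


def keep_longest_chains(chains):
    """Drop empty chains, duplicates, and chains that are the tail of a longer one."""
    kept = []
    # Visit the longest chains first so shorter tails are compared against them.
    for chain in sorted(chains, key=len, reverse=True):
        if chain and not any(is_suffix(chain, other) for other in kept):
            kept.append(chain)
    return kept


chains = [
    ["B", "C", "Philosophy"],
    ["A", "B", "C", "Philosophy"],
    ["X", "Philosophy"],
]
pruned = keep_longest_chains(chains)
```

    In this example the chain starting at B is dropped because it is already the tail of the chain starting at A, while the chain starting at X is kept because no longer chain covers it.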

  • Describe the solution you'd like
    1- A new Python script, or a new function in the fast crawler, should be written to support multiple runs.
    2- In the function traverse_link, continue_crawl or search_in_files_history, the new path should be stored in a new data structure if:

    • The newly fetched link is repetitive (it has been seen before)
    • The new path has a longer chain
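
    One possible shape for that data structure is sketched below. It is an assumption, not the current code: an in-memory dict maps each link to the longest known chain that contains it, so a run does not have to reopen every JSON file. The sketch also assumes both conditions must hold together and that the current path does not yet include the fetched link. The helpers register and extend_on_repeat are hypothetical names.

```python
def register(history, chain):
    """Index `chain` under each of its links if it beats the stored chain."""
    for link in chain:
        if len(chain) > len(history.get(link, [])):
            history[link] = chain


def extend_on_repeat(history, path, fetched_link):
    """Store and return a new chain only when both conditions hold.

    1. `fetched_link` is repetitive, i.e. already present in `history`.
    2. `path` plus the known remainder is longer than the stored chain.
    Returns None when nothing is stored.
    """
    known = history.get(fetched_link)
    if known is None:
        return None  # link not seen before: the crawl has to continue
    # Reuse the already known remainder of the chain from the repeated link.
    tail = known[known.index(fetched_link):]
    candidate = path + tail
    if len(candidate) <= len(known):
        return None  # not longer than what is already stored
    register(history, candidate)
    return candidate


history = {}
register(history, ["B", "C", "Philosophy"])
longer = extend_on_repeat(history, ["A"], "B")
shorter = extend_on_repeat(history, ["Z"], "Philosophy")
unseen = extend_on_repeat(history, ["Q"], "Unknown")
```

    Here the crawl starting at A reaches the known link B, so the stored chain is replaced by the longer one starting at A. The crawl starting at Z is discarded because its chain is shorter than the one already stored, and the unseen link returns None so crawling would continue as usual.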