yaqwsx/jlcparts

Avoiding blacklist

eiffel31 opened this issue · 3 comments

In ui.py, the download is made with 10 parallel connections running at full speed.

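```python
import time
from multiprocessing import Pool

with Pool(processes=10) as pool:
    # Up to 10 worker processes call fetchLcscData concurrently;
    # results arrive in completion order, not submission order.
    for i, (component, extra) in enumerate(pool.imap_unordered(fetchLcscData, componentsToFetch)):
        lcsc = component['lcsc']
        print(f"  {lcsc} fetched. {((i+1) / len(missing) * 100):.2f} %")
        component["extra"] = extra
        component["extraTimestamp"] = int(time.time())
        lib.addComponent(component)
```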

This may have been considered aggressive downloading and may have had a visible impact on their server performance.

I would suggest a much gentler download:

  • sequential access
  • a 200 ms delay between two consecutive fetches

Sure, it will take much longer, but we are in no rush for a daily update.
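A minimal sketch of that suggestion, assuming the rest of the loop in ui.py stays as quoted above. `fetch_sequentially` is a hypothetical helper, not project code, and `fake_fetch` is a hypothetical stand-in for `fetchLcscData`, assumed here to return a `(component, extra)` pair as the `imap_unordered` loop implies.

```python
import time
from typing import Callable, Iterable, Iterator, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def fetch_sequentially(
    fetch: Callable[[T], R],
    items: Iterable[T],
    delay: float = 0.2,
    sleep: Callable[[float], None] = time.sleep,
) -> Iterator[R]:
    """Call `fetch` on one item at a time, waiting `delay` seconds between calls.

    Only one request is ever in flight, so consecutive requests start at
    least `delay` seconds apart (at most 5 requests/s for 200 ms).
    """
    for i, item in enumerate(items):
        if i > 0:
            sleep(delay)  # pause between two consecutive fetches, not before the first
        yield fetch(item)


# Hypothetical stand-in for fetchLcscData: returns (component, extra).
def fake_fetch(component: dict) -> tuple:
    return component, {"fetched": component["lcsc"]}


if __name__ == "__main__":
    components = [{"lcsc": "C1"}, {"lcsc": "C2"}, {"lcsc": "C3"}]
    for i, (component, extra) in enumerate(fetch_sequentially(fake_fetch, components)):
        print(f"  {component['lcsc']} fetched. {((i + 1) / len(components) * 100):.2f} %")
```

With a single request in flight and a 200 ms pause between requests, the rate stays below 5 requests per second no matter how quickly the server answers, instead of up to 10 concurrent requests from the pool. The `sleep` parameter is injectable only so the pacing can be checked without actually waiting.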

Note that the error message in the log changed from "Forbidden" to "Too many requests".
I think their first protection was to blacklist addresses making heavy requests => Forbidden. A few days later, they improved the mechanism by adding a limit on parallel connections (set to 1?) or on requests per second (probably harder to implement) => Too many requests.

So gentle downloading may work fine.

These are just guesses...
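For reference, "Forbidden" and "Too many requests" are the reason phrases of HTTP status codes 403 and 429. If the server is indeed rate-limiting, a polite client would back off on 429 instead of retrying at once, while a 403 means the address is refused and retrying only adds load. A minimal standard-library sketch, assuming plain GET requests through `urllib`; `fetch_with_backoff`, `retry_after_seconds`, and the default retry count and delays are hypothetical choices, not anything jlcparts or LCSC specifies.

```python
import time
import urllib.error
import urllib.request
from typing import Callable, Optional


def retry_after_seconds(value: Optional[str]) -> Optional[float]:
    """Parse a numeric Retry-After header; return None if absent or not a number.

    The HTTP-date form of Retry-After is deliberately not handled in this sketch.
    """
    if value is None:
        return None
    try:
        return max(0.0, float(value))
    except ValueError:
        return None


def _read_url(url: str) -> bytes:
    # Plain GET with a timeout; the response is closed by the context manager.
    with urllib.request.urlopen(url, timeout=30) as resp:
        return resp.read()


def fetch_with_backoff(
    url: str,
    max_retries: int = 5,
    base_delay: float = 1.0,
    open_url: Callable[[str], bytes] = _read_url,
    sleep: Callable[[float], None] = time.sleep,
) -> bytes:
    """GET `url`, backing off on HTTP 429 (Too Many Requests).

    Waits for the server's Retry-After hint when present, otherwise
    base_delay * 2**attempt. Any other HTTP error, including 403 (Forbidden),
    is re-raised immediately, as is the last 429 once max_retries is used up.
    """
    attempt = 0
    while True:
        try:
            return open_url(url)
        except urllib.error.HTTPError as err:
            if err.code != 429 or attempt >= max_retries:
                raise
            hint = retry_after_seconds((err.headers or {}).get("Retry-After"))
            sleep(hint if hint is not None else base_delay * 2 ** attempt)
            attempt += 1
```

Only the numeric form of `Retry-After` is parsed; an HTTP-date value falls back to the exponential delay. Combined with sequential pacing, this keeps the client from hammering the server once it has signalled a limit.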

  • The only difference between Forbidden and Too many requests is whether we send a user-agent identifying as a web browser.
  • Downloading sequentially changes nothing: the IP range is blocked.
  • Scraping from another location works, but we risk getting blocked again.

I am communicating with JLCPCB and LCSC; however, it will take some time.

Resolved in 0d85706