naturalis/FormicID

scrape.py

MarijnJABoer opened this issue · 2 comments

  • Fix csv_update so that it repairs broken URLs
  • Remove indet species before downloading (this might go into AW_to_json.py. UPDATE: indet species in the csv file will be skipped.)
    • Test that AW_to_json.py skips indet species that are in the csv file.
  • Download into separate folders per shot type (p, d, h)
  • Download into species folders within each shot-type folder
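
The folder layout and indet skip from the checklist above could look roughly like this. It is a minimal sketch, not the code in scrape.py: the record keys (`url`, `shot_type`, `species`, `filename`) and the helper names are hypothetical, and it assumes indet species can be recognised by the substring "indet" in the species name. The shot-type letters follow AntWeb's head (h), dorsal (d) and profile (p) views.

```python
from pathlib import Path

# AntWeb shot-type codes used in the image file names: head, dorsal, profile.
SHOT_TYPES = {"h": "head", "d": "dorsal", "p": "profile"}


def is_indet(species: str) -> bool:
    """Assumption: indeterminate species contain 'indet' in their name."""
    return "indet" in species.lower()


def target_path(root: Path, shot_type: str, species: str, filename: str) -> Path:
    """Build <root>/<shot type>/<species>/<filename> for one image."""
    if shot_type not in SHOT_TYPES:
        raise ValueError(f"unknown shot type: {shot_type!r}")
    # Spaces in species names become underscores so folder names stay simple.
    return root / shot_type / species.replace(" ", "_") / filename


def plan_downloads(records, root: Path):
    """Yield (url, path) pairs for every record, skipping indet species.

    Each record is assumed (hypothetically) to be a dict with the keys
    'url', 'shot_type', 'species' and 'filename'.
    """
    for rec in records:
        if is_indet(rec["species"]):
            continue
        path = target_path(root, rec["shot_type"], rec["species"], rec["filename"])
        yield rec["url"], path
```

Before writing each file, `path.parent.mkdir(parents=True, exist_ok=True)` creates the shot-type and species folders on demand, so no folder tree has to be prepared in advance.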

https://www.antweb.org/images/blf00976(40)-2/blf00976(40)-2_h_1_low.jpg

This link breaks when calling the API: the API changes ( and ) to _.

I've found that only blf and hjr catalog numbers do this.
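
One way to undo the substitution for blf and hjr catalog numbers is to put the parentheses back with a regular expression and then rebuild the image URL using the same pattern as the blf00976(40)-2 link. This is only a sketch: it assumes the mangled form looks like `blf00976_40_-2` (prefix, digits, `_`, digits, `_`, remainder), and the function names are hypothetical, not taken from scrape.py.

```python
import re

# Assumed shape of a mangled blf/hjr catalog number, e.g. "blf00976_40_-2",
# where the API replaced "(" and ")" with "_" (original: "blf00976(40)-2").
MANGLED = re.compile(r"^(blf|hjr)(\d+)_(\d+)_(.*)$", re.IGNORECASE)


def restore_catalog_number(catalog_number: str) -> str:
    """Put the parentheses back into a mangled blf/hjr catalog number.

    Catalog numbers that do not match the assumed pattern are returned unchanged.
    """
    match = MANGLED.match(catalog_number)
    if match is None:
        return catalog_number
    prefix, digits, inner, rest = match.groups()
    return f"{prefix}{digits}({inner}){rest}"


def image_url(catalog_number: str, shot_type: str, number: int = 1) -> str:
    """Rebuild a low-resolution AntWeb image URL after restoring the catalog number."""
    cat = restore_catalog_number(catalog_number)
    return f"https://www.antweb.org/images/{cat}/{cat}_{shot_type}_{number}_low.jpg"
```

Because non-matching catalog numbers pass through unchanged, printing the ones the regex leaves untouched is a quick way to see which shapes the assumed pattern does not cover.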

The script can now handle broken URLs: it prints them in the terminal instead of stopping.
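
With only the standard library, that behaviour can be sketched as a loop that collects and prints failing URLs instead of raising. This is an illustration, not necessarily how scrape.py does it; the `fetch` parameter is a hypothetical hook so the download step can be swapped out, for example in a test without network access.

```python
import urllib.error
import urllib.request
from typing import Callable, Dict, Iterable, List, Tuple


def fetch_bytes(url: str, timeout: float = 30.0) -> bytes:
    """Download a URL and return the response body."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()


def download_all(
    urls: Iterable[str], fetch: Callable[[str], bytes] = fetch_bytes
) -> Tuple[Dict[str, bytes], List[str]]:
    """Download every URL; print and collect the ones that fail instead of crashing."""
    images: Dict[str, bytes] = {}
    broken: List[str] = []
    for url in urls:
        try:
            images[url] = fetch(url)
        # HTTPError and URLError are subclasses of OSError, as are timeouts;
        # ValueError covers malformed URLs such as an unknown scheme.
        except (OSError, ValueError) as err:
            broken.append(url)
            print(f"Broken URL: {url} ({err})")
    return images, broken
```

Returning the `broken` list alongside the printout makes it possible to feed exactly those URLs into a later repair step, rather than re-parsing terminal output.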
The update function that fixes broken URLs and catalog_numbers only partly works: out of 21 broken URLs, 15 could be repaired. I still have no idea what is wrong with the script (formicID/data_scraper/scrape.py