scrape.py
MarijnJABoer opened this issue · 2 comments
MarijnJABoer commented
- Fix csv_update to fix broken URLs
- Remove indet species before downloading (might add this in AW_to_json.py; UPDATE: indet species in the csv file will be skipped).
- test AW_to_json.py for skipping indet species that are in the csv file.
- test
- Download in folders of p, d, h
- Download in folders of species within shot type
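
A minimal sketch of the folder and indet items above. It assumes a hypothetical layout of `<root>/<shot type>/<species>/<file>` and that indet specimens can be recognised by the word "indet" in the species name. The function names, the layout and the indet check are illustrative and are not taken from scrape.py or AW_to_json.py.

```python
from pathlib import Path
from typing import Optional

SHOT_TYPES = ("p", "d", "h")  # shot-type codes named in the list above


def is_indet(species: str) -> bool:
    """Assumption: indet specimens carry 'indet' somewhere in the species name."""
    return "indet" in species.lower()


def target_path(root: str, shot_type: str, species: str, filename: str) -> Optional[Path]:
    """Return <root>/<shot_type>/<species>/<filename>, or None for indet species."""
    if shot_type not in SHOT_TYPES:
        raise ValueError(f"unknown shot type: {shot_type!r}")
    if is_indet(species):
        return None  # skipped, so nothing is downloaded for this row
    folder = species.strip().lower().replace(" ", "_")
    return Path(root) / shot_type / folder / filename
```

Returning `None` for indet rows lets the caller skip them in one place instead of filtering the csv file beforehand.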
MarijnJABoer commented
https://www.antweb.org/images/blf00976(40)-2/blf00976(40)-2_h_1_low.jpg
This link breaks when calling the API. It changes ( and ) to _. I've found that only blf and hjr catalog numbers do this.
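
A small sketch of the substitution described above, useful for flagging the affected rows. The helper names are hypothetical, the URL pattern in `image_url` is inferred from the single link in this comment, and the prefix list only reflects the blf and hjr observation. It does not decide which form of the URL the server accepts.

```python
# Assumption: only these prefixes are affected, as observed in this comment.
AFFECTED_PREFIXES = ("blf", "hjr")


def mangle(text: str) -> str:
    """Reproduce the reported substitution: '(' and ')' both become '_'."""
    return text.replace("(", "_").replace(")", "_")


def needs_repair(catalog_number: str) -> bool:
    """True if the catalog number has parentheses and an affected prefix."""
    has_parens = "(" in catalog_number or ")" in catalog_number
    return has_parens and catalog_number.lower().startswith(AFFECTED_PREFIXES)


def image_url(catalog_number: str, shot_type: str, number: int = 1) -> str:
    """Build a low-res image URL; the pattern is inferred from the link above."""
    return (
        f"https://www.antweb.org/images/{catalog_number}/"
        f"{catalog_number}_{shot_type}_{number}_low.jpg"
    )


print(mangle("blf00976(40)-2"))        # blf00976_40_-2
print(needs_repair("blf00976(40)-2"))  # True
```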
MarijnJABoer commented
The script can now handle broken URLs: it prints them in the terminal.
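
One way such handling could look, as a sketch only: the function names are hypothetical and this is not the code in scrape.py. It uses the standard library's `urllib` and takes the fetcher as a parameter so the loop can be exercised without network access.

```python
from typing import Callable, Iterable, List
from urllib.error import URLError
from urllib.request import urlopen


def fetch_bytes(url: str) -> bytes:
    """Default fetcher: download the body of `url` with the standard library."""
    with urlopen(url, timeout=30) as response:
        return response.read()


def download_all(urls: Iterable[str],
                 fetch: Callable[[str], bytes] = fetch_bytes) -> List[str]:
    """Try every URL; print and collect the ones that fail instead of stopping."""
    broken = []
    for url in urls:
        try:
            fetch(url)  # a real scraper would write the returned bytes to disk
        except (URLError, ValueError) as err:  # HTTPError subclasses URLError
            print(f"Broken URL: {url} ({err})")
            broken.append(url)
    return broken
```

Returning the list of broken URLs, in addition to printing them, makes it possible to pass them on to a later repair step.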
The update function to fix broken URLs and catalog_numbers is only half working: out of 21 broken URLs, 15 could be repaired. I still have no idea what is wrong with the script (formicID/data_scraper/scrape.py