scrape.py

Question

MarijnJABoer opened this issue 7 years ago · 2 comments

MarijnJABoer commented 7 years ago

- Fix csv_update so that it repairs broken URLs.
- Remove indet species before downloading (might add this in AW_to_json.py). UPDATE: indet species in the csv file will be skipped.
  - Test AW_to_json.py for skipping indet species that are in the csv file.
- Download into folders per shot type (p, d, h).
- Download into folders per species within each shot type.
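
The two download items could share one directory layout: a folder per shot type, with a subfolder per species inside it. Below is a minimal sketch of that idea. The names `is_indet` and `target_path` are hypothetical and not taken from scrape.py, and it assumes that indet specimens can be recognised by "indet" appearing in the species name.

```python
from pathlib import Path

SHOT_TYPES = ("p", "d", "h")  # shot types named in the list above


def is_indet(species: str) -> bool:
    """Assumption: indet specimens have 'indet' in the species name."""
    return "indet" in species.lower()


def target_path(root: str, shot_type: str, genus: str, species: str,
                filename: str) -> Path:
    """Return root/<shot_type>/<genus>_<species>/<filename>."""
    if shot_type not in SHOT_TYPES:
        raise ValueError(f"unknown shot type: {shot_type!r}")
    return Path(root) / shot_type / f"{genus}_{species}" / filename
```

A caller would skip any row for which `is_indet(species)` is true, and create the parent folder with `path.parent.mkdir(parents=True, exist_ok=True)` before saving the image.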

Answer 1 · 2018-02-09T10:08:20.000Z

https://www.antweb.org/images/blf00976(40)-2/blf00976(40)-2_h_1_low.jpg

This link breaks when calling the API. It changes ( and ) to _.

I've found that only blf and hjr catalog numbers do this.
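
One possible repair, sketched here under assumptions, is to restore the parentheses in the catalog number and rebuild the image URL from it. It assumes that a broken number always has the form `_<digits>_` where `(<digits>)` used to be, which may not cover every case. The function names `restore_parentheses` and `image_url` are hypothetical, and the URL pattern is inferred only from the single example link above.

```python
import re

AFFECTED_PREFIXES = ("blf", "hjr")  # prefixes reported above


def restore_parentheses(catalog_number: str) -> str:
    """Turn e.g. 'blf00976_40_-2' back into 'blf00976(40)-2'.

    Assumption: the lost parentheses always wrapped a run of digits.
    Catalog numbers with other prefixes are returned unchanged.
    """
    if not catalog_number.lower().startswith(AFFECTED_PREFIXES):
        return catalog_number
    return re.sub(r"_(\d+)_", r"(\1)", catalog_number)


def image_url(catalog_number: str, shot_type: str, quality: str = "low") -> str:
    """Build an image URL following the pattern of the example link."""
    return (
        "https://www.antweb.org/images/"
        f"{catalog_number}/{catalog_number}_{shot_type}_1_{quality}.jpg"
    )
```

The repair is applied to the catalog number rather than to the full URL, because the file name itself contains segments such as `_1_` that the same pattern would wrongly rewrite.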

Answer 2 · 2018-02-09T14:58:44.000Z

The script can now handle broken URLs: it prints them in the terminal.
The update function that fixes broken URLs and catalog_numbers only half works: out of 21 broken URLs, 15 could be repaired. I still have no idea what is wrong with the script (formicID/data_scraper/scrape.py