This is a quick script to capture a JSON file of all search results for a given search term, along with the best version of each image in the search.
The JSON file contains as much meta data as I can get from the raw API, more can be got by calling the API for each photo (rather than each page of results), but that would put a lot of stress on the server and might get your key/IP banned.
Images are saved to ./images/search_term/
The script is used by typing npm run start "search_term"
Options:
- Sandbox: add
sandbox=false
to disable sandboxing of Chrome when retrieving an API key. Useful if you are running on a headless Linux box and don't want to set up sandboxing as described here: https://github.com/GoogleChrome/puppeteer/blob/master/docs/troubleshooting.md - Start: add
start=
followed by a unix timestamp (in seconds). Useful for resuming a failed/incomplete scrape. - End: add
end=
followed by a unix timestamp (in seconds).
The script will automatically retrieve an API key, and renew it mid-job if it expires.
The resulting JSON file needs to have a [ added at the start and ] at the end manually.
parents.json is a sample output.
NOTE: This is largely untested, and I can't guarantee Flickr won't rate limit/ban you. It also probably violates their terms of service.