Support for the v2 POST API
Mr0grog opened this issue · 11 comments
Do you have any interest in supporting the v2 POST API? It requires authentication (use your “S3-like API credentials” from https://archive.org/account/s3.php), but has a lot of super useful features.
I’ve been poking at it a fair amount lately, and would be happy to try and help add support for it here if you’re interested.
(Update: looks like they are concerned about extra load on that API under the current situation, so I dropped the docs link.)
@Mr0grog I'd appreciate it if you could tell me the difference between the v1 and v2 POST APIs, besides the outlinks. I can't find the API docs.
> Can't find the api docs.
@vegetableman see above comment — docs are not yet public, so I had to remove the link. I got ahead of myself here and thought they were ready for broader use after this blog article: https://blog.archive.org/2019/10/23/the-wayback-machines-save-page-now-is-new-and-improved/
> Would appreciate if you could tell me the difference between v1 and v2 POST api
The POST-based API has some pretty fancy features. You can:
- Poll for completion or just make the request to save and forget it
- Save a screenshot
- Save error pages (normally it only saves 2xx responses)
- Set a time limit
- Set cookies to use when requesting the page to save
- Set basic auth credentials to use when requesting the page to save
- Get a huge amount of useful metadata about the saved result
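As a rough illustration of what a request using a couple of these options might look like, here is a minimal sketch that builds, but does not send, an authenticated POST. The endpoint URL, the form field names (`capture_screenshot`, `capture_all`), and the `LOW key:secret` header format are all assumptions, since the docs are not linked in this thread; `build_save_request` is a hypothetical helper, not part of any library.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Assumed endpoint; not confirmed anywhere in this thread.
SAVE_ENDPOINT = "https://web.archive.org/save"


def build_save_request(url, access_key, secret_key, screenshot=False, save_errors=False):
    """Build (but do not send) a POST request asking SPN to capture `url`."""
    # Field names below are assumptions, not documented in this thread.
    fields = {"url": url}
    if screenshot:
        fields["capture_screenshot"] = "1"
    if save_errors:
        fields["capture_all"] = "1"
    return Request(
        SAVE_ENDPOINT,
        data=urlencode(fields).encode("utf-8"),
        headers={
            "Accept": "application/json",
            # Assumed header format for the "S3-like API credentials".
            "Authorization": f"LOW {access_key}:{secret_key}",
        },
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` would actually submit it. Keeping construction separate from sending makes it easy to inspect the request first.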
@Mr0grog Thanks Rob for sharing the details 👍. I was able to dig up some details on the new API through devtools on the SPN page, specifically save/status/<jobId>, which fetches the save completion status by job ID.
Although I do get the new snapshot details for a URL, the issue is that even after the job completes, those details are still not readily available for the URL through this API: https://archive.org/wayback/available?url=<url>
It takes about 10 to 30 minutes or more for the details to become available.
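For anyone following along, polling the save/status/<jobId> endpoint mentioned above could look roughly like this. The host, the `status` field, and its `"pending"` value are assumptions based on watching the page in devtools, not on published docs; `fetch_status` and `wait_for_job` are hypothetical names.

```python
import json
import time
from urllib.request import urlopen

# Only the "save/status/<jobId>" path comes from this thread; the host is assumed.
STATUS_URL = "https://web.archive.org/save/status/{job_id}"


def fetch_status(job_id):
    """Fetch and decode the JSON status document for one job."""
    with urlopen(STATUS_URL.format(job_id=job_id)) as response:
        return json.load(response)


def wait_for_job(job_id, fetch=fetch_status, interval=5.0, max_attempts=20, sleep=time.sleep):
    """Poll until the job leaves the (assumed) "pending" state or attempts run out."""
    for attempt in range(max_attempts):
        status = fetch(job_id)
        if status.get("status") != "pending":
            return status
        # Skip the pause after the final check.
        if attempt < max_attempts - 1:
            sleep(interval)
    raise TimeoutError(f"job {job_id} still pending after {max_attempts} checks")
```

The `fetch` and `sleep` arguments are injectable so the loop can be exercised without touching the network, which seems prudent given the load concerns.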
Hmmmm, my experience has been that the availability time is about the same as an old-style GET request to SPN. (But I’ve been using the CDX API, not the availability API.) Are you sure it’s not just that SPN has been under very heavy load since the current coronavirus situation started?
Alright. So, the CDX API, /search/cdx?url=, is what I should have been using. Unlike the available API, I am getting the new snapshot details immediately through it. Thanks Rob 👍🙂.
I don't think the virus situation has anything to do with this. Also, I don't think SPN is at fault. My guess is that the data source for the availability API is updated through a queue of some sort, whereas the data source for CDX is updated immediately.
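To make the comparison concrete, here is a small sketch of the two lookups side by side. The availability URL is the one quoted above. The CDX host and path prefix, the `output` and `limit` parameters, and the header-row-first JSON layout are assumptions; the helper names are hypothetical.

```python
from urllib.parse import urlencode


def availability_url(page_url):
    """Query URL for the availability API quoted in this thread."""
    return "https://archive.org/wayback/available?" + urlencode({"url": page_url})


def cdx_url(page_url, limit=-1):
    """Query URL for the CDX API (host, "output" and "limit" are assumptions)."""
    query = {"url": page_url, "output": "json", "limit": limit}
    return "https://web.archive.org/cdx/search/cdx?" + urlencode(query)


def latest_capture(rows):
    """Map the last data row of CDX JSON output onto the header row's field names.

    Assumes the first row is a header and later rows are captures, oldest first.
    """
    if len(rows) < 2:
        return None
    header, last = rows[0], rows[-1]
    return dict(zip(header, last))
```

After a save job finishes, fetching `cdx_url(...)` and passing the decoded JSON to `latest_capture` would be one way to check for the new snapshot without waiting on the availability API.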
I'd be open to including such a thing, but I'm clearly pretty far behind you on the learning curve. If you had time to prepare a pull request I'd be open to it.
Like I said in July, I'd take a pull request if someone had one. In the meantime, I'm going to close this ticket as stale.