Bulk export of summary data for granules
Dorado1987A opened this issue · 2 comments
Hi,
I previously commented on a ticket last year under my other account - RogueSergeant - about the search function of the API.
It's working great.
However, I now need to "hydrate" a lot of the information that's returned in bulk. Currently, the only way I can see to do this is to loop through every result and make a GET call to each one's summary link. This is clearly inefficient and puts a lot of strain on your systems, which I don't want to cause!
An example:
https://api.govinfo.gov/packages/CHRG-112shrg68086/granules/CHRG-112shrg68086/summary
Is there any way to get a paginated response from the API by providing a list of these URLs, or a list of Package/Granule IDs?
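For reference, the per-result loop described above could look roughly like the sketch below. It assumes the API key is passed as an `api_key` query parameter; the helper names (`summary_url`, `fetch_json`, `hydrate`) and the optional `delay` between requests are hypothetical and only here for illustration.

```python
import json
import time
import urllib.parse
import urllib.request

BASE = "https://api.govinfo.gov"


def summary_url(package_id, granule_id, api_key):
    """Build the granule summary URL (api_key as a query parameter is an assumption)."""
    pkg = urllib.parse.quote(package_id, safe="")
    gran = urllib.parse.quote(granule_id, safe="")
    query = urllib.parse.urlencode({"api_key": api_key})
    return f"{BASE}/packages/{pkg}/granules/{gran}/summary?{query}"


def fetch_json(url):
    """GET a URL and decode the JSON body."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return json.load(resp)


def hydrate(ids, api_key, fetch=fetch_json, delay=0.0):
    """Fetch one summary per (package_id, granule_id) pair, one request at a time."""
    summaries = {}
    for package_id, granule_id in ids:
        url = summary_url(package_id, granule_id, api_key)
        summaries[(package_id, granule_id)] = fetch(url)
        if delay:
            time.sleep(delay)  # optional pause between requests
    return summaries
```

Calling `hydrate([("CHRG-112shrg68086", "CHRG-112shrg68086")], api_key)` issues one request per pair, which is exactly the overhead I'd like to avoid.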
Thanks,
Dorado
I see that you are interested in Congressional hearings. What kind of information are you trying to get from the granule summaries?
From the search service response, you should be able to directly download the mods file, which includes metadata about the result in XML format.
If you set the resultLevel to "package", you'll be able to get the package-level MODS that includes links and metadata associated with each granule in a single file, which may be more efficient for your purposes.
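As a rough illustration of that approach, the sketch below builds a search request with resultLevel set to "package" and pulls the MODS links out of the response. Only `resultLevel` comes from this thread; the endpoint path, the other request fields (`query`, `pageSize`, `offsetMark`), the response fields (`results`, `download`, `modsLink`), and the helper names are assumptions that should be checked against the API documentation.

```python
import json
import urllib.parse
import urllib.request

SEARCH_URL = "https://api.govinfo.gov/search"  # assumed endpoint path


def build_search_body(query, page_size=100, offset_mark="*"):
    """Request body asking for package-level results (field names other than resultLevel are assumed)."""
    return {
        "query": query,
        "pageSize": page_size,
        "offsetMark": offset_mark,
        "resultLevel": "package",
    }


def mods_links(response):
    """Collect MODS download links from a search response (assumed response shape)."""
    links = []
    for result in response.get("results", []):
        link = (result.get("download") or {}).get("modsLink")
        if link:
            links.append(link)
    return links


def search(query, api_key):
    """POST the search body as JSON and decode the JSON reply."""
    url = f"{SEARCH_URL}?{urllib.parse.urlencode({'api_key': api_key})}"
    req = urllib.request.Request(
        url,
        data=json.dumps(build_search_body(query)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)
```

With this shape, one MODS download per package replaces one summary request per granule.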
When you ask about returning a paginated list given a set of package or granule ids, do you mean you would like a paginated response that essentially contains an array with all of the summary information together?
That currently isn't possible and would likely be more taxing on our system to produce than to generate the individual summaries. From our perspective, we're fairly well equipped to deal with a large number of requests, so pulling individual summaries isn't an issue on our end at this time.
If we built functionality allowing you to specify specific fields to return in the search response, would that be helpful?
Hi, yes, a function to add requested fields to the search results would be fantastic. I'm essentially trying to add abstracts and additional depth to the results returned, and to do so as efficiently as possible!