shurcooL/graphql

Question: Example of doing pagination (especially on nested resources that are paginated)

rocktavious opened this issue · 6 comments

I'm working with a GraphQL API that has a top-level resource with several fields that are themselves paginated, something like:

{
  account {
    services(after: "", first: 10) {
      nodes {
        tools(after: "", first: 2) {
          nodes {
            category
          }
          pageInfo {
            endCursor
            hasNextPage
          }
        }
        tags(after: "", first: 2) {
          nodes {
            key
            value
          }
          pageInfo {
            endCursor
            hasNextPage
          }
        }
      }
      pageInfo {
        endCursor
        hasNextPage
      }
    }
  }
}

and it's unclear to me how to properly handle pagination of these nested tools and tags resources. I understand how the pageInfo object's endCursor and hasNextPage fields work, and how to pass endCursor as the after argument to query the next page of data, but it's unclear to me what pattern to use in Go code with this GraphQL framework.

I would love it if anyone has links to example code querying nested pagination, or blog posts about how to do pagination with this framework.

Thanks! Love the framework so far!

This is a good question, and it remains an open question for me.

(FWIW, basic pagination is documented at https://github.com/shurcooL/githubv4#pagination, but it doesn't cover pagination across more than one resource that this issue is asking about.)
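The single-level pattern described there comes down to a loop: query a page, keep its nodes, and while hasNextPage is true, pass endCursor back as the after argument. A minimal sketch of that loop, assuming a hypothetical fetchPage function in place of the real client.Query call; slicePager is an in-memory stand-in (also hypothetical) so the sketch runs without a server:

```go
package main

import "strconv"

// Page holds one page of a connection: its nodes plus the two pageInfo
// fields that drive pagination.
type Page[T any] struct {
	Nodes       []T
	EndCursor   string
	HasNextPage bool
}

// slicePager is a hypothetical in-memory stand-in for a paginated GraphQL
// field. Cursors here are stringified offsets; real API cursors are opaque.
func slicePager[T any](items []T, first int) func(after string) (Page[T], error) {
	return func(after string) (Page[T], error) {
		start := 0
		if after != "" {
			n, err := strconv.Atoi(after)
			if err != nil {
				return Page[T]{}, err
			}
			start = n
		}
		if start > len(items) {
			start = len(items)
		}
		end := start + first
		if end > len(items) {
			end = len(items)
		}
		return Page[T]{
			Nodes:       items[start:end],
			EndCursor:   strconv.Itoa(end),
			HasNextPage: end < len(items),
		}, nil
	}
}

// fetchAll requests pages until hasNextPage is false, feeding each page's
// endCursor back in as the next "after" argument.
func fetchAll[T any](fetchPage func(after string) (Page[T], error)) ([]T, error) {
	var all []T
	after := "" // in this stand-in, "" means "start from the beginning"
	for {
		page, err := fetchPage(after)
		if err != nil {
			return nil, err
		}
		all = append(all, page.Nodes...)
		if !page.HasNextPage {
			return all, nil
		}
		after = page.EndCursor
	}
}
```

With the real client, the first request would typically use a null cursor variable (the githubv4 README passes (*githubv4.String)(nil)) rather than an empty string.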

So far in the few places where I need to do pagination across more than one resource type per query, I've been able to get by with only doing pagination at the top level, where it's most important. For the inner levels, I just request the maximum single page size with a "TODO" to figure out a good approach, and that's getting me by for now (enough that this isn't a high priority for me). You can see an example here:

https://github.com/shurcooL/home/blob/164885982757a609c5a32d75bdb13288534023a3/internal/exp/service/change/githubapi/githubapi.go#L377-L379

It seems to me that it could be implemented by doing nested iteration. That is, paginate across all entries in the innermost resource type first, then go to the next higher-level page, and repeat until all pages are visited. Something like:

for range top level pages {
	for range low level pages {
		// perform query
	}
}
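The nested iteration above can be sketched in Go as follows. One caveat shapes the inner loop: within a single query, an argument such as tools(after: $toolsCursor) applies to every service node on the page, so this variant fetches each service's remaining tools pages with a separate, per-service query. fetchServices and fetchTools are hypothetical stand-ins for client.Query calls (the second one would, for example, select a single service by ID), and pageOf is an in-memory stand-in so the sketch runs:

```go
package main

import "strconv"

type PageInfo struct {
	EndCursor   string
	HasNextPage bool
}

type Service struct {
	ID    string
	Tools []string
}

// Hypothetical fetchers: each returns one page of nodes plus its pageInfo.
type servicesFetcher func(after string) ([]Service, PageInfo, error)
type toolsFetcher func(serviceID, after string) ([]string, PageInfo, error)

// fetchNested walks every top-level page and, for each service on it, every
// page of that service's tools before moving on to the next top-level page.
func fetchNested(fetchServices servicesFetcher, fetchTools toolsFetcher) ([]Service, error) {
	var all []Service
	servicesCursor := ""
	for {
		services, svcInfo, err := fetchServices(servicesCursor)
		if err != nil {
			return nil, err
		}
		for _, svc := range services {
			toolsCursor := ""
			for {
				tools, toolInfo, err := fetchTools(svc.ID, toolsCursor)
				if err != nil {
					return nil, err
				}
				svc.Tools = append(svc.Tools, tools...)
				if !toolInfo.HasNextPage {
					break
				}
				toolsCursor = toolInfo.EndCursor
			}
			all = append(all, svc)
		}
		if !svcInfo.HasNextPage {
			return all, nil
		}
		servicesCursor = svcInfo.EndCursor
	}
}

// pageOf is an in-memory stand-in for one paginated field. Cursors are
// stringified offsets here; a real API's cursors are opaque.
func pageOf[T any](items []T, after string, first int) ([]T, PageInfo, error) {
	start := 0
	if after != "" {
		n, err := strconv.Atoi(after)
		if err != nil {
			return nil, PageInfo{}, err
		}
		start = n
	}
	if start > len(items) {
		start = len(items)
	}
	end := start + first
	if end > len(items) {
		end = len(items)
	}
	return items[start:end], PageInfo{EndCursor: strconv.Itoa(end), HasNextPage: end < len(items)}, nil
}
```

The trade-off is request count: one query per page of services, plus one per page of each service's tools. Including the first tools page in the services query and only following up when hasNextPage is true would save one request per service.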

If someone finds good example code, I think this would be worth documenting to save others the effort of figuring it out.

One more thought: this question is applicable to GraphQL as a whole, so perhaps people have found solutions for this in other languages that may be helpful to find.

@dmitshur That's really surprising to hear; it seems like this would be a pretty typical need with complex GraphQL APIs.

As it stands, I think I'll have to choose between two options, both of which require a second API call made just in time (JIT), at the point of use:

Add a function on the client that takes the top-level resource you want to fill, something like client.GetTools(*services[0]).

OR

Add a method instead of using a field on that struct, i.e. services[0].Tools() instead of services[0].Tools, which could then make the additional API call. But I think the really disgusting part is that the struct would also need a reference to the client to make that call. Either each returned struct holds the client as a pointer, or the user has to pass the client in, like services[0].Tools(client), which doesn't feel great either.

Implementing nested pagination with this framework definitely feels like being stuck between a rock and a hard place. /cry

If anyone stumbles across this, this is how I've implemented it in our OpsLevel Go client:

  • I made structs that hold the nodes and pageInfo data, which the top-level resource uses
  • I added "hydrate" functions on those structs
  • I then implemented a "hydrate" function on the top-level resource that calls each nested resource's "hydrate" function, passing in the client that was used to make the initial API call
  • Every "hydrate" call appends to the existing struct's node list, so the final structure the end user gets has all the nested resource lists fully paginated and all the data available. The cost is 1..N extra API calls, depending on how many pages each paginated resource has.
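A minimal sketch of that hydrate idea, assuming the initial query already returned the first page of each nested connection. All type and method names here, and the fetch function that stands in for the follow-up client.Query call, are hypothetical; this is not OpsLevel's actual code:

```go
package main

type PageInfo struct {
	EndCursor   string
	HasNextPage bool
}

// ToolConnection mirrors the tools { nodes { category } pageInfo { ... } }
// selection, with nodes simplified to category strings.
type ToolConnection struct {
	Nodes    []string
	PageInfo PageInfo
}

// fetchToolsPage is a hypothetical follow-up query: one page of a single
// service's tools, starting after the given cursor.
type fetchToolsPage func(serviceID, after string) (ToolConnection, error)

// Hydrate keeps requesting pages after the last known endCursor and appends
// their nodes, so Nodes ends up holding the whole list.
func (c *ToolConnection) Hydrate(serviceID string, fetch fetchToolsPage) error {
	for c.PageInfo.HasNextPage {
		next, err := fetch(serviceID, c.PageInfo.EndCursor)
		if err != nil {
			return err
		}
		c.Nodes = append(c.Nodes, next.Nodes...)
		c.PageInfo = next.PageInfo
	}
	return nil
}

type Service struct {
	ID    string
	Tools ToolConnection
}

// Hydrate on the top-level resource calls Hydrate on each nested connection.
// A fuller version would do the same for Tags and other paginated fields.
func (s *Service) Hydrate(fetch fetchToolsPage) error {
	return s.Tools.Hydrate(s.ID, fetch)
}
```

Each Hydrate call costs one request per remaining page, and none at all when the first page already had hasNextPage set to false.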

@dmitshur Is there any way a field can be included in a query struct but ignored when this library generates the GraphQL query, similar to how a json:"-" struct tag makes encoding/json skip a field?

Something like:

type StuffQuery struct {
	StuffPages []StuffPage `graphql:"stuff"`
	Stuff      []Stuff     `graphql:",ignored"`
}

This would let me decode the raw data from the server into the StuffPages field, run my own pagination processing, and then fill the Stuff field. That is the field the end user interacts with: all the pageInfo, cursor, hasNextPage, etc. details are removed, and they just get the final array of Stuff.
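Without such a tag, one possible workaround is to keep two separate types: a query struct that mirrors the paginated response (nodes plus pageInfo) and is the only type passed to the client, and a user-facing struct with no graphql tags that holds only the flattened list. A minimal sketch; all names, including the stuff(first: 100, after: $cursor) field, are hypothetical:

```go
package main

type Stuff struct {
	Name string
}

type PageInfo struct {
	EndCursor   string
	HasNextPage bool
}

// StuffPage mirrors one page of the paginated field as returned by the server.
type StuffPage struct {
	Nodes    []Stuff
	PageInfo PageInfo
}

// stuffQuery is what would be passed to client.Query; the field is hypothetical.
type stuffQuery struct {
	Stuff StuffPage `graphql:"stuff(first: 100, after: $cursor)"`
}

// StuffResult is the user-facing type: no cursors or pageInfo, just the list.
type StuffResult struct {
	Stuff []Stuff
}

// collect flattens the pages gathered by the pagination loop into the
// user-facing result.
func collect(pages []stuffQuery) StuffResult {
	var out StuffResult
	for _, p := range pages {
		out.Stuff = append(out.Stuff, p.Stuff.Nodes...)
	}
	return out
}
```

The cost is one extra type per paginated resource, but the end user never sees pageInfo, and the query struct stays a plain mirror of the server response.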

After some digging, it appears not: https://github.com/shurcooL/graphql/blob/master/internal/jsonutil/graphql.go#L257

That seems to be the area that would need to be refactored to support this. It would be a good feature to add, since it would make it easy to expose paginated data as simple, easy-to-access structures. Thoughts, @dmitshur?

Following this thread on Twitter, I came up with the following:

https://gist.github.com/myitcv/f7da6cb073de3c48c9f88b6b95225b4c

The overall goal was to extract:

  • all discussions
  • all comments from those discussions
  • all replies to those comments

for a given GitHub repository, minimising the number of calls to the GitHub GraphQL endpoint as far as possible.

This felt like incredibly hard work, not to mention error-prone.

I would welcome any thoughts on a better approach.