gabledata/recap

Gateway should cache schemas

Closed this issue · 4 comments

The gateway queries downstream systems every time a schema is requested. I think it should have an option to cache schemas. Users could then request a schema with ?ttl=millis or some equivalent. If the schema isn't cached yet, or the cached copy is older than the TTL, the schema is re-fetched, cached, and returned. If the cached copy is within the TTL, it is returned. If no TTL is set, "latest" is assumed, and the schema is always fetched.
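
To make the proposed semantics concrete, here is a rough sketch of the lookup logic. Everything in it is an assumption for illustration: `SchemaCache`, the `fetch` callable, the `ttl_ms` parameter, and the injectable `clock` are hypothetical names, not part of the gateway's actual code.

```python
import time
from typing import Any, Callable, Dict, Optional, Tuple


class SchemaCache:
    """Hypothetical TTL cache sitting in front of a downstream schema fetcher."""

    def __init__(
        self,
        fetch: Callable[[str], Any],
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch  # assumed: queries the downstream system for a schema
        self._clock = clock  # returns seconds; injectable so it can be faked in tests
        self._entries: Dict[str, Tuple[float, Any]] = {}  # path -> (fetched_at, schema)

    def get(self, path: str, ttl_ms: Optional[int] = None) -> Any:
        now = self._clock()
        entry = self._entries.get(path)

        # Serve from cache only when a TTL was given and the entry is young enough.
        if ttl_ms is not None and entry is not None:
            fetched_at, schema = entry
            if (now - fetched_at) * 1000 <= ttl_ms:
                return schema

        # No TTL ("latest"), cache miss, or expired entry: re-fetch, cache, return.
        schema = self._fetch(path)
        self._entries[path] = (now, schema)
        return schema
```

A request handler would parse `?ttl=...` into `ttl_ms` and call `cache.get(path, ttl_ms)`. Leaving `ttl_ms` unset keeps today's behavior of always hitting the downstream system.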

/cc @cpard

cpard commented

@criccomini I'm a little worried about the schema caching here. What if the requester ends up using invalid schema information because an update happened in the source and the cache wasn't invalidated in time?

My feeling is that caching is OK for stuff like statistics, but when it comes to the schema (e.g. a column was dropped or a type was changed), there are cases where using the wrong data might cause real problems. WDYT?

criccomini commented

SGTM. I thought you were requesting caching during our last call. If that's not the case, I'll drop this.

cpard commented

> SGTM. I thought you were requesting caching during our last call. If that's not the case, I'll drop this.

Sorry about that, @criccomini. What I wanted to say during the call is that, depending on the gateway's use case, caching may or may not be a viable solution. From my PoV, it's not that important, especially considering the implications of managing invalidation consistently enough.

criccomini commented

K, sounds good. I agree on the inconsistency part. Plus the (forthcoming) /registry path will act as a schema store, which could be used as a cache if needed. Plus^2, adding in-memory caching in the future is pretty easy to do as well.