gabledata/recap

Support Unknown Type

Opened this issue · 6 comments

There may be situations where there is a field that cannot be understood by the recap converters, or cannot be represented in Recap, or based on the particular source system the type simply is not determinable.

In such cases, it could be valuable to allow some kind of placeholder that acknowledges that the field/column exists by name, but without trying to represent the schema of that field. Adding an UnknownType to recap could do that.

Potential implementation:

class UnknownType(RecapType):
    """Represents an unknown or unsupported Recap type."""

    def __init__(self, description: str = "Unknown or unsupported type", **extra_attrs):
        super().__init__("unknown", **extra_attrs)
        self.description = description

    def __eq__(self, other):
        return super().__eq__(other) and self.description == other.description

    def __repr__(self):
        return f"{self.__class__.__name__}(description={self.description})"

Hmmm.. need to think on this one. There's also the concept of an AnyType, which is semantically slightly different, but is more in line with what most type systems have.

@mjperrone can you give a concrete example or two of real situations where Recap can't model the type? I think I need to understand more about the use case.

Curious if @cpard has any thoughts?

I can't think of any database types that can't be represented in recap. The only context I can imagine a type that Recap can't represent is Unknown in the TypeScript type system. I don't think that's a compelling reason to add UnknownType to recap though, as it doesn't exist in the runtime language JavaScript.

I think the best reason would be:

cannot be understood by the recap converters

to allow recap to be useful when the converters are only partially implemented.

to allow recap to be useful when the converters are only partially implemented.

This is the biggest use case I see as well. Right now many of the converters raise an error when they encounter a type they don't yet support (example from the JSONSchemaConverter). Given a large/complex JSON schema, a single use of an unsupported type prevents the remainder of the schema from being converted. I think it could be useful for converters to have an option to be less strict, in which case the supported schema is converted, any anything else is typed as UnknownType. Not sure if we want to get into the unknown/any type distinction though 🙂

Another option would be to just drop anything that can't be converted, but from a user experience perspective I think getting back an UnknownType would be less confusing than missing properties

Gotcha, this totally makes sense as a use case. I'm thinking through implementation approaches. I am hesitant to modify the type system for this; it feels more like a valid concern about the way converters are implemented.

I think what we want is some kind of strict=True param for the converters. The question is what to do when strict=False.

  1. One option would be to return a RecapType, and treat this as UnkownType.
  2. Another option, as you said, would be to create an UnknownType.
  3. Yet an other would be to modify the type fields to allow for RecapType | None, and return None for unknown types.
  4. We could use NullType to represent unknown types.

Any other ideas?

(1) is least invasive, but maybe a bit of an abuse of RecapType. It is actually more like AnyType right now.

(2) is most semantically correct. We do have some precedent for this style with ProxyType. If we went this route, I wouldn't want to add an unknown to the CST, but rather keep UnknownType as only an AST class similar to ProxyType.

(3) seems the most invasive and also a bit of an abuse of None. Probably asking for trouble with NPEs and such, too.

(4) also seems a bit funky to me since null is an actual type that can be expressed in many other type systems. It's also something we support in the CST.

For (2), can you guys think of precedent in other schema systems for this kind of model? I'd like to see what others do for this use case. Zed has the concept of an error type; this seems close.

Regardless of what we pick, we probably also want to propagate as much information about the underlying type as possible, such as the underlying type as a string, docs, etc.

@mjperrone @adrianisk K, let's move forward by adding UnknownType in types.py. The converters should all be updated to return UnknownType instead of raising an exception. I think we can skip the strict param in converters. If users wish to raise on unknown types, they can look for UnknownTypes in the returned schema and raise on their own.

Pull request welcome. ;)

#430
gabledata/recap-website#16

Still needs some work but here's a start