Support for self referencing types.
allComputableThings opened this issue · 10 comments
Forked from: #233
Thrift accepts both:
struct A {
1: list<A> children;
}
and
struct A {
1: A a;
}
NB. The second result in some snafus with C++ unless you add " 1: A & a; ", do prevent the sizeof(A) being infinity.
@dain wrote (from previous thread)...
I think it is a bad idea to make ThriftType mutable. Instead, I would do something like this (pseudo code)
public ThriftMetadata(Class c) {
this(c, new Map<Class, ThriftMetadata> inprogress);
}
private ThriftMetadata(Class c, Map<Class, ThriftMetadata> inprogress) {
inprogress.put(c, this);
for each field:
ThriftMetadata metadata = inprogress.get(field.type)
if (metadata == null) {
metadata = new ThriftMetadata(field.type, inprogress);
}
// use metadata, but be careful to not access uninitialized fields
}
This is super dangerous, but works because we build the transitive closure in a single shot. The is really only borderline-sane because this is a single class with a private constructor that understands the exact publishing semantics of every field.
I think the bigger issue is how you serialize and deserialize the value.
In fbthrift (https://github.com/facebook/fbthrift) there is C++ support for this, with an annotation cpp.ref = true which tells thrift to generate the contained type as a reference. Not sure if it works on lists though, I'll have to check into this.
I would assume this must be in the IDL, since the engine needs to know where to break a complex graph. For example, if you have A -> B -> C -> D -> A ->
, the engine has no idea which arrow to replace with a reference. If that is the case, maybe we should add a reference
attribute to the @ThriftField
annotation.
@dain Say we're swifting a Java class equivalent of:
struct A {
1: list<A> children;
}
.. and that we are executing your new ThriftMetadata(A.class, inprogress)
, and A contains list<A>
as a field, then how does the ThiftType for A (required to build the ThriftType for list<A>
) get constructed? It always has to be the case that the ThiftType for A is not yet complete.
So... running with your idea... How about this.
class ThriftCatalog{
private ThreadLocal< Map<Type, ThriftType> > inProgress = ... ;
boolean isInProgress(Type javaType) { return inProgress.contains(javaType)' }
ThriftType declareInProgress(Type javaType, ThriftType t ) { ..add to inProgress; }
ThriftType getThriftType(Type javaType) {
.. Existing stuff
// Replace call to getThriftTypeUncached() :
if ( inProgress.get().hasKey(javaType) ) return inProgress.get. get(javaType)
int startSize = inProgress.get().size()
try{
t = getThriftTypeUncached();
} catch( Throwable e ) {
inProgress.clear();
throw e
} finally{
if (startSize==0 && inProgress.get().size>0 ) {
commit( inProgress.get() );
inProgress.clear();
}
}
}
}
Now we need to do:
class ThriftStructMetadata{
public ThriftStructMetadata(Class c, ThriftCatalog catalog) {
// Register a ThriftType with an incomplete ThriftMetadata
ThriftType selfTType = ThriftType.struct(this);
catalog.declareInProgress( selfTType );
...
Current implement of ThriftStructMetaDataBuilder moved here (or better, yet, in a subclasses: ThriftStructMetadataJavaBuilderImpl, ThriftStructMetadataScalaBuilderImpl)
... construction of fields now makes calls to catalog.getThriftType(), finds selfTType, prevents recursion.
}
}
I'd favor this refactoring, partly because the current ThriftStructMetadataBuilder assumes too much about Java -- its been very hard to adapt for scala. The above would clean things up considerably.
(updated formatting of previous post)
Let me start with, if a ThreadLocal
is necessary, this is a bad idea.
Before we go about discussing the metadata, can you describe how the serialization and deserialization will work? Once, I understand that, there might be a simpler solution to the metadata problem.
Specifically, how do we deal with circuits I described above. It it requires explicit tagging in the IDL, then the metadata problem is simple, since there really aren't circuits, just pointer, and we can explicitly model the pointers.
You only really need the explicit IDL tagging to break the graph for languages like C++, because normal nested objects are not by-ref in C++ thrift.
Serialization and deserialization work as usual. For example in deserialization, eventually you get to a part of the object graph where there could have been a nested object to read in, but the field will just not be there to read. (In your example above, you are reading the nested A, inside A->B->C->D->A, and it just doesn't have a value for B in the input stream to read in, so everything works out).
Agree that a ThreadLocal doesn't seem like a good way to handle this.
@andrewcox what happens if you actually have a cycle in the data?
Re: serialization.
With what I'm thinking, if you try to persist:
(1) Type self reference:
A a = new A();
a.friend = new A(); // currently not supported, but allowed by thrift
serialize(a);
... then no problem. Serialization stops when a.friend=null is reached.
(2) Type self reference in a container:
A a = new A();
a.list.add(new A()); // currently not supported, but allowed by thrift
serialize(a);
... then no problem. Serialization stops when a.friend=null is reached.
(3) Object self reference
A a = new A();
a.selfRef = a; // excluded in thrift by design.
serialize(a);
... then you can expect to stack-overflow. (and so don't do it!)
I'm mostly interesting supporting 1 and 2 (type self references), not 3
(object self-references).
I'm totally happy with the above behavior (user beware of object
self-references) -- we've lived with it in jackson, and (for us), is never
been a big problem.
If you find this behavior too disturbing then perhaps the middle ground is
to allow/disallow this behavior with a flag:
bool allowTypeSelfReference
in ThriftCatalog.
No changes to the codec are required - although I think it does require
coding around the self-referencing issues in order to construct the codec.
NB.
Judging by this
http://stackoverflow.com/questions/15634101/apache-thrift-fails-generating-recursive-structs
,
I think that the & keyword in thrift does 1 and 2 (and is only needed when
targeting C/C++), but doesn't avoid the problems with (3). I think
supporting (3) is impossible with vanilla thrift as client side codec
decision logic is required decide whether to deserialize a new object or
use a reference to an already seen object. (Its easy to add corresponding
support on the server side by maintaining a map of seen objects during
serialization/deserialization, and using it to decide whether to read/write
and ID or an object -- but much more work to overhaul thrift to add
corresponding support on each client).
Also, the jackson library (by default) has the same behavior as 1,2 and 3,
above (reaching the same object twice leads to a stack overflow). Jackson
recently added JsonIdentityInfo
http://wiki.fasterxml.com/JacksonFeatureObjectIdentity which I think solves
this. Doing the same in thrift requires updating the client codec for each
language.
Re ThreadLocal. I'm also not a huge fan. However, ThreadLocal is already
how type self-references are prevented from causing infinite loops in
ThriftCatalog. If you're happy with the existing class, I'd say ThreadLocal
is by far this simplest and method.
I'd like to avoid passing around some temporary "in progress" state to
wherever ThriftCatalog.getThriftType may be called (messy, and requires
global changes).
I wonder if there's another way to hold the in-progress ThriftTypes in
ThriftCatalog?
Thank you for reporting this issue and appreciate your patience. We've notified the core team for an update on this issue. We're looking for a response within the next 30 days or the issue may be closed.