AqlaSolutions/AqlaSerializer

Removing a SerializableMember Makes Deserializing Fail

ab-tools opened this issue · 22 comments

Everything is working really well with AqlaSerializer, but I just ran into the following problem.

I have a serialized object (with the class attribute [SerializableType(ImplicitFields = ImplicitFieldsMode.None, EnumPassthru = true)]) containing a data member like this:

        [SerializableMember(5)]
        public List<CustomClass> CustomClassObjects { get; set; }
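
For context, a minimal sketch of how these pieces fit together (the containing class name is hypothetical):

    using System.Collections.Generic;
    using AqlaSerializer;

    [SerializableType(ImplicitFields = ImplicitFieldsMode.None, EnumPassthru = true)]
    public class FlightDatabase
    {
        [SerializableMember(5)]
        public List<CustomClass> CustomClassObjects { get; set; }
    }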

Due to some restructuring I want to completely remove this class member, and I thought this would be no problem as long as I do not reuse serializable member ID 5 again.

But as soon as I remove these two lines from the code and try to deserialize the object again, I get an exception telling me:

A deferred key does not have a value yet (NoteObject call missed?)

What is the reason for that?
How can I remove a serializable member without breaking the serialization?

Thanks in advance
Andreas

First, thanks for your quick reply as always! :-)

That's really strange as I can reproduce it every time here with my data object.

Can it make any difference that I set this option on the type model?
((AutoAddStrategy)RuntimeTypeModel.Default.AutoAddStrategy).UseLegacyTupleFields = true;

Thanks
Andreas

@ab-tools as long as your class is not a tuple (you are using an explicit SerializableType attribute, so it's handled as a normal class anyway), UseLegacyTupleFields won't affect anything.

Try to make a minimal reproducible sample; otherwise it's hard to say what could cause this.

The overall database structure is much more complicated, of course, so I'm not sure what might cause the problem here.

But I was able to isolate all database-related code and have just sent you a link to an example project via e-mail.

It would be great if you could take a look at it!
You can easily reproduce the problem with that project.

Thanks in advance
Andreas

@ab-tools I can't see your mail...

Hm, I sent it to your aqla.dev@gmail.com address, which I've already sent things to in the past.

Did you check the spam folder, too?

@ab-tools yep, I found it. I'll look at it but in a few days.

Great, thanks.

Please let me know as soon as you have an idea about this, as with the next update (planned for Monday or Tuesday) I would like to get rid of this property. :-)

Thanks again for your great support!
Andreas

@ab-tools
I think I can now guess why this happens. When serializing a reference, the object itself is written only at its first occurrence; all subsequent occurrences get only a key. Your ProcedureSegnements contains Waypoints and strings. When traversing your list, the serializer may encounter such a Waypoint or string object for the first time, so it actually gets written inside the list field. Now you remove the list, so the serializer no longer knows anything about this field and skips it completely, together with all the Waypoint and string objects it contains. Later it tries to load one of those Waypoint or string objects from the cache, but it doesn't exist because it was never read: the whole list field was skipped.
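
To make the failure mode concrete, here is an illustrative object graph; the class and member names below are invented, not the actual ones from the project:

    using System.Collections.Generic;
    using AqlaSerializer;

    [SerializableType]
    public class Waypoint { }

    [SerializableType]
    public class Route
    {
        // Visited first during traversal: the full Waypoint payload
        // is written inside this list field.
        [SerializableMember(5)]
        public List<Waypoint> Segments { get; set; }

        // Visited later: only a reference key to the same Waypoint
        // is written here.
        [SerializableMember(6)]
        public Waypoint Current { get; set; }
    }

If field 5 is later removed from Route, the deserializer skips the unknown field 5 together with the full Waypoint payload inside it, so the key stored in field 6 can no longer be resolved.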

It looks like a design problem of combining reference tracking with the Protocol Buffers format, where fields can be skipped. When improving the reference tracking mechanism I was focused on the mechanism itself and missed how versioning issues are handled around it. I'm not sure what I can change to handle this case. Maybe you have an idea? I'm thinking about writing a list of all original object positions as a header, so that when the deserializer finds an unknown key it can seek to the original object position and read from there. But this seeking may not be possible in some cases (and neither may header rewriting), and the positions list will require some space (though it could contain deltas rather than full positions). Note that the deserializer can't read a field without knowing its type and serialization settings, so it's impossible to read such a field while skipping it without a proper context; it has to be read later, when a concrete field retrieves the key of an unknown object.

The workaround is just renaming your list property to something like "__removed" (you can even make it private and put it in a partial class). The same way, you can rename the list element class, but it can't be removed completely, though its non-reference-tracked fields may be removed.

This renamed field doesn't need to be manually populated for serialization. The field is only required for reading data written by the previous version.
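
A minimal sketch of this workaround, assuming the containing class is named FlightDatabase (a hypothetical name) and the removed member used ID 5:

    using System.Collections.Generic;
    using AqlaSerializer;

    // Kept only so that data written by the previous version can
    // still be read; nothing populates it when writing new data.
    public partial class FlightDatabase
    {
        [SerializableMember(5)]
        private List<CustomClass> __removedCustomClassObjects { get; set; }
    }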

While "removing" fields this way is pretty simple, the same problem occurs, and can't be fixed like this, when an old version tries to read the new version's format after a field unknown to the old version was added.

@AqlaSolutions: First thanks a lot for your detailed reply!

It's totally fine that an old version is not compatible with a new one (I wouldn't expect that anyway :-) ), but it's a pity that removing no-longer-used fields is then not possible with a new version. This was always great about the protobuf approach: you can easily change the objects (add and remove fields) as long as you don't re-use old IDs.

I will use the workaround as suggested then. It's quite messy (I can already see a lot of "__deleted" classes lying around in the code), but at least it works this way.

Thanks again
Andreas

@ab-tools just to be sure that you understand me correctly: by "old version" and "new version" in my previous comment I mean your application, NOT AqlaSerializer v1 vs v2.

Hi guys!
I have a similar issue with my project as well, and the workarounds don't help me, as I don't have control over the data classes: they're user-defined in my case...

@AqlaSolutions, I think you really need to consider writing all the referenced objects as a list in a header, so the issue would no longer be possible: all referenced objects would be read separately prior to deserializing everything else. I just hope it's possible to figure out how to do this properly using the current late-reference system.

Regards!

@AqlaSolutions: Yes, I understood that correctly.

The problem is that a "new version" of my app now cannot read an "old version" of the data structure anymore if I removed a field from it.

Although, in contrast to @aienabled, I do have control over all objects and at least CAN work around it, it's really messy to have not only no-longer-used (private) fields in the objects, but also "deleted" classes lying around...

@aienabled, late reference mode is not designed for this. It still keeps subtype information at the "original object" position, because creating an "empty" instance can't be postponed (that would break immutable types and make some things much more complex). Also, late reference fields are slower.

I described a possible solution above, but it requires both the read and write streams to be seekable and is pretty hard to implement. I'm not sure if there is a more elegant one.

I could add a way to specify removed fields from code, but that still wouldn't help with backward compatibility, so I have to solve this in full anyway.

it's really messy to have not only no-longer-used (private) fields in the objects, but also "deleted" classes lying around...

@ab-tools, you can keep all the partial classes in a single "trash" file.

@AqlaSolutions, I see. Perhaps you could serialize everything into a separate buffer stream and prepare the header with the objects list, then write the header together with the serialized objects to the output stream. Of course it would use twice as much memory during serialization, but it would solve the issue once and for all.
I think it could be made optional but enabled by default.
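
A rough sketch of this buffering idea (not AqlaSerializer's actual implementation; SerializeGraph is a hypothetical helper that serializes the graph and records where each tracked object was written):

    using System.Collections.Generic;
    using System.IO;

    using var body = new MemoryStream();
    var positions = new List<long>();
    SerializeGraph(body, positions);   // hypothetical: fills the buffer, records object positions

    using var output = File.Create("data.bin");
    using var writer = new BinaryWriter(output);
    writer.Write(positions.Count);     // header: object count...
    long previous = 0;
    foreach (long pos in positions)
    {
        writer.Write(pos - previous);  // ...then deltas, which are smaller than absolute positions
        previous = pos;
    }
    body.Position = 0;
    body.CopyTo(output);               // finally the buffered serialized data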

As @ab-tools said:

This was always great about the protobuf approach: you can easily change the objects (add and remove fields) as long as you don't re-use old IDs.

The current design is very unreliable in this respect.

Regards!

@ab-tools @aienabled can you confirm the fix? Please download the latest release. You need to re-serialize with RC2 to check it.

@ab-tools, please check that you can deserialize the RC1 format with RC2 (but RC1 won't read back what you write with RC2). Let me know if you see any performance or output-size problems.
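
A sketch of the requested check, assuming Database is the application's root data type (hypothetical) and using the standard TypeModel Serialize/Deserialize API:

    using System.IO;
    using AqlaSerializer.Meta;

    var model = RuntimeTypeModel.Default;

    // Read a database written by the build that used RC1...
    object db;
    using (var oldFile = File.OpenRead("database_rc1.bin"))
        db = model.Deserialize(oldFile, null, typeof(Database));

    // ...then write it back, producing the RC2 format.
    using (var newFile = File.Create("database_rc2.bin"))
        model.Serialize(newFile, db);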

@AqlaSolutions: Looks very good in a first test:
I removed the property from the code, and then read the old RC1 format with the RC2 DLL without problems.

Then I saved the database again, which worked fine as well, resulting in about a 5 % size increase (which is no problem for me).
By the way, just out of interest: why was this new format needed at all, given that reading the old format now works fine even though fields were removed?

After I saved the database with RC2 again, I restarted the program and got the following exception:
"Expected max 1000000 elements but found 2910654"

Is there some kind of object limit now?

@ab-tools yes, you can adjust the limit with model.ReferenceVersioningSeekingObjectsListLimit. It's there just for security reasons, so that no one can allocate too much memory when it's not expected.
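
For example, assuming model refers to the RuntimeTypeModel in use (e.g. RuntimeTypeModel.Default), and picking a limit above the element count from the exception:

    using AqlaSerializer.Meta;

    // The default limit guards against allocating too much memory on
    // malformed input; the exception above reported 2910654 elements.
    RuntimeTypeModel.Default.ReferenceVersioningSeekingObjectsListLimit = 5000000;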

Why was this new format needed at all, given that reading the old format now works fine even though fields were removed?

The new format keeps a list of object positions so that reference tracking works correctly when reading with removed fields. If you got an exception in RC1 when trying to read with a removed field, then you should not be able to read that data with RC2 either (KeyNotFoundException) unless you re-serialize it in the RC2 format. Have you removed the partial class?

Hm, you're right, for whatever reason I can now also read the database with RC1 after removing the one field that led to problems last time. I really don't understand what the difference is now...

But if, as you say, I still cannot read an RC1 database after removing a field that previously caused an error with RC1, this update does not help me anyway, as the program is already out and I therefore need to stay compatible with the already-written RC1 databases.

So I will just stay with RC1 for now, until I someday have such a big database update that I need to force all customers to rebuild their databases anyway. That would be a good opportunity to switch to the latest AqlaSerializer version. :-)

@ab-tools I don't really understand why you can't use RC2. If you have your V1 released with RC1 and then release V2 with RC2, your customers don't need to convert anything: they just can't load a V2 database in V1. But since they already have V2, why would they try to read the database with V1?

Ah, I see. If you start removing fields in V3, you would need your customers to first update to V2, rebuild the database, and then update to V3; otherwise V3 with removed fields won't read the RC1 format correctly.

Exactly. There would be no problem in using RC2 now, of course; it would just have no advantage for me at all, as I still cannot remove any fields. ;-)

As updating a component to a new version always carries a minor risk of causing other, as yet unknown problems, I'd rather not do it now when it doesn't bring any advantages.

@AqlaSolutions, it seems to work perfectly. I needed to rewrite my external known-objects provider (for NetObjectsCache), but now everything works fine together. Great job!
Regards!