egg-mode-rs/egg-mode

Serialize for Tweet?

Closed this issue · 5 comments

It seems that only Deserialize is implemented for Tweet, not Serialize:
https://github.com/egg-mode-rs/egg-mode/blob/master/src/tweet/mod.rs#L236

Could you outline your use case? It would not be possible to serialize the tweet in the exact form in which it is received from the Twitter servers, since the deserialization is lossy.

If you just want to dump the tweet e.g. to JSON for storage, I agree Serialize would be useful. Though, it might be better to create your own type that only plucks out the values you need to store.

Yes, storing them as JSON is the use case. Copying the values into a struct is what I'm doing currently, but it would be nice to have a default out of the box, since I guess this use case is quite common.

I'm hesitant about just deriving Serialize and calling it a day, for the reason @adwhit mentioned: the Serialize and Deserialize implementations wouldn't be symmetric. If you wanted to load the JSON back up into a Tweet instance, there would need to be another function to just load the fields up. The biggest problem is that egg-mode converts the text ranges in things like entities to byte offsets, instead of the codepoint offsets that Twitter returns. There would be no way to tell that this conversion had taken place without some out-of-band indication, like a dummy field we put into Tweet that signals that the JSON came from serializing a Tweet instead of being a raw response.
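
To illustrate why the two kinds of offsets are not interchangeable, here is a minimal sketch of a codepoint-to-byte range conversion using only the standard library. The function name `codepoint_range_to_bytes` is made up for this example and is not egg-mode's actual code; it only shows the general idea.

```rust
/// Converts a half-open range of codepoint offsets into byte offsets in `text`.
/// Hypothetical helper for illustration; not part of egg-mode's API.
/// Returns `None` if either offset lies past the end of the text.
fn codepoint_range_to_bytes(text: &str, start: usize, end: usize) -> Option<(usize, usize)> {
    // Byte position of the n-th codepoint. An offset equal to the number of
    // codepoints maps to `text.len()`, so a range may end at the end of the text.
    let byte_at = |n: usize| {
        text.char_indices()
            .map(|(byte_index, _)| byte_index)
            .chain(std::iter::once(text.len()))
            .nth(n)
    };
    Some((byte_at(start)?, byte_at(end)?))
}
```

For the text `"héllo #rust"`, the hashtag covers codepoints 6 to 11 but bytes 7 to 12, because `é` takes two bytes in UTF-8. Applying the conversion a second time to already-converted offsets would give wrong ranges, which is why a round-tripped Tweet needs some marker.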

...although now that I type that, that does seem like a solution. It would make the deserializer for Tweets much more complicated, since we'd need to make sure that RawTweet could handle both the response from Twitter and the output of a round-trip serialization of an egg-mode Tweet. (That, or read the whole thing into a serde_json::Value first, check for the dummy field, then either go through RawTweet or pull everything out manually depending on whether it's there. It would add the overhead of reading the JSON into a Value first, but it would also be much easier to understand from a code-reading perspective.)

I'd also like to have a Serialize implementation (use-case: caching tweets to avoid hitting API limits too much). I'd be happy to contribute a PR if there was an acceptable design for the implementation.

I had an idea on how to do this somewhat efficiently, and posted PR #99 with the implementation. I asked around and people pointed me to #[serde(untagged)], which enables you to load/save an enum solely by the contents of its variants, instead of a tag value. This seemed to work perfectly, with the drawback that error messages become opaque. Does this work for y'all?