sparky8512/starlink-grpc-tools

Type hints for `status_data` returns

boswelja opened this issue · 9 comments

So off the top of my head, there are two approaches to this we can take.

First (and my preferred) is to create a dataclass for attributes from status_data, so we would return a Tuple[StatusData, ObstructionDetails, Alerts]. This would be a breaking change, as references such as groups[0]["id"] become groups[0].id. I prefer this one because it's slightly less work to reference variables 😆

Second is to create a class extending TypedDict. Referencing variables will look the same as it is now, but groups[0]["id"] would return str rather than Any.

Both of these require creating new classes, and both mean we can deprecate both status_field_names and status_field_types, since this information is present in the return type

I'll have to think this over some, but I'm starting to warm up to using TypedDict for this, with the following caveats:

My original design intent with this was that those return values be fairly opaque and dynamic. status_field_names and status_field_types only came about because CSV and sqlite output need schema information in advance of the actual data. That being said, I don't really care about "opaque" so much as I care about being able to add new fields in starlink_grpc without having to tweak the dish_* calling scripts, and that does not preclude specifying the type information more strongly.

As for interface breaking changes, there is already at least one other project (starlink-graph) using the starlink_grpc.py interface, so it would have to be a pretty compelling need to make an incompatible change. I'd be reluctant to go that route for this.

An issue with using TypedDict is that it was only introduced in Python 3.8, so that would raise the minimum Python version from the current 3.7. That would not be the end of the world, but I would like to consider alternative for users who don't care so much about the type hinting. (Not saying I would require it, just that I would like for it to be considered.)

Something to consider is that the alert detail data group's fields are determined at run time based on the reflected grpc data structure. I don't think that would work well with either dataclass or TypedDict. That being said, all the fields in that group are bools, so you could probably just do Dict[str, bool] for that one. At any rate, I believe this means status_field_names and status_field_types will still be required, at least in part.

Finally, however it is implemented, note that some of the fields may return None for value, so should be considered optional. In practice, only a handful of fields currently have that potential, but if the grpc protocol changes in the future to drop some information, as has happened in the past, more fields could do so.

That's a good point on the alert types! Is there a reason it's done that way? Do alert types change frequently?

Not frequently, but it does happen on occasion. I just felt it was dumb to have to do a code change every time a new alert was added when the info is right there in the grpc message definition.

It's still not exactly perfect now, as it will only pick up new alert bits when the script restarts and it re-fetches the protocol module definitions via reflection, but I figured that was good enough.

By the way: I've been poking around with how to support this with TypedDict without breaking Python 3.7 support or having to duplicate the type data between type hints and the *_field_names / *_field_types functions. It's a bit more complex than it really should have been due to some decisions I made with the field names, but I think it will work OK. I can throw that up in a branch if you want to have a look.

If you have comments, please drop them onto PR #60. If that change looks OK, I will do the same for the history stats and bulk history return types.

dataclass is available in all target versions right? I noticed get_status returns an object that could be replaced with dataclass. Might be a good stand-in for status_data
(sorry for disappearing for a while, I was sick)

get_status returns a spacex.api.device.device_pb2.DishGetStatusResponse object.

I think I can still make TypedDict work for the status_data (and similar) return values, I just need to make the type hint types be the source of truth and derive the data needed for the *_field_names / *_field_types functions from those, instead of the other way around. The only hesitation I have about doing so is that the way to access the type information from the type hint type constructs involves reaching into the object innards and is not actually documented, so may actually differ across Python implementations. (turns out there is a way to get it in a documented way, at least for Python 3.8+)

Second attempt has been posted as PR #62.

This time I ran it through mypy, and it checks out as long as I specify Python 3.9 (vs it spewing errors on 3.7, the only other version I checked). mypy does report a few other errors, due to it not being able to infer correct types for locals in a few functions, but those are unrelated to this change.

This is now present in the main branch.

One thing I should point out is that I only marked the fields that currently have the potential to return None as Optional[]. However, in the event that the underlying data is removed from the grpc protocol, really any of the fields could return None in the future, and I would not consider that a compatibility breaking change. One could argue that all the fields should thus be marked Optional[], but I felt that doing so would unnecessarily complicate the calling code with the need to check for None everywhere. That being said, the dish_*.py scripts do allow for any fields to be None, with a few exceptions, so I could be convinced otherwise.

If the protocol does change to remove things from the grpc structures, that's going to break starlink_grpc.py and will require a code change, anyway, at which point, the field can be changed in the type hint to be Optional[] (not to None, because doing so would break schema for any database that already has older data). Of course, any functionality that was relying on that data will break, regardless, but there's not much to be done about that. This has happened in the past, but it has been rare, and they usually stop populating the data first, which results in default values being reported.

Agreed, I don't think it makes sense to mark everything as optional. Schema changes should prompt a library update.
I plan on checking this out over the weekend, we can probably close this as long as mypy passes and I'll open up issues for anything else I find