haxsaw/hikaru

Edge cases with diffs

Closed this issue · 9 comments

aantn commented

Hey! I'm diffing two objects and getting a Type mismatch when the field is None in one object and has a value in the other. I think it would make more sense if this was a Value mismatch and not a type mismatch.

Example:

[DiffDetail(cls=<class 'hikaru.model.v1.ObjectMeta'>, attrname='deletionGracePeriodSeconds', path=['metadata', 'deletionGracePeriodSeconds'], report="Type mismatch:self.deletionGracePeriodSeconds is a <class 'int'> but other's is a <class 'NoneType'>"),

DiffDetail(cls=<class 'hikaru.model.v1.ObjectMeta'>, attrname='deletionTimestamp', path=['metadata', 'deletionTimestamp'], report="Type mismatch:self.deletionTimestamp is a <class 'str'> but other's is a <class 'NoneType'>"),

Any thoughts?

aantn commented

To follow up on this, I'm interested in getting slightly more structured data in the diff so that I can format the report myself. Would you be open to PRs that add additional fields to DiffDetail like diff_type, value, and other_value?

Hey, sorry for the delay in responding; I was expecting Github to notify me when someone submitted an issue. I'd be open to your suggestions; my goal is to make things that make it easier to work with Kubernetes. Would you please also be able to provide docs updates as well?

As for your original issue, it sounds like a decent suggestion and I'll look into it. At the moment, I'm kind of hip-deep in getting Hikaru to talk directly to the Kubernetes Python client (so you don't have to ;-), and that's got it's own challenges.

I'm wondering if this should be split into two tickets; or do you propose to fix the original issues with the changes you're suggesting? Also, showing the actual values could be a problem, as they can be elaborate structures that will get rendered automatically if you try to print them, which doesn't sound like a good idea. The path bit is there so you can easily find the elements to see the values. I think there needs to be some thought on displaying values.

So I've been thinking a bit more about type vs value mismatch in diff, and as I consider it I'm thinking that type is right if one value is None. Value should be the kind of mismatch if both items have the same type but their values don't match, whereas None vs another type, regardless of value, seems more like a type mismatch, as the types are comparable. I'm open to being convinced, but that's how I'm seeing it. I think that's independent of introducing more structure into the output.

aantn commented

I opened a PR so we have some code to kickstart the discussion.

Philosophically, you're right about the type vs value mismatch, but it in practice, it is useful to treat None differently for identifying fields that were added/removed.

To make this conversation more concrete, I'm attaching an example of how I'm formating Hikaru diffs right now when I send them to Slack:
image

This is the code I'm using:

def get_field_full_path(diff_detail: DiffDetail):
    return ".".join(map(str, diff_detail.path))


def babysitter_should_include_diff(diff_detail: DiffDetail, config: DeploymentBabysitterConfig):
    full_path = get_field_full_path(diff_detail)
    return any(substring in full_path for substring in config.fields_to_monitor)


def babysitter_get_attachment_blocks_for_slack(diffs: List[DiffDetail]):
    num_additions = len([d for d in diffs if d.diff_type == DiffType.ATTRIBUTE_ADDED ])
    num_subtractions = len([d for d in diffs if d.diff_type == DiffType.ATTRIBUTE_REMOVED])
    num_modifications = len(diffs) - num_additions - num_subtractions
    blocks = [
        {
            "type": "section",
            "text": {
                "type": "mrkdwn",
                "text": f"{num_additions} fields added. {num_subtractions} fields removed. {num_modifications} fields changed"
            }
        },
    ]
    for d in diffs:
        blocks.extend([{"type": "divider"},
                       {"type": "section", "text": {
                           "type": "mrkdwn",
                           "text": f"*{get_field_full_path(d)}*: {d.other_value} :arrow_right: {d.value}",
                       }}])
    logging.debug(f"blocks are {blocks}")
    return blocks
aantn commented

Btw, any ETA on the the stuff with the Kubernetes python client? Are you planning to cover only CRUD operations or also stuff like PodExec?

We haven't done anything elegant regarding CRUD, but we do have a solution we're happy with for non-CRUD operations like PodExec. We're basically autogenerating subclasses of hikaru classes and adding additional base classes like PodExecMixin which are handwriten and not autogenerated. That lets us benefit from all of Hikaru's autogeneration but also provide the user of our own library with easy to use methods like pod.exec() which call into the official Python client. Is this something you would be interested in including in Hikaru itself or is it too far off the use cases you have in mind?

Re: the diff reporting, I see what's going on, but I would encourage you to do something to limit the length of a value you report. If someone adds/deletes an entire elaborate sub-structure your output will go from being easy to read to unreadable, as an entire hierarchy of dataclasses may get rendered to slack. Do you have an explicit DiffType.ATTRIBUTE_CHANGED? I had considered including the different values in the DiffType but decided against it since the path led you the values if you cared, but your example does show how you might want to just report diffs without access to the original objects being diff'ed. I'll look at the PR when I get a moment.

As for the other stuff, have a look at the email I sent.

aantn commented

Yes, there is a DiffType.VALUE_CHANGED and TYPE_CHANGED. The code is a little quick and dirty. I’ll clean it up more when I get a chance.

aantn commented

Forgot to include it in the code snippet I pasted above, but I’m filtering the fields I show so there is no risk of the output being larger than expected

aantn commented

Closing this now that the PR is merged.