Make sure we test issue deduplication

Question

Opened this issue 2 years ago · 3 comments

The latest database and I/O schema versions added support for "issues" and "incidents". Of the two, "issues" required a potentially disruptive change, because their primary key has two columns: id and version. As such, the deduplication mechanism should handle them differently from other objects, i.e. allow issues with the same id field value but different version values to be kept in the database at the same time. This is unlike the rest of the objects, which are told apart by their id field alone.
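To illustrate the difference, here is a minimal sketch of deduplication keyed on a tuple of fields. It is a toy model, not kcidb's actual mechanism: the `deduplicate` helper is hypothetical, and keeping the last object seen for each key is an assumption made only for the example.

```python
def deduplicate(objects, key_fields):
    """Keep one object per distinct key tuple (hypothetical helper).

    Assumption for this sketch: the last object seen for a key wins.
    """
    unique = {}
    for obj in objects:
        unique[tuple(obj[field] for field in key_fields)] = obj
    return list(unique.values())


issues = [
    dict(id="_:1", origin="_", version=1),
    dict(id="_:1", origin="_", version=1),  # same id and version
    dict(id="_:1", origin="_", version=2),  # same id, new version
]

# Issues: the key is the (id, version) pair, so both versions survive.
by_id_and_version = deduplicate(issues, ("id", "version"))

# Other objects: the key is id alone, so only one object survives.
by_id_only = deduplicate(issues, ("id",))
```

With the composite key the two versions of `_:1` are kept side by side, while the id-only key collapses all three entries into one.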

Make sure we have tests in kcidb/test_db.py that:

load() two issues with the same id/version pair and check that only one ends up in the database (using either the query() or the dump() method);
load() two issues with the same id, but different version field values, and check that both are in the database (using the same method);
load() two issues with the same version, but different id field values, and check that both are in the database;
load() two issues with both id and version different, and check that both are in the database.

Either prove tests like that already exist, or make sure they don't and add your own.
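The four cases above could be laid out as a small table of inputs and expected counts. The sketch below runs against `FakeClient`, a hypothetical in-memory stand-in written only for this example; it is not kcidb's client, and a real test in kcidb/test_db.py would use the project's own database client instead. The `count_after_load` helper and the `CASES` table are likewise illustrative names.

```python
class FakeClient:
    """Hypothetical in-memory stand-in for a database client (not kcidb's)."""

    def __init__(self):
        # Issues are keyed by the (id, version) pair, mirroring the
        # two-column primary key described above.
        self._issues = {}

    def load(self, data):
        for issue in data.get("issues", []):
            self._issues[(issue["id"], issue["version"])] = issue

    def dump(self):
        return dict(issues=list(self._issues.values()))


# Each case: ((id, version) of first issue, (id, version) of second issue,
#             expected number of issues in the database afterwards)
CASES = [
    (("_:1", 1), ("_:1", 1), 1),  # same id, same version
    (("_:1", 1), ("_:1", 2), 2),  # same id, different version
    (("_:1", 1), ("_:2", 1), 2),  # different id, same version
    (("_:1", 1), ("_:2", 2), 2),  # different id, different version
]


def count_after_load(first, second):
    """Load two issues into a fresh, empty store and count what remains."""
    client = FakeClient()
    client.load(dict(issues=[
        dict(id=first[0], origin="_", version=first[1]),
        dict(id=second[0], origin="_", version=second[1]),
    ]))
    return len(client.dump()["issues"])


results = [count_after_load(first, second) for first, second, _ in CASES]
expected = [count for _, _, count in CASES]
```

Starting each case from an empty store keeps the cases from interfering with one another.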

Answer 1 · 2023-03-19T04:26:38.000Z

hey, can I work on this?

Answer 2 · 2023-03-29T17:57:09.000Z

Sure!

Regarding your question on Slack:

In the issue should we be checking for these

issues=[
    dict(id="_:1", origin="_", version=1),
    dict(id="_:2", origin="_", version=1),
    dict(id="_:3", origin="_", version=1),
    dict(id="_:4", origin="_", version=1),
]

in particular? Or does a more general approach need to be taken?

I'm not sure what you're asking exactly, but the issues don't have to have any data beyond the required properties (which you have in the code above). This test would be about deduplication only.

If the first is the case, will something like this work?

client.load(dict(
    issues=[
        dict(id="_:1", origin="_", version=1),
        dict(id="_:1", origin="_", version=1),
    ],
))
assert len(client.query('issue')) == 1

Yes, loading this data is enough for the first case. However, you have to plan how you're going to be separating data between particular test cases. You need to either empty the database between each, or only check for issues with particular IDs (as opposed to checking the complete contents).

Your querying call is incorrect, though. It should be something like assert len(client.query(ids=dict(issues=["_:1"]))["issues"]) == 1 to get the issues with a specific ID. You can also do assert len(client.dump()["issues"]) == 1 to get all issues (you could also search for an issue with a specific ID in whatever dump() returns). NOTE: I haven't tested either of these.
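As a rough sketch of the "search the dump for a specific ID" option, the helper below filters a dictionary shaped like the data loaded above. It assumes dump() returns the same `{"issues": [...]}` layout; that shape is taken from this thread and is untested here, and `issues_with_id` is a hypothetical name.

```python
def issues_with_id(data, issue_id):
    """Return the issues in `data` whose id equals `issue_id`."""
    return [issue for issue in data.get("issues", [])
            if issue["id"] == issue_id]


# Stand-in for whatever dump() returns after several test cases have
# loaded their data into the same database.
dumped = dict(issues=[
    dict(id="_:1", origin="_", version=1),
    dict(id="_:2", origin="_", version=1),
    dict(id="_:2", origin="_", version=2),
])

count_first_case = len(issues_with_id(dumped, "_:1"))
count_second_case = len(issues_with_id(dumped, "_:2"))
```

Giving each test case its own ID lets the checks stay independent even if the database is not emptied in between.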

Answer 3 · 2023-03-29T18:18:09.000Z

I'm not sure what you're asking exactly, but the issues don't have to have any data beyond the required properties (which you have in the code above). This test would be about deduplication only.

This was the code in the file; I sent it as a reference.
I wanted to ask whether these are the issues we are talking about.

or only check for issues with particular IDs (as opposed to checking the complete contents).

Would it be something like this, if we are checking for particular IDs?

client.load(dict(
    issues=[
        dict(id="_:1", origin="_", version=1),
        dict(id="_:1", origin="_", version=1),
    ],
))
assert len(client.query('issue')) == 1

client.load(dict(
    issues=[
        dict(id="_:2", origin="_", version=1),
        dict(id="_:2", origin="_", version=2),
    ],
))
assert len(client.query('issue')) == 2  # ignore the assert