jstedfast/MailKit

Why does not UniqueId include the vadility in it's Equal implementation?

rklec opened this issue · 7 comments

Is your feature request related to a problem? Please describe.
This is how the comparison of UniqueId is implemented in MailKit:

MailKit/MailKit/UniqueId.cs

Lines 261 to 273 in c52fdac

/// <summary>
/// Determines whether the specified <see cref="System.Object"/> is equal to the current <see cref="MailKit.UniqueId"/>.
/// </summary>
/// <remarks>
/// Determines whether the specified <see cref="System.Object"/> is equal to the current <see cref="MailKit.UniqueId"/>.
/// </remarks>
/// <param name="obj">The <see cref="System.Object"/> to compare with the current <see cref="MailKit.UniqueId"/>.</param>
/// <returns><c>true</c> if the specified <see cref="System.Object"/> is equal to the current <see cref="MailKit.UniqueId"/>;
/// otherwise, <c>false</c>.</returns>
public override bool Equals (object obj)
{
return obj is UniqueId uid && uid.Id == Id;
}

MailKit/MailKit/UniqueId.cs

Lines 177 to 189 in c52fdac

/// <summary>
/// Determines whether two unique identifiers are equal.
/// </summary>
/// <remarks>
/// Determines whether two unique identifiers are equal.
/// </remarks>
/// <returns><c>true</c> if <paramref name="uid1"/> and <paramref name="uid2"/> are equal; otherwise, <c>false</c>.</returns>
/// <param name="uid1">The first unique id to compare.</param>
/// <param name="uid2">The second unique id to compare.</param>
public static bool operator == (UniqueId uid1, UniqueId uid2)
{
return uid1.Id == uid2.Id;
}

Aka it only compares the ID part.

Now RFC3501 2.3.1.1. clearly states that it is only unique in conjunction with the validity (and the mailbox name aka folder name in MailKit):

The combination of mailbox name, UIDVALIDITY, and UID must refer to a single immutable message on that server forever.

Describe the solution you'd like
We don't have the folder name, but could not we just do this? Aka compare both things, we can compare for uniqueness?

		public bool Equals (UniqueId other)
		{
			return other.Id == Id && other.Validity == Validity;
		}

Describe alternatives you've considered
Let's saw say drawbacks:

  • Yes, we cannot compare the folder name at this point, so the unique comparison will not be . That said, some C# doc comment could be added here to make the consumer aware of that, potentially linking to the relevant RFC part as it is a "must read". (That's in general a good idea.)
    After all, the current doc is:

    Determines whether two unique identifiers are equal.

  • yeah, i see how it breaks with the other comparisons like > or < etc. to break-apart logically, because you cannot really compare them if their UIDVALIDITY differs. What would be expected here hmm? Possibilities I see:
    • When the UIDVALIDITY differs, throw, as it is not comparable/sortable? (though AFAIK for operators in C# that is quite unexpected)
    • Just remove that feature altogether? After all UID's are to supposed to be compared like that, if I read the RFC correctly. Quoting the RFC they are not "necessarily contiguous", just that they are ascending - that said, if the UIDVALIDITY reset happens, they can begin again, so comparing only the ID part as currently also is a mood point?
      After all, the "Message Sequence Number Message" seems to be more the intended use case for any comparison for order etc. After all it states "message sequence numbers can [even] be used in mathematical calculations". If you use UIDs for that, IMHO, you're likely doing something wrong hmm?
    • always make the comparison false, in such a case or so?

Additional context

It doesn't compare the validity because it's expected that the developer has already handled cases where the validity has changed. It doesn't make sense to compare UIDs if the UIDVALIDITY has changed.

Also, keep in mind that new UniqueId (uint id) exists which means there is no validity value set. This exists for convenience when retrieving UIDs from a local (cache) DB, for example.

I suppose that the UniqueId equality implementation could check that each UID has a validity, and if they do, compare them, but I'm not sure if that really adds any value.

Can you explain why the current implementation does not meet your needs?

I understand that if you are looking at the implementation and comparing to the spec and expecting strict adherence of UID comparisons, then I can see where you are coming from, but I think if you look at it from a practicality standpoint, I think the current implementation makes the most sense.

Well my use case here is local (DB) saving of all mails in a (bulk) mail processing (see #1723 basically). As such I want to save these as one unique identifier to catch mails that have already been processed or were interrupted when doing so etc. And also log when (and what) was processed etc.

All in all, I need an identifier that stays more or less persistent.
I know, I will look into Message-ID soon too, also to add it as an additional ID for getting a potentially already processed message,

I know I could also use the server to set some "read" flags or move it into folders etc., which I partially also want to do, but as this affects the messages itself and is visible, this is no indicator I can use for my internal processing. Also, I need more flexibility, because the error handling is of course a possibility, i.e. that the message has not been fully processed.
In the end, I thus need to make this comparison manually, like this:

await db.Mails.FirstOrDefaultAsync(entity => entity.Uid == uniqueId.Id && 
                entity.UidValidity == uniqueId.Validity)             

I just thought having that built-in would make sense.
And my point was also that for devs who may not have read the spec as exactly as they should be, the current implementation leaves a "trap" for potential misimplementations. As such, even if you'd like to keep it like this, I would continue to argue to at least link the relevant RFC section, so people have a hint that they should really read that, if they use the UID and just depend on that and e.g. think it is unique (in a folder) and will never change. (This would be problematic in my use case/example, as it would mess it up totally, if you just compare the Uid without UidValidity.)

All in all, I need an identifier that stays more or less persistent.
I know, I will look into Message-ID soon too, also to add it as an additional ID for getting a potentially already processed message,

Message-IDs are not unique. Don't bother going down that road.

Also, I need more flexibility, because the error handling is of course a possibility, i.e. that the message has not been fully processed.
In the end, I thus need to make this comparison manually, like this:

If you need UniqueId's equality comparers to compare UIDVALIDITY values, then you are "Doing It Wrong(tm)".

Your database is structured incorrectly if each record in your database has a UIDVALIDITY value. They should not because it's not necessary.

You should store a single UIDVALIDITY value per folder in your database and when you open the remote IMAP folder and get the new UIDVALIDITY value, you should immediately compare it to the value in your database and clear all of the records for that folder if the UIDVALIDITY value has changed.

And my point was also that for devs who may not have read the spec as exactly as they should be, the current implementation leaves a "trap" for potential misimplementations.

Changing UniqueId.Equals/== to compare the validity values will not fix this theoretical problem for those developers. It would only cause more confusion.

Yeah, it's just one folder here, so I'd thought I took that shortcut, but yeah, you could compare it to the folder too. And sure, this can be optimized. As my DB here works as a log what has been processed, too, though, I cannot delete old records, record-keeping is a requirement. (Though yeah I know, usual logs are also written but cannot be evaluated like that easily.)
But we get off-topic...

Changing UniqueId.Equals/== to compare the validity values will not fix this theoretical problem for those developers. It would only cause more confusion.

That was not my point trying to argue here. My point was adding a C# doc comment hinting to that spec, so people don't stumble over it.

You might be interested to know that I've been working on some demo code on how to synchronize a local database/cache of IMessageSummary information with a server.

I had started writing some demo code ages ago that I had shared with some people (as the topic came up via email discussions), but I haven't publicly shared it yet, I don't think(?).

Anyway, from years ago (wow, 2020?):

This is obviously based on using SQLite as a backend storage format, but I'm reworking this to avoid using any SQL and instead using a custom binary format with a cleaner interface that I can perhaps add directly to MailKit (perhaps under a MailKit.Cache namespace?). I may also include a generic SQL implementation (which would likely need to be subclassed for SQLite vs MySQL vs MSSQL vs Postgres, etc) but would be a useful starting point for people.

Anyway, the important bit is the OpenAndResyncAsync() method in MessageCache.cs: https://gist.github.com/jstedfast/2ba9542bb4b045b5e1e6c4f3f5655620#file-messagecache-cs-L334

The current struggle is figuring out if/how this should be integrated into MailKit itself. In any event, I am likely going to be sharing this in the Discussions forum once I get something working (it's difficult to find enough free time to work on this lately).

Perhaps I will check it into a branch on GitHub or something.

Thanks a lot! Yeah, such a thing is certainly helpful, though maybe not 100%ly applicable to my use case. But it likely helps others! And if you'd mind my opinion, it would be great if it ever gets into MailKit (or being a separate MailKit.Cache nuget package or so...) IMHO, it would be great if it had the option to use different (storage) backends, i.e. I would prefer EntityFramework here e.g. This could then be easily integrated into the rest of the application.

Right, if I add this to MailKit, my plan would be to make it abstract enough that any backend could be used. The hard part right now is designing proper interfaces for this and figuring out how it should interact with ImapFolder.

API design is more difficult (usually) than the implementation. hah