chrysn/aiocoap

Better token management

Currently, tokens are picked rather stupidly, and with fixed lengths.

Changing this would mean that, preferably, short tokens are sent; it would also avoid the (currently unlikely but possible) case of tokens being reused because the underlying request is long-lived and doesn't time out.

There's some general RFC guidance around on numeric identifiers that should be considered too.

For DTLS, tokens can't be reused (ERT, the Echo/Request-Tag work, introduces normative language here).

Note to self: EXCHANGE_LIFETIME is not directly relevant here, as that is about Message IDs (MIDs), not tokens.

Suggested mechanism (a rough code sketch follows the list):

  • Define a (configurable) "short" time. The token manager keeps an alternating clock with that short time as its half-cycle time. (Token reuse density would be better with per-token bookkeeping, but this way we don't have to keep track of when every single token was used.) Each half-phase gets a discriminant bit on the token. (The MSB would be illustrative, but the LSB is practical because then tokens can be as short as possible.) The time needs to be large enough to make lingering responses unlikely (something larger than EXCHANGE_LIFETIME might be a good default).
  • Every token manager keeps a byte length limit for "small" tokens (say, initially 2 bytes); when messages are sent faster than that allows, it can still be bumped.
  • Requests with a timeout of at most the short time get short tokens (per-token timeouts generally need to be introduced for this, and a sane choice for the "short" time may influence the default timeout). Requests with larger timeouts (especially observations; for bounded-time nontraditional responses such as multicast proxying, whatever that bound is) pick a token longer than the "small" limit, possibly randomly.
  • Whenever the "short" clock flips, the "small" token counter is set to zero (with the adequate discriminator).
  • Requests that can pick a small token take the next one in the current phase; they're not tracked, the counter is just incremented. If the counter overflows before the clock resets it, the "small" limit is increased (probably permanently). As non-small tokens can thus slip into the small range, picking a token needs to check against the active non-small tokens.
  • Non-small tokens (tokens that were created non-small, even if they are now under the short limit) stick around in the list for the "short" time after their use.
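
A minimal sketch of what such a manager could look like; all names, defaults, and the short-time value are illustrative assumptions, not aiocoap API:

```python
import os
import time

SHORT_TIME = 300.0  # seconds; a good default is something above EXCHANGE_LIFETIME

class SketchTokenManager:
    def __init__(self, short_time=SHORT_TIME):
        self.short_time = short_time
        self.small_limit = 2           # byte length limit for "small" tokens
        self.counter = 0               # small-token counter, reset each half-cycle
        self.phase = 0                 # discriminator bit, carried in the LSB
        self.phase_started = time.monotonic()
        self.nonsmall_active = set()   # non-small tokens currently in use
        self.nonsmall_lingering = {}   # token -> release time, kept for short_time

    def _maybe_flip(self):
        now = time.monotonic()
        if now - self.phase_started >= self.short_time:
            self.phase ^= 1
            self.counter = 0
            self.phase_started = now
            # non-small tokens only linger for one "short" time after release
            self.nonsmall_lingering = {
                t: released for t, released in self.nonsmall_lingering.items()
                if now - released < self.short_time}

    def pick_small(self):
        """Token for a request expected to finish within short_time."""
        self._maybe_flip()
        while True:
            if self.counter >= 1 << (8 * self.small_limit - 1):
                # the counter would overflow before the clock resets it:
                # bump the small limit (probably permanently)
                self.small_limit += 1
            value = (self.counter << 1) | self.phase  # discriminator in the LSB
            self.counter += 1
            length = max(1, (value.bit_length() + 7) // 8)
            token = value.to_bytes(length, 'big')
            # non-small tokens may have slipped into the small range
            if token not in self.nonsmall_active and \
                    token not in self.nonsmall_lingering:
                return token

    def pick_nonsmall(self):
        """Token for long-lived requests such as observations."""
        while True:
            token = os.urandom(self.small_limit + 2)  # longer than "small"
            if token not in self.nonsmall_active and \
                    token not in self.nonsmall_lingering:
                self.nonsmall_active.add(token)
                return token

    def release_nonsmall(self, token):
        self.nonsmall_active.discard(token)
        self.nonsmall_lingering[token] = time.monotonic()
```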

Tweaks:

  • If we don't want always-running timers, the clock can be stopped between ticking and the first token being taken.
    More generally, the clock can be arbitrarily stretched until just before the tokens would wrap (or for as long as the token values still fit the minimum length).
  • The check in the non-small token pool can be elided starting a half cycle after the last now-small-formerly-nonsmall token has been released, until the small size is incremented again.
  • To make tokens better and better from a "careful with numeric identifiers" PoV, we could...
    • Not reset the token number to 0 but to a random value on each tick of the clock. As both the "starting-from-zero" equivalent value and the "starting-from-random" value can be known, the first tokens can still be trimmed down to the 1 byte that the number-from-zero variant has, if we want to allow 1:128 guessing chances (the clock bit could be known). (Or trim to whatever we can tolerate.)
      That's equivalent to XOR-ing the token with a per-clock-half-cycle value (see the sketch after this list).
    • To not increment by 1 but jump around wildly, the token could be encrypted with a per-half-cycle key. Conveniently, the token length is always smaller than an AES block. The discriminator bit would need to stay in the clear, for otherwise, after each clock flip, there would be no way to tell which half-cycle's key an incoming token belongs to.
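
A minimal sketch of the XOR variant, with all names illustrative (the encryption variant would substitute a keyed permutation for the XOR):

```python
import os

def fresh_mask(small_limit):
    # drawn once per half-cycle; the LSB is forced to zero so the
    # phase discriminator stays in the clear
    return int.from_bytes(os.urandom(small_limit), 'big') & ~1

def masked_token(counter, phase, mask):
    value = (counter << 1) | phase   # discriminator in the LSB
    masked = value ^ mask            # mask never touches the LSB
    length = max(1, (masked.bit_length() + 7) // 8)
    return masked.to_bytes(length, 'big')
```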

Goals:

  • All operations should be at most O(log(n)) in time for n the number of non-small tokens still active.

Unsorted other points:

  • Group requests probably need to get non-small tokens all the time; the pool of non-small tokens might even be kept across all peers for simplicity.

  • This is all for the unconstrained aiocoap implementation. If a constrained implementation were to do something like that, it'd probably

    • fix the "small" boundary per application
    • just refuse to send more messages if the small tokens are exhausted inside the short time (it is probably vastly exceeding its expected message rate anyway)
    • use this mechanism globally for all addresses and transports

    Then, it can make do with one coarse timer (like RIOT's seconds ztimer), a counter for the small token number, and non-small tokens allocated in slots by purpose. (A rough sketch of that variant follows.)
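
A rough sketch of that constrained variant, in Python only for illustration (a real constrained implementation would do this in C; all names and values are assumptions):

```python
SMALL_LIMIT = 2        # "small" boundary, fixed per application
HALF_CYCLE = 600       # half-cycle of the coarse clock, in seconds

counter = 0
phase = 0
phase_start = 0        # seconds, read from one coarse timer

def next_small_token(now_seconds):
    """Return the next small token, or None to refuse sending."""
    global counter, phase, phase_start
    if now_seconds - phase_start >= HALF_CYCLE:
        phase ^= 1
        counter = 0
        phase_start = now_seconds
    if counter >= 1 << (8 * SMALL_LIMIT - 1):
        return None    # small tokens exhausted within this half-cycle
    value = (counter << 1) | phase
    counter += 1
    return value.to_bytes(SMALL_LIMIT, 'big')
```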

I have something to add here, though it doesn't address most of your thoughtful points.

I've noticed that there is a commercial application using CoAP that defaults to a 4-byte token length. Not sure whether that is useful input for your decision-making, but take it for whatever it may be worth.

It would be nice (for some cases where very tight interoperability is desired) to be able to expose the token length via a public interface, so the application could specify the desired token length. You could of course be much fancier and provide an interface to handle some of the parameters/behavior you mentioned above too, I suppose. I realize doing either of these would be a very invasive change, so I'm not necessarily proposing them, but I will probably take a stab at supporting user-specified token lengths. I'm really only interested in adhering to some other application's arbitrary length choice, but figured it might be nice as a tunable?

Thanks for your work on this project by the way- I would be in a world of hurt right now without it ;)

Nice to read that aiocoap is helpful! :-)

If there is a peer that requires a particular token length, that peer is in severe violation of the CoAP standard, and there's little aiocoap can do to help (as long as we're not talking about the RFC 8974 token length extension). Generally, the most guidance an application should give a CoAP library is how long it expects the request to be active (so that shorter-lived requests can use the more efficient tokens). This is all strictly guidance, though: as soon as a proxy enters the playing field, the token will be fully up to the client side of the proxy.

As the peer is a commercial application, chances are they act on requests from paying customers; aiocoap is quite available for commercial requests too, but I doubt long-term maintenance of such a feature in aiocoap would be any cheaper than fixing it there; feel free to contact me off-GitHub if their quote is vastly unreasonable.

> If there is a peer that requires a particular token length, that peer is in severe violation of the CoAP standard, and there's little aiocoap can do to help (as long as we're not talking about the RFC 8974 token length extension). Generally, the most guidance an application should give a CoAP library is how long it expects the request to be active (so that shorter-lived requests can use the more efficient tokens). This is all strictly guidance, though: as soon as a proxy enters the playing field, the token will be fully up to the client side of the proxy.

Thank you for the insight. I do plan to go through the RFCs soon, but was hoping you could give me the gist/off-the-cuff commentary exactly like that, especially re: tokens.

> As the peer is a commercial application, chances are they act on requests from paying customers; aiocoap is quite available for commercial requests too, but I doubt long-term maintenance of such a feature in aiocoap would be any cheaper than fixing it there; feel free to contact me off-GitHub if their quote is vastly unreasonable.

I should clarify: this is actually a closed-source/proprietary product that happens to implement a CoAP server and very limited operation as a client, as a "discovery" mechanism to guide clients to its server, all as part of a larger software suite. In fact, I don't think the CoAP interface is intended to be directly interfaced with by third-party clients, so I suppose I can give them a pass for being finicky :>

Regardless, thank you for the offer to make your time available, it's appreciated. I'm hopeful I'll be able to contribute something of value to this project should I continue using it. It seems to be far and away the best option for Python in terms of active development, modern Python support, and maturity of functionality and robustness. Kudos for that.

Thanks again

> was hoping you could give me the gist/off-the-cuff commentary exactly like that, especially re: tokens

The client picks a token for the request; it is anything from 0 to 8 bytes (both inclusive). The server responds with that very token and cannot make any demands on its size or content. The client makes sure that tokens are not used twice within the same (client address, client port, server address, server port) combination within some period of time (where in the multicast case it's up to the client to use tokens not used with any server).
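
For illustration, client-side response matching boils down to something like the following hypothetical sketch (not aiocoap's actual internals):

```python
import asyncio

# Outstanding requests are keyed by the full address 4-tuple plus
# the token the client picked.
outstanding: dict[tuple, asyncio.Future] = {}

def on_response(local_addr, local_port, remote_addr, remote_port, token, message):
    key = (local_addr, local_port, remote_addr, remote_port, token)
    future = outstanding.pop(key, None)
    if future is not None:
        future.set_result(message)
    # else: unknown (or prematurely reused) token -- the response
    # cannot be matched to a request and is dropped
```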


If a server does not comply with the protocol, things will always be iffy. For example, while aiocoap currently sends 4-byte tokens IIRC, that may change at any time, and will not be considered a breaking change here.