dotnet/orleans

Support for stronger single-activation guarantee

sergeybykov opened this issue · 14 comments

Orleans temporarily violates the single-activation guarantee in certain failure cases, preferring availability over strong consistency. In some scenarios, applications are willing to trade availability for the simplicity of strong consistency. Today, strong consistency is achievable via external mechanisms, such as storage with etag support. This proposal is to add a mechanism and an extensibility point to formalize the pattern.

  1. Add a grain class attribute, e.g. StrongSingleActivation(TimeSpan leaseTime) that would indicate that before creating an activation of a grain of this type a lease has to be obtained by the runtime. A failure to obtain a lease will fail the activation process. A failure to renew the lease will trigger deactivation of the grain before the lease expires.

  2. Add a lease provider interface and a config option for defining lease providers.

public interface IGrainLeaseProvider
{
    Task<DateTime> GetLease(GrainReference grain);
    Task ReleaseLease(GrainReference grain);
    Task ReleaseLeases(GrainReference[] grains);
    Task<DateTime> RenewLease(GrainReference grain);
    Task<DateTime[]> RenewLeases(GrainReference[] grains);
}

Methods of IGrainLeaseProvider would return expiration UTC time(s) for the lease(s).
Catalog would be responsible for trying to renew leases, e.g. when half of the original lease time has elapsed. We could start with a single renewal attempt, and add optional retries later. As part of the deactivation sequence, Catalog will make a best effort to release the lease.
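The renewal behavior described above could be sketched roughly as follows. This is only an illustration under stated assumptions: `LeaseRenewalSketch`, `RenewalDelay`, `RunAsync`, and the `renew`/`deactivate` delegates are hypothetical names, not Orleans APIs, and the real Catalog would track many activations rather than one.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper, not part of Orleans: drives lease renewal for one activation.
public static class LeaseRenewalSketch
{
    // Renew once half of the lease period has elapsed.
    public static TimeSpan RenewalDelay(TimeSpan leasePeriod) =>
        TimeSpan.FromTicks(leasePeriod.Ticks / 2);

    // renew: asks the lease provider to extend the lease; returns false on failure.
    // deactivate: deactivates the grain and makes a best-effort lease release.
    // Cancelling the token during the wait surfaces as an OperationCanceledException.
    public static async Task RunAsync(
        TimeSpan leasePeriod,
        Func<Task<bool>> renew,
        Func<Task> deactivate,
        CancellationToken cancellation)
    {
        while (!cancellation.IsCancellationRequested)
        {
            await Task.Delay(RenewalDelay(leasePeriod), cancellation);

            bool renewed;
            try { renewed = await renew(); }
            catch (Exception) { renewed = false; }

            if (!renewed)
            {
                // Single attempt, no retries: roughly the remaining half of the
                // lease is the budget for deactivating before it expires.
                await deactivate();
                return;
            }
        }
    }
}
```

With a single attempt at the halfway point, a slow or failed renewal still leaves about half the lease period to deactivate the grain; adding retries would eat into that budget.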

A first lease provider could simply leverage Azure Blob leases. A more performant/scalable solution could leverage other consistency and leader election mechanisms.

Sorry about the actions. I was trying to comment but clicked the wrong button.

@sergeybykov have you considered TimeSpan as a way of expressing lease duration? This may be preferable as it removes any possibility of errors due to clock drift when writing a provider.

I notice that the Azure blob HTTP API uses a duration (x-ms-lease-duration) in seconds.

https://docs.microsoft.com/en-us/rest/api/storageservices/fileservices/lease-blob

Perhaps this could be implemented in a placement director. Pros: pluggable, non-invasive. Cons: work needs to be done to ensure composability with other placement strategies.

If the interface methods accept the SiloAddress/id of the silo which is acquiring/releasing/renewing the lease and return the id of the silo which now holds the lease, that would aid implementation as a placement director.

We could make the system cheap with chunkier leases on slices of a consistent ring.
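One way to read the "chunkier leases" idea is to take one lease per slice of the hash ring rather than one per grain. The sketch below assumes a 32-bit ring hash per grain and a fixed slice count; `RingSliceSketch`, `SliceOf`, and `LeaseKeyFor` are hypothetical names and not Orleans APIs.

```csharp
using System;

// Hypothetical illustration: one lease covers every grain whose ring hash
// falls into the same slice, so the number of leases is bounded by sliceCount.
public static class RingSliceSketch
{
    // Maps a 32-bit ring hash to one of sliceCount equal-width slices.
    public static int SliceOf(uint ringHash, int sliceCount)
    {
        if (sliceCount <= 0) throw new ArgumentOutOfRangeException(nameof(sliceCount));
        return (int)(((ulong)ringHash * (ulong)sliceCount) >> 32);
    }

    // Name of the lease that would guard the slice containing ringHash.
    public static string LeaseKeyFor(uint ringHash, int sliceCount) =>
        "ring-slice-" + SliceOf(ringHash, sliceCount);
}
```

The trade-off is coarser failure granularity: losing one slice lease would force deactivation of every grain in that slice.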

Exciting!

I'm wary of the lease duration parameter: I see that it has a correlation to the point on the availability/consistency spectrum which the user wishes to achieve, but I worry that it leaks implementation details and conflates policy with mechanism. What if I don't want to implement this policy (strong single activation) using leases as the mechanism?

Is it realistic that users would want different lease durations per grain type? If so, perhaps we could use a type parameter on the interface instead, allowing the implementation to decide.

I also wonder if the interface could be simplified:

public interface IGrainLeaseProvider
{
    Task<TimeSpan[]> AcquireLeases(GrainReference[] grains);
    Task ReleaseLeases(GrainReference[] grains);
    Task<TimeSpan[]> RenewLeases(GrainReference[] grains);
}

where the acquisition of a single lease is achieved by passing an array with a single entry.

@ReubenBond I could imagine the lease time might derive from the business domain (where this conflates with the idea of timers), and one would like to ensure it will always be enough to cover the business domain. Technically that means long enough and, indeed, per grain type (if going in that direction, would it hurt to have both per type and per ID?). Is Renew the same as Extend, or should we call it Extend? That might be the "retries" option @sergeybykov mentioned, and might make for a more extensible API.

@richorama

@sergeybykov have you considered TimeSpan as a way of expressing lease duration? This may be preferable as it removes any possibility of errors due to clock drift when writing a provider.

Good point. Wouldn't the provider have to account for the latency of its call to the external lease service, and subtract it from the returned duration then? But if we do that, can't we just as easily return an absolute expiration time in terms of the local clock of the machine where Catalog is running?
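The latency adjustment discussed here could look roughly like the sketch below. It assumes the provider gets a duration back from the external lease service and measures the round trip with a local monotonic `Stopwatch`; `LeaseClockSketch`, `AcquireConservativeAsync`, and `Conservative` are hypothetical names.

```csharp
using System;
using System.Diagnostics;
using System.Threading.Tasks;

// Hypothetical helper for a lease provider that receives durations from a remote service.
public static class LeaseClockSketch
{
    // acquireDuration: calls the external lease service and returns the granted duration.
    public static async Task<TimeSpan> AcquireConservativeAsync(Func<Task<TimeSpan>> acquireDuration)
    {
        var stopwatch = Stopwatch.StartNew();
        TimeSpan granted = await acquireDuration();
        stopwatch.Stop();
        return Conservative(granted, stopwatch.Elapsed);
    }

    // The service may have started the lease as early as the moment the request
    // was sent, so the whole round trip is subtracted; never returns a negative value.
    public static TimeSpan Conservative(TimeSpan granted, TimeSpan callLatency)
    {
        TimeSpan remaining = granted - callLatency;
        return remaining > TimeSpan.Zero ? remaining : TimeSpan.Zero;
    }
}
```

Either representation works once the adjustment is made locally; the duration form simply avoids comparing wall clocks across machines.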

@ReubenBond

I'm wary of the lease duration parameter: I see that it has a correlation to the point on the availability/consistency spectrum which the user wishes to achieve, but I worry that it leaks implementation details and conflates policy with mechanism. What if I don't want to implement this policy (strong single activation) using leases as the mechanism?

I'm not sure I understand. This is meant to be an interface between Catalog and lease providers, to tell Catalog when it has to let an activation go if it fails to renew the lease for it. This would be optional, only for grain types that are explicitly marked with the attribute.

Is it realistic that users would want different lease durations per grain type? If so, perhaps we could use a type parameter on the interface instead, allowing the implementation to decide.

I suspect it is likely that you'll want different lease durations for different types, for different CA tradeoffs. I don't understand what we would get by putting the attribute on the interface. The interface is only a contract, and it's the implementation class that gets instantiated and is subject to the CA dilemma, I think.

@richorama

I also wonder if the interface could be simplified:

I don't see why not. Although I think I missed passing requested lease duration to the Get/Acquire calls. So maybe it should rather be:

public interface IGrainLeaseProvider
{
    Task<DateTime[]> AcquireLeases(GrainReference[] grains, TimeSpan period);
    Task ReleaseLeases(GrainReference[] grains);
    Task<DateTime[]> RenewLeases(GrainReference[] grains, TimeSpan period);
}

@sergeybykov

I don't understand what we would get by putting the attribute on the interface.

I meant passing the period into the IGrainLeaseProvider interface as a parameter, rather than putting it on the grain interface. You've done that in the latest revision.

I'm not sure I understand. This is meant to be an interface between Catalog and lease providers, to tell Catalog when it has to let an activation go if it fails to renew the lease for it. This would be optional, only for grain types that are explicitly marked with the attribute.

I understand now. My initial impression was that we could use the placement system instead of the catalog - if you squint, it looks doable.
Can we implement this in a way which doesn't require complicating the catalog?

@veikkoeeva

I could imagine the lease time might derive from the business domain

I'm dubious. I believe that there will almost always be a single value. I also don't believe the majority of users will (or should) understand the implications of the lease duration. We might even see some specifying TimeSpan.MaxValue, or a value so short that we cannot keep up with all the activation lease renewals. I won't push back.

@ReubenBond I might have misunderstood the scope. If there is no chance for user code to run, then there isn't a value meaningful to the business domain. I was thinking of a pattern, regrettably frequent in integrations, where an external system is called and should only ever be called once within some period of time, and someone might try that in grain activation.

@ReubenBond

My initial impression was that we could use the placement system instead of the catalog - if you squint, it looks doable.
Can we implement this in a way which doesn't require complicating the catalog?

Placement is mostly stateless logic, only to make a decision when a new activation needs to be created. Catalog is very stateful, keeping track of all local activations and their collection upon inactivity. Since expiration of a lease is a reason for deactivation, and renewal is a prerequisite for keeping an activation in memory, I thought Catalog would be the natural component to have this logic.

To stress something that I realized might not have been obvious in my description, and that I heard confused some people: application code will never call IGrainLeaseProvider or otherwise deal with leases explicitly.

All interactions with the lease providers will be done by the Catalog obtaining and renewing leases in order to activate grains and keep them activated, and deactivating grains if unable to renew leases for them on time.

The only reason I thought of allowing an optional TimeSpan leaseTime argument in the StrongSingleActivation attribute was to let the application control the potential unavailability window in case of failures that would force waiting for leases to expire. We can, and probably should, support specifying such values via config. In that case, we don't even have to support it as an attribute argument. That's how we manage activation collection settings per grain type today. However, there were requests to also allow specifying activation collection times via a grain class attribute.
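A rough sketch of how the attribute and a config fallback might fit together is shown below. C# attribute arguments must be compile-time constants, so a TimeSpan cannot be passed directly; this sketch assumes the lease time is given in seconds instead. `LeaseSettings.Resolve`, `ExampleGrain`, and the "attribute overrides config" precedence are all assumptions made for illustration, not part of the proposal.

```csharp
using System;

// Hypothetical shape of the proposed attribute; the lease time is optional.
[AttributeUsage(AttributeTargets.Class, AllowMultiple = false)]
public sealed class StrongSingleActivationAttribute : Attribute
{
    public StrongSingleActivationAttribute() { }

    // Seconds rather than TimeSpan, because attribute arguments must be constants.
    public StrongSingleActivationAttribute(double leaseSeconds)
    {
        LeaseTime = TimeSpan.FromSeconds(leaseSeconds);
    }

    // Null means "no value on the attribute; use configuration".
    public TimeSpan? LeaseTime { get; }
}

// Hypothetical grain class; a real one would derive from Orleans' Grain.
[StrongSingleActivation(30)]
public class ExampleGrain { }

public static class LeaseSettings
{
    // Returns null when the class has not opted in; otherwise the attribute
    // value if present, else the configured value (assumed precedence).
    public static TimeSpan? Resolve(Type grainClass, TimeSpan? configured)
    {
        var attribute = (StrongSingleActivationAttribute)Attribute.GetCustomAttribute(
            grainClass, typeof(StrongSingleActivationAttribute));
        if (attribute == null) return null;
        return attribute.LeaseTime ?? configured;
    }
}
```

If the value is only ever supplied via config, the parameterless form of the attribute would be enough to mark a grain class as opted in.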

This sounds promising. The last comment is from 2016. Is this still on the radar? Have viable alternatives been implemented in the meantime?

I am very interested in having stronger consistency, and, of course, would like to choose a solution that minimizes the cost in terms of availability.

Writes to storage are always strongly consistent, so that much is handled. When two instances of the same grain write to storage, only one can win and the other will be terminated. That is the primary reason why this isn't as pressing.
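The "only one can win" behavior relies on conditional writes against an etag. A minimal in-memory sketch of the idea follows; `EtagStoreSketch` is a hypothetical stand-in for a storage service with etag support, not an Orleans storage provider.

```csharp
using System;
using System.Collections.Generic;

// In-memory stand-in for storage that supports conditional (etag) writes.
public sealed class EtagStoreSketch
{
    private readonly object _gate = new object();
    private readonly Dictionary<string, (string Value, int ETag)> _rows =
        new Dictionary<string, (string Value, int ETag)>();

    // Returns the current etag for key, or 0 if nothing has been written yet.
    public int GetETag(string key)
    {
        lock (_gate)
        {
            return _rows.TryGetValue(key, out var row) ? row.ETag : 0;
        }
    }

    // Writes only if the caller's etag matches the stored one; a stale etag
    // means another activation wrote first, and the write is rejected.
    public bool TryWrite(string key, string value, int expectedETag)
    {
        lock (_gate)
        {
            int current = _rows.TryGetValue(key, out var row) ? row.ETag : 0;
            if (current != expectedETag) return false;
            _rows[key] = (value, current + 1);
            return true;
        }
    }
}
```

Two duplicate activations that both read etag 0 would race on the first write: one succeeds and bumps the etag to 1, and the other's write with etag 0 is rejected, which is the point at which the losing activation would be terminated.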

Orleans also has a pluggable grain directory now, eliminating most of the cases where a duplicate could occur (e.g., you can use Azure Table Storage as a grain directory, or Redis). The remaining case is when a silo has been declared dead by the others but has not yet learned of that. In that case, grains which are active on that silo are free to also be activated elsewhere. The only way I can see (feel free to weigh in) to eliminate that without per-operation storage writes or similar would be to acquire leases at some level of granularity. That would trade some availability for a reduced window for duplicate activations. Of course, major clock skew, VM migration, massive GC pauses, etc., could still result in two hosts thinking they own the same grain simultaneously. So, this is still on the radar, but is not currently top-of-mind.

I appreciate the feedback, @ReubenBond!

[...] eliminating most of the cases where a duplicate could occur. [...] The remaining case is when a silo has been declared dead by the others but has not yet learned of that.

Could you elaborate on what damage such a grain might cause while it is unreachable? I presume the issue is with it processing requests that have already accumulated in its queue? I suppose the practical consequences are limited: mainly writing to storage (which results in an exception) or talking to other grains (which it is unlikely to be able to reach). Would you agree?

Additionally, I'm wondering if it is possible for a silo to get temporarily disconnected, after which a second silo also instantiates a grain present on the silo-in-limbo, and then the latter silo reconnects with the cluster. Is this a way that we might get duplicate grains? I'm interested in the answer for both the in-memory directory and the external directory. My educated guess is that the in-memory one would cause a temporary duplicate grain, whereas the external directory would strictly keep the grain mapped to the silo-in-limbo as long as that silo isn't declared dead (thus rendering the grain unreachable for a few seconds).

When two instances of the same grain write to storage, only one can win and the other will be terminated.

Understood. My two concerns here are stale reads and transient faults. By now, I've reasoned that stale reads are a non-issue, since any scenario involving them either (A) involves writing the grain's own state, or (B) involves exclusively writing to other grains' states and thus is not atomic with the read regardless. Transient faults, however, can lead to bothersome investigative work, which I think is a valid concern.

[...] Azure Table Storage as a grain directory, or Redis [...]

I'm fond of this solution. It hits me that strong consistency guarantees often go hand-in-hand with solid uptime guarantees. Regrettably, at the time of writing, the Azure SLAs of these two products are only 99.9% (for Azure Storage, I'm referring to writes), with the exception of pricey Redis Enterprise. Alternatively, simple Azure SQL databases tend to have at least 99.99%, but they have much worse response times compared to the in-memory options.

I'm certainly open to writing another grain directory implementation. Do you know of any product that has all three: an SLA of at least 99.95%, an affordable tier, and great response times?

For those already paying for Azure SQL Premium or Business Critical, its 1-2 ms read latency might suffice, and additional infrastructure could be avoided. Beautifully simple. But a bit pricey if not already required.

Edit: The Redis zone redundancy preview announcement (2020) claims that zone redundancy increases the SLA to 99.95%, but the actual SLA page makes no such claim.