[Suggestion] Implement `RefCnt` for `Result<Arc<T>,Arc<E>>`

Question

thargy opened this issue 3 years ago · 5 comments

As the title suggests: we have `RefCnt` for `Option<Arc<T>>`, and it would be nice to have an equivalent for `Result`.

Answer 1 · 2021-08-23T09:18:10.000Z

Hello

It is implemented for Option because Option<Arc> is a natural representation of a "nullable pointer", a concept people are used to from other languages. Besides, it is quite useful. It's also easy to implement, because None really is represented by a null pointer.
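To show why `Option<Arc<T>>` maps so naturally onto a single pointer, here is a sketch that uses only the standard library. It packs the option into a raw pointer and back, with `None` stored as null. The helpers `option_into_ptr` and `option_from_ptr` are hypothetical names. They only mimic the kind of conversion a `RefCnt` implementation performs; they are not arc-swap's actual trait methods.

```rust
use std::ptr;
use std::sync::Arc;

// Hypothetical helper: pack an `Option<Arc<T>>` into one raw pointer.
// `Some` hands its reference count over to the pointer; `None` becomes null.
fn option_into_ptr<T>(value: Option<Arc<T>>) -> *const T {
    match value {
        Some(arc) => Arc::into_raw(arc),
        None => ptr::null(),
    }
}

// Hypothetical helper: the inverse of `option_into_ptr`.
// Safety: `raw` must be null or come from `option_into_ptr`, and must be converted back only once.
unsafe fn option_from_ptr<T>(raw: *const T) -> Option<Arc<T>> {
    if raw.is_null() {
        None
    } else {
        // Reclaims the reference count that `Arc::into_raw` handed out.
        Some(unsafe { Arc::from_raw(raw) })
    }
}

fn main() {
    let shared = Arc::new(42);
    let raw = option_into_ptr(Some(Arc::clone(&shared)));
    assert!(!raw.is_null());
    let back = unsafe { option_from_ptr(raw) };
    assert_eq!(back.as_deref(), Some(&42));

    // `None` round-trips through the null pointer.
    let none_raw = option_into_ptr::<i32>(None);
    assert!(none_raw.is_null());
    assert!(unsafe { option_from_ptr(none_raw) }.is_none());
}
```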

Now, I'm not entirely against the idea of having something similar for Result<Arc, Arc>. But:

Can you elaborate on the use case, i.e. why this is useful? I wouldn't want to pollute the crate just because something is possible.
I'm not sure how to implement it off the top of my head. I'm not saying it can't be done, but I don't see an obvious way to do it.
I don't really have the time to do it right now.

So, if you have a use case and the time/motivation to make a pull request, I'll be happy to review it.

Answer 2 · 2021-08-23T12:36:41.000Z

Summary

I only started learning Rust this week (after nearly 40 years of software development, the last 20 mostly in C#, with lots of other languages too), so I've been trying to implement common code patterns in a 'Rusty way'. Ironically, I get that static configs are something of an anti-pattern, but they're proving invaluable in helping me really get to grips with concurrency and ownership. They're also probably one of the few scenarios where the pattern is accepted in Rust, because configs are frequently accessed AOP-style by logging, etc.

My use case: I'd like to use Result<Arc<Config>, Arc<Error>> to store configs that may be unavailable because of errors (e.g. 'file not found' or a JSON parsing error). The various configurations are loaded asynchronously or on start-up, for example before tracing and logging are set up, and I don't want to 'lose' any errors that occur.

My current workaround is to include an Option<Arc<Error>> in the Config itself. The disadvantage is that every config then carries an error field, even configs that can never error, so it isn't really the 'correct' design.

Full use case

For background, my system stores a configuration for each of the following. Note that all fields are Option<...>s, to make merging easier:

Config	Notes	Errors	Mutability
Saved	Used to read configuration options from a file.	Parse errors & IO errors	Changes on load/save
Environment¹	Used to read configuration options from the environment.	Parse errors	Immutable
Command-line¹	Used to read configuration options from the command-line arguments.	Parse errors	Immutable
Unsaved (optional)	Changes to configuration since last save.	None²	Changes
Current	Combined configs held in thread-local storage.	None³	Changes whenever any of the above change

¹ The 'environment' and 'command-line' configs do not need to be separate and could be combined (e.g. as 'start-up'). However, we only merge them once anyway, so we may as well keep them separate for usability, since performance is not a real concern.
² The 'unsaved' config has no errors because it is set from code; any failures are handled at the point of change. Note that it is also an Option<Config>, to indicate 'dirty' state.
³ The 'current' configuration has no errors because it is always valid (by design) and is created by merging.

As you can see, the last 2 configs don't require error storage. The first 3 can all incur errors during initialization, before logging/tracing is set up.

Although this seems complicated, I do it this way to make it easier to implement 'save/load' on the configuration, as well as 'revert'. Effectively, to create the current configuration you merge in the order below. In the notation, a←b keeps the values set in a and fills only its None values from b, so later merges have lower 'priority':
Changes = Unsaved←Changes←Command-line←Environment←Saved
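Here is a minimal sketch of the ← merge. It assumes a hypothetical `Config` with two `Option` fields, because the real struct isn't shown here. `a.merge(b)` keeps every value set in `a` and fills each `None` from `b` using `Option::or`. That operation is associative field by field, so a chain like Command-line←Environment←Saved can simply be evaluated left to right.

```rust
// Hypothetical config: every field is an `Option` so configs can be layered.
#[derive(Clone, Copy, Debug, Default, PartialEq)]
struct Config {
    port: Option<u16>,
    verbose: Option<bool>,
}

impl Config {
    // `self ← lower`: values set in `self` win; each `None` is filled from `lower`.
    fn merge(self, lower: Config) -> Config {
        Config {
            port: self.port.or(lower.port),
            verbose: self.verbose.or(lower.verbose),
        }
    }
}

fn main() {
    let saved = Config { port: Some(8080), verbose: Some(false) };
    let environment = Config { port: Some(9090), ..Config::default() };
    let command_line = Config { verbose: Some(true), ..Config::default() };

    // Start-up precedence: Command-line ← Environment ← Saved
    let current = command_line.merge(environment).merge(saved);
    assert_eq!(current, Config { port: Some(9090), verbose: Some(true) });
}
```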

When changing, you take a set of changes, delta, and update Unsaved = delta←Unsaved and Current = delta←Current.
When saving, you merge Changes = Unsaved←Changes and Saved = Unsaved←Saved, and then clear Unsaved.
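The change and save steps can be sketched with the same kind of hypothetical `Config`, redefined here so the block stands alone. A hypothetical `Configs` holder models only Saved, Unsaved and Current and leaves out the separate Changes config. Plain fields stand in for the `ArcSwap`/`ArcSwapOption` statics used in the author's code.

```rust
// Hypothetical config: every field is an `Option` so configs can be layered.
#[derive(Clone, Copy, Debug, Default, PartialEq)]
struct Config {
    port: Option<u16>,
    verbose: Option<bool>,
}

impl Config {
    // `self ← lower`: values set in `self` win; each `None` is filled from `lower`.
    fn merge(self, lower: Config) -> Config {
        Config { port: self.port.or(lower.port), verbose: self.verbose.or(lower.verbose) }
    }
}

// Hypothetical holder for three of the configs from the table.
struct Configs {
    saved: Config,
    unsaved: Option<Config>, // `None` means "no unsaved changes"
    current: Config,
}

impl Configs {
    // Unsaved = delta ← Unsaved, Current = delta ← Current
    fn change(&mut self, delta: Config) {
        let unsaved = self.unsaved.unwrap_or_default();
        self.unsaved = Some(delta.merge(unsaved));
        self.current = delta.merge(self.current);
    }

    // Saved = Unsaved ← Saved, then clear Unsaved
    fn save(&mut self) {
        if let Some(unsaved) = self.unsaved.take() {
            self.saved = unsaved.merge(self.saved);
        }
    }
}

fn main() {
    let saved = Config { port: Some(8080), verbose: Some(false) };
    let command_line = Config { verbose: Some(true), ..Config::default() };
    let mut configs = Configs { saved, unsaved: None, current: command_line.merge(saved) };

    configs.change(Config { port: Some(9000), ..Config::default() });
    assert_eq!(configs.current, Config { port: Some(9000), verbose: Some(true) });

    configs.save();
    // Only the change is persisted; the command-line `verbose` flag is not.
    assert_eq!(configs.saved, Config { port: Some(9000), verbose: Some(false) });
    assert!(configs.unsaved.is_none());
}
```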

This has the following happy effects:

On start-up, the order of precedence is Command-line←Environment←Saved. This lets environment variables override the saved configuration, and command-line arguments override the environment variables.
Whilst running, any changes you make take precedence (and do so for the rest of the run), which is what you would expect.
When saving, only changes are saved, not anything specified on the command line or in the environment.

Implementation

I can access each config, directly, e.g.:

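use std::sync::Arc;

use arc_swap::{ArcSwap, ArcSwapOption, Guard};
use once_cell::sync::Lazy;

pub struct Config {
    // ... (the `Option<...>` configuration fields)
    error: Option<Arc<Error>>, // private
}

// Statics cannot be declared inside an `impl` block, so they live at module level.
// Saved config is held in a static `ArcSwap<Config>`.
static SAVED_CONFIG: Lazy<ArcSwap<Config>> = Lazy::new(|| unimplemented!(/* ... */));

// We hold unsaved in an `ArcSwapOption<Config>` as there may be no changes.
static UNSAVED_CONFIG: Lazy<ArcSwapOption<Config>> = Lazy::new(|| unimplemented!(/* ... */));

impl Config {
    // ...

    // Note, these are static so don't need `ArcSwap` anyway...
    fn environment() -> Result<&'static Config, &'static Error> {
        unimplemented!(/* ... */)
    }

    fn saved() -> Result<Guard<Arc<Config>>, Arc<Error>> {
        // Note we have to build our result on retrieve from the private field.
        let config = SAVED_CONFIG.load();
        if let Some(error) = &config.error {
            // Clone the `Arc`: it cannot be moved out from behind the guard.
            return Err(Arc::clone(error));
        }
        Ok(config)
    }

    fn unsaved() -> Guard<Option<Arc<Config>>> {
        UNSAVED_CONFIG.load()
    }
}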

The 'environment' and 'command_line' configurations are immutable once created, so they don't need ArcSwap. The 'unsaved' and 'current' configurations cannot error, so I can expose Guard<Arc<Config>> directly, and the private error field is simply wasted on them.

That leaves the saved configuration, which is both mutable and 'error-prone'. Config contains an error: Option<Arc<Error>> field, which only changes together with the configuration, but the field is private.

I suppose I could make Config an enum. However, I don't want the error to be exposed as part of the configuration, since it isn't supposed to be there anyway.

Proposal

It would be 'better' if that signature were:

// Saved config uses a new `ArcSwapResult<T, E>` type (proposed).
static SAVED_CONFIG: Lazy<ArcSwapResult<Config, Error>> = ...;

impl Config {
    // ...
    fn saved() -> Guard<Result<Arc<Config>, Arc<Error>>> {
        // Works just like the `unsaved` example above.
        SAVED_CONFIG.load()
    }
}

> So, if you have a use case and the time/motivation to make a pull request, I'll be happy to review it.

I've written up the above use case and 'proposal'. Currently, it's a little bit too soon for me to submit PRs as I'm still learning, but I'll keep this in my bag as a potential first Rust PR 👍🏻

Answer 3 · 2021-08-23T13:54:33.000Z

I must admit, you lost me in all the details. I kind of understand where you're aiming.

Wouldn't a different arrangement, e.g. ArcSwapAny<Arc<Result<_, _>>>, work well enough? If you don't want to make the error public (though what do you return if there is an error?), you could somehow wrap it or use something like arc_swap::access.
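The suggestion can be sketched as follows, assuming hypothetical `Config` and `ConfigError` types. To keep the example dependency-free, a `std::sync::RwLock<Arc<...>>` stands in for `ArcSwap`. That is a lock-based substitute, so it does not share ArcSwap's lock-free reads. With the arc-swap crate, `ArcSwap::from_pointee`, `load_full` and `store` would play the roles of `new`, `load` and `store` here. The whole `Result` sits behind one `Arc`, so no discriminant has to be squeezed into the pointer.

```rust
use std::sync::{Arc, RwLock};

// Hypothetical config and error types, standing in for the thread's `Config` and `Error`.
#[derive(Debug)]
struct Config {
    port: u16,
}

#[derive(Debug)]
struct ConfigError(String);

// Stand-in for `ArcSwap<Result<Config, ConfigError>>`: one `Arc` to a whole `Result`.
struct SavedConfig {
    slot: RwLock<Arc<Result<Config, ConfigError>>>,
}

impl SavedConfig {
    fn new(initial: Result<Config, ConfigError>) -> Self {
        SavedConfig { slot: RwLock::new(Arc::new(initial)) }
    }

    // Plays the role of `ArcSwap::load_full`: hands out a clone of the current `Arc`.
    fn load(&self) -> Arc<Result<Config, ConfigError>> {
        Arc::clone(&*self.slot.read().unwrap())
    }

    // Plays the role of `ArcSwap::store`: readers holding the old `Arc` keep the old value.
    fn store(&self, value: Result<Config, ConfigError>) {
        *self.slot.write().unwrap() = Arc::new(value);
    }
}

fn main() {
    let saved = SavedConfig::new(Err(ConfigError("file not found".into())));
    let before = saved.load();

    saved.store(Ok(Config { port: 8080 }));

    // The earlier reader still sees the error it loaded...
    assert!(before.is_err());
    // ...while a fresh load sees the new config.
    match &*saved.load() {
        Ok(config) => assert_eq!(config.port, 8080),
        Err(ConfigError(message)) => panic!("unexpected error: {message}"),
    }
}
```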

What I worry about is: ArcSwapAny<Arc<T>> is just a wrapper around AtomicPtr<T>. Now, what is the T in the case of Result<Arc<T>, Arc<E>>: is it T or E? And how do I tell, just from the pointer, what's in there? There seems to be one more bit of information that's needed, and there's nowhere to put it in an atomic pointer.
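The missing bit can be made visible with `std::mem::size_of`, using only the standard library. The checks compare against the size of a machine word rather than hard-coding byte counts. They also rely on the compiler's usual null-pointer optimization for `Option<Arc<T>>`.

```rust
use std::mem::size_of;
use std::sync::atomic::AtomicPtr;
use std::sync::Arc;

fn main() {
    let word = size_of::<usize>();

    // An atomic pointer is a single machine word.
    assert_eq!(size_of::<AtomicPtr<u8>>(), word);

    // `Option<Arc<T>>` still fits: `None` reuses the otherwise impossible null pointer.
    assert_eq!(size_of::<Option<Arc<u8>>>(), word);

    // `Result<Arc<T>, Arc<E>>` has two non-null pointer variants, so it needs a
    // separate discriminant and no longer fits into one word.
    assert!(size_of::<Result<Arc<u8>, Arc<u16>>>() > word);

    println!(
        "word = {word}, Option<Arc> = {}, Result<Arc, Arc> = {}",
        size_of::<Option<Arc<u8>>>(),
        size_of::<Result<Arc<u8>, Arc<u16>>>()
    );
}
```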

As for configs… have you looked around at what exists? There's the config crate, which might be of some help.

Answer 4 · 2021-08-23T14:27:14.000Z

> I must admit, you lost me in all the details. I kind of understand where you're aiming.

Sorry, I got stuck into explaining my thought process and went into WAY too much detail. That's why I added the headings, so you can skip ahead!

> Wouldn't a different arrangement, e.g. ArcSwapAny<Arc<Result<_, _>>>, work well enough? If you don't want to make the error public (though what do you return if there is an error?), you could somehow wrap it or use something like arc_swap::access.

If I understand you correctly, then yes, I could use an Arc of the Result; I did think of that. However, I was concerned that it meant 'yet more nested Arcs'. I had hoped it could be 'hidden' efficiently inside an ArcSwapResult, the way ArcSwapOption does it.

> What I worry about is: ArcSwapAny<Arc<T>> is just a wrapper around AtomicPtr<T>. Now, what is the T in the case of Result<Arc<T>, Arc<E>>: is it T or E? And how do I tell, just from the pointer, what's in there? There seems to be one more bit of information that's needed, and there's nowhere to put it in an atomic pointer.

Yes, I worried it might be something like that. An ArcSwapOption effectively only has to handle 'none' or a reference, but an ArcSwapResult would have to distinguish between 2 kinds of reference. I haven't dug into your implementation details yet, though, so I wasn't sure. I was hoping there might be a nice way of doing it. 🤷

> As for configs… have you looked around at what exists? There's the config crate, which might be of some help.

I saw that and am considering using it, but this was a 'learning exercise'. I wanted to implement it myself, see what kinds of issues I ran into (loads), and learn from them (I'm much clearer on ownership than I was!). Finally, I look for an existing implementation to see how they did it and, in most cases, use their version going forward.

One of the things I'm loving about Rust is how easy it is to navigate the source files of the standard library and imported crates. We only got that relatively recently in .NET, first through NuGet packages, then Source Link and GitHub; otherwise we use decompilers. I've been learning loads by seeing how other people implement stuff.

Anyway thanks for your patient comments, you've been a real help on my journey! 👨🏻‍🏫

Answer 5 · 2021-08-27T06:53:33.000Z

You're welcome. But I think I'll close this issue. If anyone comes up with an idea how to implement it in a reasonable way, it can be reopened.