NetFabric/NetFabric.Hyperlinq

Rethink return types

aalmada opened this issue ยท 11 comments

LINQ has always been given as the example of functional programming in C#. It was introduced when C# was mostly an imperative language. This has radically changed in the latest versions of C# but LINQ has stayed unchanged...

I'm no expert in functional programming but it seems to me that methods like First(), Single(), Last(), Min(), Max() and Average() have some issues. They all throw exceptions when the source is empty. Single() also throws when the source has more than one item. ElementAt() throws an exception when the index parameter is out of bounds.

Throwing exceptions causes side effects...

There are some alternative like FirstOrDefault(), SingleOrDefault(), LastOrDefault(), DefaultIfEmpty() and ElementAtOrDefault() that return the default value of the return type when the source is empty or the index parameter is out of bounds. DefaultIfEmpty() has a second overload that allows the return of a predefined value. This is still an issue for value-types because the default value or even the predefined one may be possible values of the source. SingleOrDefault() still throws when the source has more than one item.

This project is not constrained by breaking changes avoidance. I want it to be a "better" version of LINQ. Keeping a familiar API but taking advantage of newer language features to improve performance and usability.

There are several possible alternatives...

Try pattern

This pattern can be found in many methods. It consist on a method that returns bool and has an output parameter. The parameter outputs a valid value if the method return true.

Here's a possible implementation of First() using this pattern:

public static bool TryFirst<T>(this IEnumerable<T> source, out T value)
{
  using var enumerator = source.GetEnumerator();
  if (enumerator.MoveNext())
  {
    value = enumerator.Current;
    return true;
  }
        
  value = default;
  return false;
}

It can be used this way:

if (source.Where(item => item > 2).TryFirst(out var value))
{
  Console.WriteLine(value);
}   

One limitation is that output parameters cannot output value-types by reference. Hyperlinq currently returns read-only references in the cases where the source is an array, Span<T> or equivalent.

Nullable<T>

The method can return a Nullable<T>:

public static T? First<T>(this IEnumerable<T> source) where T : struct
{
  using var enumerator = source.GetEnumerator();
  if (enumerator.MoveNext())
    return enumerator.Current;
	
  return default;
}

The type T has to be constrained to value-types. We would need an overload for reference-types but, because generic constraints are not part of the method signature, the methods must have different names. The alternative is to add a second generic parameter but it would be very confusing.

This alternative has the same issue as FirstOrDefault() and SingleOrDefault(). null may be a possible value for reference-types, so it would be not possible to differentiate between an empty collections and one that contains one or more null values.

ValueTuple<bool, T>

The method can return a tuple:

public static (bool HasValue, T Value) First<T>(this IEnumerable<T> source)
{
  using var enumerator = source.GetEnumerator();
  if (enumerator.MoveNext())
    return (true, enumerator.Current);
	
  return (false, default);
}

But, it's not good idea to use tuples in public APIs.

Option<T>

The method can return Option<T>:

public static Option<T> First<T>(this IEnumerable<T> source)
{
  using var enumerator = source.GetEnumerator();
  if (enumerator.MoveNext())
    return Option.Some(enumerator.Current);
	
  return Option.None;
}

Possible uses:

// returns first; otherwise throws
var first0 = array
  .Where(item => item > 2)
  .First()
  .Match(
    item => item, 
    () => throw new InvalidOperationException("Sequence contains no elements"));

// returns first; otherwise returns -1
var first1 = array
  .Where(item => item > 2)
  .First()
  .Match(
    item => item, 
    () => -1);

// writes first; otherwise writes <empty>
array
  .Where(item => item > 2)
  .First()
  .Match(
    item => Console.WriteLine(item), 
    () => Console.WriteLine("<empty>"));

It's also possible to add a Desconstruct()method to Option<T>:

public void Deconstruct(out bool hasValue, out T value)
{
  hasValue = _hasValue;
  value = _value;
}

It can then be used the following way:

var (hasValue, value) = array.Where(item => item == 4).First();
if (hasValue)
  Console.WriteLine(value);

The C# language does not support fields to be a reference so it would not be possible to return a readonly reference from arrays or Span<T>. This issue may be worked-around by using ByReference<T>.

Result<TOk, TError>

What about Single()? It throws an exception in two different cases. It's the same exception type but with different messages.

It can either return Option<T> but then there is no way to differentiate the errors.

It can return Result<T, string>:

public static Result<T, string> Single<T>(this IEnumerable<T> source)
{
  using var enumerator = source.GetEnumerator();
  if (!enumerator.MoveNext())
    return Result.Error("Sequence contains no elements");
  var value = enumerator.Current;
  if (enumerator.MoveNext())
    return Result.Error("Sequence contains more than one element");
	
  return Result.Ok(value);
}

But then, just like the exception, we can only differentiate by comparing strings.

It's possible to differentiate the errors with an enum type:

public enum SingleError
{
  Empty,
  Multiple
}

public static Result<T, SingleError> Single<T>(this IEnumerable<T> source)
{
  using var enumerator = source.GetEnumerator();
  if (!enumerator.MoveNext())
    return Result.Error(SingleError.Empty);
  var value = enumerator.Current;
  if (enumerator.MoveNext())
    return Result.Error(SingleError.Multiple);
	
  return Result.Ok(value);
}

Match pattern

An alternative is to use the match pattern without having to call one more method:

public static TOut First<T, TOut>(this IEnumerable<T> source, Func<T, TOut> some, Func<TOut> none)
{
  if (some is null) throw new ArgumentNullException(nameof(some));
  if (none is null) throw new ArgumentNullException(nameof(none));

  using var enumerator = source.GetEnumerator();
  if (enumerator.MoveNext())
    return some(enumerator.Current);
	
  return none();
}

public static void First<T>(this IEnumerable<T> source, Action<T> some, Action none)
{
  if (some is null) throw new ArgumentNullException(nameof(some));
  if (none is null) throw new ArgumentNullException(nameof(none));

  using var enumerator = source.GetEnumerator();
  if (enumerator.MoveNext())
    some(enumerator.Current);
  else
    none();
}

I started this journey to avoid throwing exceptions but lets see if it's possible to have the current behavior of First() and at the same time the match pattern.

public static T First<T>(this IEnumerable<T> source, Func<T> none = null)
{
  using var enumerator = source.GetEnumerator();
  if (enumerator.MoveNext())
    return enumerator.Current;
	
  if (none is null)
    throw new InvalidOperationException("Sequence contains no elements");
	
  return none();
}
	
public static TOut First<T, TOut>(this IEnumerable<T> source, Func<T, TOut> some, Func<TOut> none = null)
{
  if (some is null) throw new ArgumentNullException(nameof(some));

  using var enumerator = source.GetEnumerator();
  if (enumerator.MoveNext())
    return some(enumerator.Current);
	
  if (none is null)
    throw new InvalidOperationException("Sequence contains no elements");
	
  return none();
}

public static void First<T>(this IEnumerable<T> source, Action<T> some, Action none = null)
{
  if (some is null) throw new ArgumentNullException(nameof(some));

  using var enumerator = source.GetEnumerator();
  if (enumerator.MoveNext())
  {
    some(enumerator.Current);
  }
  else
  {
    if (none is null)
      throw new InvalidOperationException("Sequence contains no elements");
	
    none();
  }
}

Possible uses:

// returns first; otherwise throws
var first0 = array
  .Where(item => item > 2)
  .First();
		
// returns first; otherwise returns -1
var first1 = array
  .Where(item => item > 2)
  .First(() => -1);

// writes first value; otherwise writes <empty>
array
  .Where(item => item > 2)
  .First(
    item => Console.WriteLine(item), 
    () => Console.WriteLine("<empty>"));

I ended up going full circle. This last solution is compatible with LINQ but, at the same time, supports more functional patterns.

But, can it support the return of references in the case of Span<T>?

public static ref readonly T First<T>(this ReadOnlySpan<T> source)
{
  if (source.Length == 0)
    throw new InvalidOperationException("Sequence contains no elements");

  return ref source[0];
}

public static T First<T>(this ReadOnlySpan<T> source, Func<T> none)
{
  if (none is null) throw new ArgumentNullException(nameof(none));

  if (source.Length == 0)
    return none();
	
  return source[0];
}
	
public static TOut First<T, TOut>(this ReadOnlySpan<T> source, Func<T, TOut> some, Func<TOut> none = null)
{
  if (some is null) throw new ArgumentNullException(nameof(some));

  if (source.Length == 0)
  {
    if (none is null)
	  throw new InvalidOperationException("Sequence contains no elements");
	
    return none();
  }
	
  return some(source[0]);
}

public static void First<T>(this ReadOnlySpan<T> source, Action<T> some, Action none = null)
{
  if (some is null) throw new ArgumentNullException(nameof(some));

  if (source.Length == 0)
  {
    if (none is null)
	  throw new InvalidOperationException("Sequence contains no elements");
	
    none();
  }
  else
  {
    some(source[0]);
  }
}

It requires one more overload and only this one returns a reference. Lambdas still do not support passing by reference...

Hi. Two notes:

  1. The Value of Option/Result should never be exposed on the object itself. I see you're showing it could be deconstructed into a tuple, but that creates a point of failure where the type system may not be able to protect the user from themselves (i.e. when they accidentally ignore the bool value or handle it incorrectly).

Option is an extremely powerful concept. You can bind and map options as if they were actual values and only deal with matching at the system boundary level. You don't have to match straight away.

  1. In 99% of the cases, I imagine Single() returning Option is just fine. That 1% of cases where it's important for the user to know how many elements are in the sequence, can be handled by some general overload like:
seq.MatchCount(3, exact: items => ..., different: items => ..., empty: () => ...)

But it's hard for me to come up where this would be so important that I'd use it.

@Tyrrrz Thanks for the feedback.

Now I understand that Deconstruct() wouldn't be a good idea. I also agree on the Single().

I'm now wondering how Option<T> should be used with async functions. Have you ever used it in this scenario?

You'd probably add Func<Task> overloads for Match but I wouldn't add them for Bind and Map which are typically pure functions.

You can use LINQ comprehension syntax to enable seamless monadic bind on Task<Option<T>> by implementing a custom SelectMany that only invokes the next function if the former is not None. There are some examples here but it's only for regular Option<T>, not Task<Option<T>> which is essentially the same but works with async stuff.

Actually, now that I think about it, you don't need to add overloads for Func<Task> yourself, just add Func<T> and let the user return whatever they want.

// TOut Option<T>.Match<TOut>(Func<T, TOut> some, Func<TOut> none)

var a = Option.Some(500);

await a.Match(
    some: x => Task.Delay(x),
    none: () => Task.Delay(100)
);

Here's a simple snippet I wrote some time ago that shows how to enable monadic comprehensions for tasks in C#. By itself it's not very useful but it works really well when combined with Result or Option. Anyway, this shows the general idea.

public static class TaskMonadExtensions
{
    public static async Task<TDest> Select<TSource, TDest>(this Task<TSource> task, Func<TSource, TDest> transform)
    {
        var result = await task.ConfigureAwait(false);
        return transform(result);
    }
 
    public static async Task<TDest> SelectMany<TSource, TJoin, TDest>(this Task<TSource> task, Func<TSource, Task<TJoin>> join,
        Func<TSource, TJoin, TDest> transform)
    {
        var result = await task.ConfigureAwait(false);
        var joined = await join(result).ConfigureAwait(false);
        return transform(result, joined);
    }
 
    public static async Task Test()
    {
        await
            from a in Task.Run(() => "hello ")
            from b in Task.Run(() => "world")
            select a + b; // hello world
    }
}

@Tyrrrz I already have a branch fully based on Option<>.

You are right in that Match() can handle async without changes but if you need to pass any data, like a CancellationToken, it results in a closure allocation. I added a MatchAsync():

[Pure]
public ValueTask<TOut> MatchAsync<TOut>(Func<T, CancellationToken, ValueTask<TOut>> some, Func<CancellationToken, ValueTask<TOut>> none, CancellationToken cancellationToken = default) 
=> hasValue 
  ? some(value, cancellationToken) 
  : none(cancellationToken);

[Pure]
public ValueTask MatchAsync(Func<T, CancellationToken, ValueTask> some, Func<CancellationToken, ValueTask> none, CancellationToken cancellationToken = default) 
=> hasValue 
  ? some(value, cancellationToken) 
  : none(cancellationToken);

The use of lambda expression by Option<> raises me an issue. One thing I would like to implement is to return by-reference when the source is an array, Span<>, ReadOnlySpan<>, Memory<> or ReadOnlyMemory<>. This allows, less copies and the direct manipulation of the result item, when not read-only. Instead of having to call IndexOf() and then use the indexer to get the item.

C# doesn't allow fields with by-reference types but this limitation can be worked around by using ByReference<>, ReadOnlyByReference<>, NullableByReference<> or NullableReadOnlyByReference<>. The issue is that the lambdas still don't allow passing by-reference. This can also be worked-around by defining delegates but this means, lambdas cannot be used, only methods or local-methods.

To be honest, I'm unfortunately not that familiar with memory optimization techniques, so I can't really suggest a solution.

@Tyrrrz All your suggestions have been great, I just need to think about this some more.

The branch looks nice, I quickly glanced over it and also looked into Option<T> implementation. ๐Ÿ‘

In typical functional scenarios, Where/Select/Bind don't need to be async because you usually use them after getting some data.

var someDataThatDecidesTheWorkflow = await GetDataAsync();

var computedLogic = seq.FirstOrDefault().Where(v => v.Foo == someDataThatDecidesTheWorkflow.Bar);

var someOtherSideEffect = await PushDataAsync(computedLogic);

This is a very basic illustration, but the idea is that you organize your execution into this pure-impure-pure sandwich, where you perform side effects to get data first, perform pure computations on them, then perform side effects to do something with the computed data. In such scenarios, you don't want to have async behavior in Option<T> because that's usually pushed to one of the boundaries. It's not always possible though, sometimes you need to insert side effects in the middle (for example when querying data based on intermediate computation).

Anyway, I don't think that including async variants of Where and possibly other methods is necessarily a bad thing, but you should be careful about how those methods are used. It's just that usually async implies impure side effects because async == IO. And you normally want impurities only when unwrapping the option, which happens in Match. Separating logic into pure and impure like that is what provides testability into functional code, similar to how interfaces and mocks provide testability into classic OOP code.

@Tyrrrz The reason I'm considering async lambdas is because Hyperlinq supports LINQ operations both on sync and async enumerables. This means it implements sync and async versions of each operation, for example, Where(), Select(), FirstAsync(), SingleAsync() and many more. Just like in the "official" System.Linq.Async, these operations support async predicates and selectors.

I thought exactly the same as you. These operations on Option<> are performed after the value is retrieved but then, what if the predicate for Where() needs to call an async method? It would force the developer to call .GetAwaiter().GetResult(). My gut feeling is that this is becoming increasingly common as many APIs are becoming async. But, am I right? I'm not sure...

I'm juggling here between making it complete but also manageable. ๐Ÿ˜“

Yeah I get that. But the reason these async counterparts make sense is because the actual enumerator is asynchronous too. In case with Option<T> (which is essentially a so-called unit-container, an enumerable of zero or one element), there is no async version. So does it make sense in this case?

As a general rule of thumb, async almost always implies impure IO (unless it's Task.FromResult) so by adding it into an otherwise pure function, it becomes impure as well.

Maybe you have an example where it makes sense to use WhereAsync on Option<T>?