fortran-lang/stdlib

Implement a `unique` function returning only the unique values in a vector.

Motivation

Recently, I've run into the problem of extracting the unique values of a vector (of any integer, real, or complex type, or possibly even character). Consider, for instance, the vector x = [1, 2, 3, 3, 4]. What I need is a function taking x as input and returning the vector y = [1, 2, 3, 4] as output. The interface for a real-valued vector could be as simple as:

pure function unique(x, sorted) result(y)
     real(dp), intent(in) :: x(:)
     !! Array whose unique values need to be extracted.
     logical(lk), optional, intent(in) :: sorted
     !! Whether the output vector needs to be sorted or not (default .false. ?)
     real(dp), allocatable :: y(:)
     !! Vector containing only the unique values from x.
end function

The output vector could be sorted or not, depending on the user's choice. I know that there is no Fortran intrinsic function for that purpose, but I'm not sure whether something like that is already available in stdlib. If it is, could anyone point me to the correct function?
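For illustration, here is a minimal sketch of how the interface above could be filled in. It is not an existing stdlib routine. To stay self-contained it makes two assumptions that differ from the proposal: `dp` is taken from `iso_fortran_env` (`real64`) rather than `stdlib_kinds`, and `sorted` is a default-kind `logical` rather than `logical(lk)`. The module name `unique_sketch` is hypothetical. Values are compared with exact equality, so NaNs are never merged, and the duplicate search is a simple O(n**2) scan.

```fortran
module unique_sketch
    ! Assumption: dp comes from iso_fortran_env so the sketch builds without stdlib.
    use, intrinsic :: iso_fortran_env, only: dp => real64
    implicit none
    private
    public :: unique

contains

    pure function unique(x, sorted) result(y)
        !! Hypothetical sketch: return the distinct values of x.
        real(dp), intent(in) :: x(:)
        !! Array whose unique values need to be extracted.
        logical, optional, intent(in) :: sorted
        !! Whether the output is sorted in ascending order (assumed default: .false.).
        real(dp), allocatable :: y(:)
        !! Vector containing only the unique values from x.

        real(dp) :: buffer(size(x)), key
        logical :: sorted_
        integer :: i, j, n

        sorted_ = .false.
        if (present(sorted)) sorted_ = sorted

        ! Keep the first occurrence of each value, in order of appearance.
        n = 0
        do i = 1, size(x)
            if (.not. any(buffer(1:n) == x(i))) then
                n = n + 1
                buffer(n) = x(i)
            end if
        end do

        ! Optional in-place insertion sort of the n retained values.
        if (sorted_) then
            do i = 2, n
                key = buffer(i)
                j = i - 1
                do while (j >= 1)
                    if (buffer(j) <= key) exit
                    buffer(j + 1) = buffer(j)
                    j = j - 1
                end do
                buffer(j + 1) = key
            end do
        end if

        y = buffer(1:n)
    end function unique

end module unique_sketch
```

With this sketch, `unique([1.0_dp, 2.0_dp, 3.0_dp, 3.0_dp, 4.0_dp])` returns the four values 1, 2, 3, 4. Without `sorted=.true.`, the result keeps the order of first appearance; a sort-then-deduplicate approach would scale better for large inputs but loses that ordering.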

Prior Art

  • In Matlab, there is the unique function whose description is available here.
  • Python has the built-in set type, which can be constructed from a list and keeps only its unique elements (without preserving their order).
  • Numpy has np.unique whose description is available here.
  • @jacobwilliams provides an integer-based implementation on his blog (here).

Additional Information

Both Matlab's and Numpy's implementations cover a relatively large set of cases (1D arrays, multidimensional arrays, different types, etc.) and return values (the unique elements, the corresponding indices, the indices needed to reconstruct the original array from this unique set, etc.).

I don't know if all of these cases need to be covered, at least as a starting point. I would recommend starting with the simplest one (i.e. a vector as input and a vector of its unique elements as output), as this is probably the most common situation where a unique function is needed. That would include integer, real, complex and character 1D arrays.
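As a rough sketch of what covering several types could look like, a single generic name can dispatch to per-type specific procedures. The example below is hypothetical (the module name `unique_generic_sketch` and the specific names `unique_int` and `unique_char` are made up for illustration) and only handles default integers and fixed-length character arrays, without sorting. In stdlib itself, such per-type copies would more likely be generated from fypp templates than written by hand.

```fortran
module unique_generic_sketch
    implicit none
    private
    public :: unique

    ! One generic name, resolved at compile time from the argument type.
    interface unique
        module procedure unique_int
        module procedure unique_char
    end interface unique

contains

    pure function unique_int(x) result(y)
        integer, intent(in) :: x(:)
        integer, allocatable :: y(:)
        integer :: buffer(size(x))
        integer :: i, n

        ! Keep the first occurrence of each value.
        n = 0
        do i = 1, size(x)
            if (.not. any(buffer(1:n) == x(i))) then
                n = n + 1
                buffer(n) = x(i)
            end if
        end do
        y = buffer(1:n)
    end function unique_int

    pure function unique_char(x) result(y)
        character(len=*), intent(in) :: x(:)
        ! The result elements have the same length as the input elements.
        character(len=len(x)), allocatable :: y(:)
        character(len=len(x)) :: buffer(size(x))
        integer :: i, n

        ! Same scan as above; character == compares whole strings.
        n = 0
        do i = 1, size(x)
            if (.not. any(buffer(1:n) == x(i))) then
                n = n + 1
                buffer(n) = x(i)
            end if
        end do
        y = buffer(1:n)
    end function unique_char

end module unique_generic_sketch
```

A caller then writes `unique([1, 2, 3, 3, 4])` or `unique(["abc", "def", "abc"])` and the compiler picks the matching specific procedure. Real and complex versions would follow the same pattern, although complex values have no natural ordering, so a `sorted` option would need a stated convention for them.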

I'm also not sure which module this utility function should go in. Maybe stdlib_sorting?

Good idea, @loiseaujc. Please note there is an open discussion at #670. Should we merge this issue with that one?

Oh sure! I completely overlooked this issue.

Is this issue still open? I would love to contribute.

Sure, it is still open! As @perazz mentioned, it is closely related to #670 and the two could probably be merged. I haven't worked on this one at all so far. I encourage you to get in touch with @Beliavsky, who may have started crafting something on the side, though I don't know for sure.

I am going to implement a rough draft. If I'm on the right track, just let me know and we will figure it out from there.