/natural_sort

Elixir natural sort implementation for lists of strings.

Primary LanguageElixirMIT LicenseMIT

NaturalSort

NOTE v0.2 used an explicit second argument (case_sensitive). This has been removed and replaced with an options keyword list, this is a breaking change.

Sort a list of strings containing numbers in a natural manner.

Sort functions will not [generally] sort strings containing numbers the same way a human would.

Given a list:

["a10",  "a05c",  "a1",  "a",  "a2",  "a1a",  "a0",  "a1b",  "a20"]

Applying standard sort will produce:

["a", "a0", "a05c", "a1", "a10", "a1a", "a1b", "a2", "a20"]

But applying a natural sort will give:

["a", "a0", "a1", "a1a", "a1b", "a2", "a05c", "a10", "a20"]

Functions

Just the one:

NaturalSort.sort(list, options \\ [])

Sorts a list of strings (ascending). This works by leveraging Elixir's Enum.sort_by/3 function (which takes as the second argument a mapping function). The mapping operation converts each string into a list of strings and integers. Once in this form, applying the sort function results in a correctly sorted list.

Options

There are currently two available options (passed as a keyword list), :direction and case_sensitive.

  • :direction may have a value of :asc or :desc, and defaults to :asc.
  • :case_sensitive may be true or false, and defaults to false.

Examples

  iex> NaturalSort.sort(["x2-y7", "x8-y8", "x2-y08", "x2-g8" ])
  ["x2-g8", "x2-y7", "x2-y08", "x8-y8" ]

  iex> NaturalSort.sort(["a5", "a400", "a1"], direction: :desc)
  ["a400", "a5", "a1"]

  iex> NaturalSort.sort(["foo03.z", "foo45.D", "foo06.a", "foo06.A", "foo"], case_sensitive: :true)
  ["foo", "foo03.z", "foo06.A", "foo06.a", "foo45.D"]

  iex> NaturalSort.sort(["foo03.z", "foo45.D", "foo06.a", "foo06.A", "foo"], [case_sensitive: :true, direction: :desc])
  ["foo45.D", "foo06.a", "foo06.A", "foo03.z", "foo"]

Prior art:

VersionEye's naturalsorter gem was the inspiration, with that being based on Martin Pool's natural sorting algorithm, and making direct use of the Ruby implementation of the original C version.

Elixir's Version module does a similar thing.

Todo/Review

  • REVIEW: Benchmark further.
  • ENHANCEMENT: Add options: choice to use unicode, choice to strip whitespace from result.
  • ENHANCEMENT: Stream rather than map - this was designed to aid me in organising vast amounts of files by name; mapping over large lists seems inefficient.