fslaborg/FSharp.Stats

Hamming distance

Closed · 1 comment

bvenn commented

Description

The Hamming distance is used to determine the number of positions at which two sequences of equal size differ [1].
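As a minimal formalization of the sentence above (the symbols $x$, $y$, $n$ and $d_H$ are notation introduced here for illustration, not taken from the issue), for two sequences $x = (x_1, \dots, x_n)$ and $y = (y_1, \dots, y_n)$ of the same length $n$:

```math
d_H(x, y) = \left|\{\, i \in \{1, \dots, n\} : x_i \neq y_i \,\}\right|
```

For example, the bit strings 1011101 and 1001001 differ at positions 3 and 5 only, so their Hamming distance is 2.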

References

Pointers

  • the implementation should be located here: https://github.com/fslaborg/FSharp.Stats/blob/developer/src/FSharp.Stats/DistanceMetrics.fs
  • within FSharp.Stats.DistanceMetrics, there are specialized modules Vector and Array that contain the same functionality as the overarching module but are optimized for high performance. You should first implement a version that works on two sequences of generic type and, if appropriate, add optimized versions within DistanceMetrics.Vector and DistanceMetrics.Array.
  • if it is not inferred automatically, you can ensure that the function works on any sequence-like input (e.g. a seq of numbers or a string) by adding the inline keyword
    let inline hamming (s1: seq<'a>) (s2: seq<'a>) : int =
        // iterate over both sequences and compare the entries at each index for equality
        // since the Hamming distance is the number of unequal comparisons, the result is an integer
  • optional: add unit tests for sequences of int, float, and strings according to this.
    • don't forget to test negative values within the input sequence, as well as 0. and nan entries
  • optional: add proper XML documentation (see #281)
  • Of course, you can start developing in notebooks/scripts, and afterwards we will try to incorporate the result into the library.
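
The pointers above could be sketched roughly as follows. This is only one possible starting point for a script, not the implementation that ended up in FSharp.Stats. The name `hamming` comes from the skeleton in this issue. Copying the inputs into arrays and throwing on a length mismatch are assumptions made for this sketch.

```fsharp
/// Sketch: counts the positions at which two equal-length sequences differ.
let inline hamming (s1: seq<'a>) (s2: seq<'a>) : int =
    // Assumption: copy both inputs into arrays so the lengths can be compared
    // up front and lazy sequences are enumerated only once.
    let a1 = Seq.toArray s1
    let a2 = Seq.toArray s2
    // Assumption: unequal lengths are treated as an error.
    if a1.Length <> a2.Length then
        invalidArg "s2" "Inputs must have the same length."
    // Count the indices at which the entries are not equal.
    Array.fold2 (fun acc x y -> if x <> y then acc + 1 else acc) 0 a1 a2

// Usage: strings are accepted because they are sequences of chars.
printfn "%d" (hamming "karolin" "kathrin")   // prints 3
printfn "%d" (hamming [1; 2; 3] [1; 0; 3])   // prints 1
```

F# floating-point equality treats nan as unequal to itself, so with this sketch a nan at the same index in both inputs would be expected to count as a difference. Whether that is the desired behavior is worth settling in the unit tests mentioned above.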
Samo8 commented

On it