A .NET machine learning library focusing on clustering and metric learning

Primary language: C#. License: MIT.

Supercluster

A .NET machine learning library focusing on clustering and metric learning. I am just starting out, but I plan to cover both newer algorithms (such as the ENN algorithm) and old classics (such as DBSCAN). The project is under heavy development and I have many things planned.

Philosophy

I am the only developer on the project, so I can do it my way.

  • Research first: All algorithms are self-written. I demand a complete and deep understanding before I start coding. This involves reading many research papers. Many cups of coffee.
  • Correctness: I am a mathematician by training. Machine Learning is just mathematics. Mathematics must be correct, hence all algorithms are rigorously unit tested.
  • Documentation: I strive for super-thorough documentation. Admittedly, at this stage it is hard to keep up, but after release 1.1 I will re-document the code base.
  • Quality: Thorough commenting, StyleCop compliance, and following good software design principles. Mathematicians have been writing bad code for too long. This stops now. Quality is as important as correctness.
  • Appropriate Efficiency: Code is optimized where it needs to be optimized. Let's be honest. This is C#. I can't compete with C++, but more and more people are using high-level languages for data science. When an optimization can drastically improve performance (e.g., using k-d trees or multi-threading), I optimize. When optimization complicates the code for a 1% speed increase, I don't.

Algorithms

Here is a list of currently supported algorithms:

Next Release

Algorithms

Data Structures

Code

  • Refactor of the ENN code.
  • Updating kNN, kMeans, and ENN to accept a Tree during construction.
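
As a rough illustration of what "accepting a Tree during construction" could look like, here is a minimal sketch of constructor injection for a spatial index. Every name in it (ISpatialIndex, BruteForceIndex, KnnSearcher) is hypothetical and not taken from the Supercluster code base, and a linear scan stands in for a real tree such as a k-d tree.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical abstraction over a spatial index (e.g. a k-d tree).
// Not part of Supercluster; introduced only for this sketch.
public interface ISpatialIndex
{
    // Returns the k stored points closest to the query point, nearest first.
    IReadOnlyList<double[]> NearestNeighbors(double[] query, int k);
}

// Linear-scan index that stands in for a real tree in this sketch.
public sealed class BruteForceIndex : ISpatialIndex
{
    private readonly IReadOnlyList<double[]> points;

    public BruteForceIndex(IEnumerable<double[]> points)
    {
        this.points = points.ToList();
    }

    public IReadOnlyList<double[]> NearestNeighbors(double[] query, int k)
    {
        // Sort by squared Euclidean distance and keep the first k points.
        return this.points
            .OrderBy(p => SquaredDistance(p, query))
            .Take(k)
            .ToList();
    }

    private static double SquaredDistance(double[] a, double[] b)
    {
        double sum = 0.0;
        for (int i = 0; i < a.Length; i++)
        {
            double d = a[i] - b[i];
            sum += d * d;
        }

        return sum;
    }
}

// Hypothetical kNN searcher that receives its index at construction
// instead of building one internally.
public sealed class KnnSearcher
{
    private readonly ISpatialIndex index;

    public KnnSearcher(ISpatialIndex index)
    {
        this.index = index ?? throw new ArgumentNullException(nameof(index));
    }

    public IReadOnlyList<double[]> Query(double[] point, int k)
    {
        return this.index.NearestNeighbors(point, k);
    }
}
```

With this shape, one index built over a data set could be handed to several algorithms rather than rebuilt by each of them, and a faster tree could replace the linear scan without touching the algorithm classes.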