- Repo contents
- Best Performance configuration
- Dependencies
- Installation
- Docker
- Examples
- Test data
- Reproduction and Verification
- R: R building blocks for user interface code. Internally called by user interface.
- data: Data files for testing.
- inst: Citation files
- man: Package documentation
- src: R bindings interface and C++ submodule to base repo.
- tests: R unit tests written using the testthat package.
R bindings for NUMA-optimized k-means routines. This package is supported on Linux, Mac OSX and Windows.
NOTE: This package is built from C++ source and will be compiled with your gcc compiler.
- Mac OSX: 10.11 (El Capitan), 10.12 (Sierra), 10.13 (High Sierra)
- Linux: Ubuntu 14.04, 16.04, CentOS 6, Fedora 25, Fedora 26
- Windows: 8.1, 10
- Any machine with >= 2 GB RAM
Our software is licensed under the Apache version 2.0 license.
For the best performance on Linux, make sure the numa system package is installed via:
apt-get install -y build-essential libnuma-dbg libnuma-dev libnuma1
- We require a recent version of Rcpp (install.packages("Rcpp"))
- We recommend the testthat package if you want to run unit tests (install.packages("testthat"))
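As a minimal sketch, the dependency check above can be done from the R interpreter before installing. The helper name `missing_packages` is hypothetical and not part of knor; it only uses base R's `requireNamespace`.

```r
# Hypothetical helper (not part of knor): return the names of packages
# from `pkgs` that are not currently installed.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

needed <- missing_packages(c("Rcpp", "testthat"))
if (length(needed) > 0) {
  message("Missing packages: ", paste(needed, collapse = ", "))
  # install.packages(needed)  # uncomment to install from CRAN
}
```

The install call is left commented out so that running the snippet never changes the local library without an explicit decision.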
Install from CRAN directly. Installation time is normally ~2min.
install.packages("knor")
Install directly from GitHub. This depends on the following system packages:
- git
- autoconf
git clone --recursive https://github.com/flashxio/knorR.git
cd knorR
./install.sh
NOTE: The command may require administrator privileges (i.e., sudo)
A Docker image with all dependencies installed can be obtained by:
docker pull flashxio/knorr-base
NOTE: The knor R package must still be installed on this image via:
install.packages("knor")
If you prefer to build the image yourself, you can use this Dockerfile
iris.mat <- as.matrix(iris[,1:4])
k <- length(unique(iris[, dim(iris)[2]])) # Number of unique classes
kms <- Kmeans(iris.mat, k)
To work with data from disk simply use binary row-major data. Please see this link for a detailed description.
fn <- "/path/to/file.bin" # Use real file
k <- 2 # The number of clusters
nrow <- 50 # The number of rows
ncol <- 5 # The number of columns
kms <- Kmeans(fn, nrow, ncol, k, init="kmeanspp", nthread=2)
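For readers preparing such a file, here is a minimal sketch of writing an in-memory matrix to disk in row-major order using base R only. It assumes a headerless file of 8-byte doubles in native byte order; that layout is an assumption here and should be checked against the detailed format description. The helper name `write_row_major` is hypothetical.

```r
# Hypothetical helper: write a numeric matrix as headerless binary, row-major.
# Assumption: 8-byte doubles in native byte order.
write_row_major <- function(mat, path) {
  con <- file(path, "wb")
  on.exit(close(con))
  # R stores matrices column-major, so transpose before flattening.
  writeBin(as.double(t(mat)), con, size = 8)
  invisible(path)
}

mat <- matrix(as.double(1:6), nrow = 2, ncol = 3, byrow = TRUE)
tmp <- tempfile(fileext = ".bin")
write_row_major(mat, tmp)

# Read the values back to confirm the on-disk order is row by row.
con <- file(tmp, "rb")
vals <- readBin(con, what = "double", n = length(mat), size = 8)
close(con)
print(vals)
```

The transpose is the important step: flattening the matrix without it would write the values column by column.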
We provide test data that is included as part of the package. It can be accessed directly via this link, or through the R interpreter as knor::test_data after the package is loaded.
require(knor)
kms <- Kmeans(knor::test_data, knor::test_centroids)
Runtime for this action should be nearly instantaneous on any machine.
Expected output:
> kms
$nrow
[1] 50
$ncol
[1] 5
$iters
[1] 5
$k
[1] 8
$centers
         [,1]     [,2]     [,3]     [,4]     [,5]
[1,] 2.881889 4.079735 4.243061 1.953790 2.690649
[2,] 2.494522 2.334093 2.204031 4.161763 2.444349
[3,] 3.630086 2.398294 3.793616 2.404824 4.490043
[4,] 3.909759 3.991190 2.947161 3.762090 1.950588
[5,] 4.574327 3.645658 3.975175 4.505870 3.595890
[6,] 3.190091 4.267428 1.643788 3.229366 3.700539
[7,] 2.110254 3.147714 2.153235 1.581510 3.102312
[8,] 2.186852 2.027695 3.938736 1.410910 2.383727
$cluster
[1] 3 2 3 3 6 8 8 3 3 2 3 4 7 7 5 4 2 1 2 1 2 7 7 5 1 1 8 7 5 2 6 2 4 6 6 8 2 5
[39] 7 4 6 5 6 4 7 4 5 4 2 5
$size
[1] 4 9 6 7 7 6 7 4
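One quick way to sanity-check a reproduced result is to confirm that the cluster assignments agree with the reported cluster sizes. The sketch below uses base R only and copies the `$cluster`, `$size`, `$k` and `$nrow` values from the expected output above.

```r
# Values copied from the expected output above.
cluster <- c(3, 2, 3, 3, 6, 8, 8, 3, 3, 2, 3, 4, 7, 7, 5, 4, 2, 1, 2, 1,
             2, 7, 7, 5, 1, 1, 8, 7, 5, 2, 6, 2, 4, 6, 6, 8, 2, 5, 7, 4,
             6, 5, 6, 4, 7, 4, 5, 4, 2, 5)
size <- c(4, 9, 6, 7, 7, 6, 7, 4)
k <- 8
nrow <- 50

# Count how many points were assigned to each of the k clusters.
counts <- tabulate(cluster, nbins = k)

stopifnot(length(cluster) == nrow)  # one assignment per row
stopifnot(all(counts == size))      # counts match the reported sizes
stopifnot(sum(size) == nrow)        # sizes account for every row
```

On a live result the same checks can be written against the returned list, for example `tabulate(kms$cluster, nbins = kms$k)` compared with `kms$size`.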
Check the R docs
?knor::Kmeans