Kendall’s tau and the related Goodman-Kruskal gamma are rank correlation coefficients, which compare pairs (x[i], x[j])
and (y[i], y[j])
for all i
and j
.
These rank correlation coefficients can be calculated in O(n log(n))
. For large vectors, this is considerably faster than the O(n^2)
algorithm of cor(x, y, method="kendall")
in R’s standard implementation. Efficient algorithms are described in Knight (1966) and Christensen (2005). The approach of the latter is used here.
The easiest way to install R packages from github is to use the tools provided by the devtools package. Install devtools and load it.
install.packages("devtools") library(devtools)
Then install fastkendall directly from github.
install_github("fastkendall", "looschen")
Download the source from the “Downloads” tab. Unpack your download.
tar xjvf the-download.tar.gz
Build the package.
R CMD build the-download
A new file fastkendall-x.y.tar.gz should appear. Start R
R
and install the package.
install.packages("path/to/fastkendall_x.y.tar.gz", repos=NULL)
Load the library
library(fastkendall)
and correlate.
fastkendall(runif(100), runif(100))
This also works with matrices.
M = matrix(rnorm(10000), 100) N = matrix(rnorm(1000), 100) ## result is a list of huge matrices result = fastkendall(x, y)
W. R. Knight. A Computer Method for Calculating Kendall’s Tau with Ungrouped Data. Journal of the American Statistical Association, 61(314):436-439, 1966.
D. Christensen. Fast algorithms for the calculation of Kendall’s tau . Computational Statistics, 20:51-62, 2005.