BigKnn is part of HADES.
BigKnn is an R package that implements a large-scale k-nearest neighbor (KNN) classifier on top of the Lucene search engine.
- Build KNN classifiers of arbitrary scale (up to millions of rows, millions of features)
- Fast classification, thanks to the highly optimized Lucene search engine
- Supports both weighted and unweighted KNN
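To make the difference between weighted and unweighted KNN concrete, here is a minimal base-R sketch that uses neither BigKnn nor Lucene. It assumes binary 0/1 outcomes and assumes that weighted voting scales each neighbor's vote by its similarity score. How BigKnn itself scores and weights neighbors is not described here, so the `knnVote` helper and its arguments are hypothetical.

```r
# Illustrative only: a hypothetical helper, not part of the BigKnn API.
# similarity: similarity of each candidate row to the query (higher = closer)
# y:          binary outcome (0/1) of each candidate row
knnVote <- function(similarity, y, k, weighted = FALSE) {
  # Pick the k most similar rows (or all rows if fewer than k exist)
  nearest <- order(similarity, decreasing = TRUE)[seq_len(min(k, length(y)))]
  if (weighted) {
    # Assumed scheme: each neighbor's vote is scaled by its similarity
    sum(similarity[nearest] * y[nearest]) / sum(similarity[nearest])
  } else {
    # Unweighted: plain fraction of positive neighbors
    mean(y[nearest])
  }
}

similarity <- c(0.9, 0.5, 0.4, 0.1)
y <- c(1, 0, 0, 1)
knnVote(similarity, y, k = 3)                   # unweighted score
knnVote(similarity, y, k = 3, weighted = TRUE)  # similarity-weighted score
```

In this toy input the single positive neighbor is also the closest one, so the weighted score comes out higher than the unweighted score.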
covariates <- data.frame(rowIds = c(1,1,1,2,2,3),
                         covariateIds = c(10,11,12,10,11,12),
                         covariateValues = c(1,1,1,1,1,1))
outcomes <- data.frame(rowIds = c(1,2,3),
                       y = c(1,0,0))
dataForPrediction <- Andromeda::andromeda(covariates = covariates,
                                          outcomes = outcomes)
indexFolder <- "s:/temp/lucene"
buildKnn(outcomes = dataForPrediction$outcomes,
         covariates = dataForPrediction$covariates,
         indexFolder = indexFolder)
prediction <- predictKnn(outcomes = dataForPrediction$outcomes,
                         covariates = dataForPrediction$covariates,
                         indexFolder = indexFolder,
                         k = 10,
                         weighted = TRUE)
BigKnn is an R package that uses the Java-based Lucene search engine. The data for the KNN classifier is stored in a folder on the local file system.
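Because the index lives in a local folder, it can help to choose a portable, writable location instead of a hard-coded drive path. The base-R sketch below picks a path under the session's temporary directory and checks that its parent is writable. The folder name `bigknn-index` is a hypothetical choice, and whether the package expects the folder to exist beforehand is not stated here, so the sketch only prepares the path.

```r
# Assumption: any writable local directory can hold the Lucene index.
# "bigknn-index" is a hypothetical folder name chosen for this example.
indexFolder <- file.path(tempdir(), "bigknn-index")

# Confirm the parent directory is writable before building an index there
# (file.access() returns 0 when the requested access mode is permitted).
parentIsWritable <- file.access(dirname(indexFolder), mode = 2) == 0
if (!parentIsWritable) {
  stop("Cannot write to ", dirname(indexFolder))
}

# Convert to an absolute path; mustWork = FALSE because the folder
# itself may not exist yet.
indexFolder <- normalizePath(indexFolder, winslash = "/", mustWork = FALSE)
```

R deletes the temporary directory at the end of the session, so an index that should be reused later belongs in a permanent location.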
Running the package requires R with the rJava package installed, as well as Java 1.8 or higher.
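One way to confirm these requirements from within R is sketched below. It assumes rJava can start the JVM on your machine. `javaMajorVersion` is a hypothetical helper written for this example, while `.jinit()` and `.jcall()` are standard rJava functions.

```r
# Hypothetical helper: extract the major version from a Java version string.
# Legacy strings look like "1.8.0_292" (Java 8); newer ones like "17.0.2".
javaMajorVersion <- function(versionString) {
  parts <- suppressWarnings(
    as.integer(strsplit(versionString, "[._-]")[[1]][1:2])
  )
  # Legacy scheme: "1.x" means Java x; otherwise the first field is the major
  if (parts[1] == 1L) parts[2] else parts[1]
}

if (requireNamespace("rJava", quietly = TRUE)) {
  rJava::.jinit()  # start the JVM that rJava is configured to use
  version <- rJava::.jcall("java/lang/System", "S", "getProperty", "java.version")
  message("Java version seen by rJava: ", version)
  if (javaMajorVersion(version) < 8) {
    warning("Java 1.8 or higher is required.")
  }
} else {
  message("rJava is not installed; install it before installing BigKnn.")
}
```

If `.jinit()` fails, the Java setup itself is usually the cause and should be fixed before installing the package.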
See the instructions here for configuring your R environment, including Java.
Use the following commands in R to install the BigKnn package:
install.packages("remotes")
remotes::install_github("ohdsi/BigKnn")
Documentation can be found on the package website.
PDF versions of the documentation are also available:
- Package manual: BigKnn manual
- Developer questions/comments/feedback: OHDSI Forum
- We use the GitHub issue tracker for all bugs/issues/enhancements
Read here how you can contribute to this package.
BigKnn is licensed under Apache License 2.0. Lucene falls under its own Apache License 2.0.
BigKnn is being developed in RStudio and Eclipse.
Stable.