/MonetDB-rmath

MonetDB extension to use the rmath library

Primary LanguageCMozilla Public License 2.0MPL-2.0

MonetDB-rmath: MonetDB extension to use the Rmath library.

Introduction

An early version for MonetDB-rmath is now available. It includes approximately 90% of the functions in Rmath.h, including [rpdq] (norm|unif|gamma|beta|lnorm|chisq|nchisq|f|t|binom|cauchy|exp|geom|hyper|nbinom|pois|weibull|logis|wilcox|signrank) and another 65 functions. The current implementation uses m4 for code generation and requires the r-mathlib package under Debian.

Quick installation

One can clone the GitHub repository and install using:

git clone https://github.com/mclements/MonetDB-rmath.git
cd MonetDB-rmath
make
sudo make install

Some examples

You will then need to use a fresh database. An example script is:

select qnorm(0.025,0,1), pnorm(-1.96,0,1);
+--------------------------+--------------------------+
| L2                       | L4                       |
+==========================+==========================+
|      -1.9599639845400538 |     0.024997895148220428 |
+--------------------------+--------------------------+
1 tuple
select sum(R_pow(value,2)*0.01*dnorm(value,0,1)) from
  sys.generate_series(-5.0,5.0,0.01);
+--------------------------+
| L7                       |
+==========================+
|       0.9999845566522483 |
+--------------------------+
1 tuple
select rpois(value*0+100) from sys.generate_series(1,10); -- ok
+--------------------------+
| L5                       |
+==========================+
|                       87 |
|                       90 |
|                       96 |
|                      117 |
|                       87 |
|                       99 |
|                       95 |
|                       86 |
|                      116 |
+--------------------------+
9 tuples
select rpois(100) from sys.generate_series(1,3);  -- repeats the values:-<
+--------------------------+
| L5                       |
+==========================+
|                      128 |
|                      128 |
+--------------------------+
2 tuples

For the last example, a better approach would be to use embedded R. Continuing the example:

create or replace function set_random_seed(seed integer)
  returns boolean language r { set.seed(seed); TRUE };
operation successful
create or replace function rpois2(n integer, mu double)
  returns table(y integer) language r {data.frame(y=rpois(n,mu))};
operation successful
select set_random_seed(10);
+-------+
| L2    |
+=======+
| true  |
+-------+
1 tuple
select * from rpois2(5,10);
+------+
| y    |
+======+
|   10 |
|    9 |
|    5 |
|    8 |
|    9 |
+------+
5 tuples

The functions allow for arguments that are scalars and BATs. Combinations of BATs and scalars are also supported when the BATs are the earlier arguments. The next step is to add further statistical tests. As examples, I have included: rmath_poisson_ci(y,boundary) for an exact Poisson confidence interval for count y for a specific boundary (1=left, 2=right) with default confidence level of 95%; and rmath_poisson_test(y,t) for count y and time exposed t, with the default null hypothesis that y/t=1:

select 10.0/9 as rate_ratio, rmath_poisson_ci(10,1)/9 as lci, rmath_poisson_ci(10,2)/9 as uci, rmath_poisson_test(10,9) as pvalue;
+------------+--------------------------+--------------------------+--------------------------+
| rate_ratio | lci                      | uci                      | pvalue                   |
+============+==========================+==========================+==========================+
|      1.111 |       0.5328209662369371 |       2.0433728935575304 |       0.7364887199809547 |
+------------+--------------------------+--------------------------+--------------------------+
1 tuple

The equivalent R would be:

poisson.test(10,9)
	Exact Poisson test

data:  10 time base: 9
number of events = 10, time base = 9, p-value = 0.7365
alternative hypothesis: true event rate is not equal to 1
95 percent confidence interval:
 0.532821 2.043373
sample estimates:
event rate 
  1.111111