LinkedInAttic/datafu

Variance Issue?

Closed this issue · 1 comment

Hi Matt,

I was using exactly the same example as in the README to compute the variance. The only difference is that I can't use input as my variable name, because otherwise I get the error "mismatched input 'input' expecting EOF". But I got 6.66666... as my result instead of 7.5. Could you help explain where I went wrong? I'd appreciate it!

The version I'm using is:
Apache Pig version 0.10.1.4.1304150518 (rexported)

And here's the code I used:

register datafu-0.0.8.jar;

define VAR datafu.pig.stats.VAR();

-- input: 1,2,3,4,5,6,7,8,9
a = LOAD 'input' AS (val:int);

grouped = GROUP a ALL;
-- the README says this produces a variance of 7.5
variance = FOREACH grouped GENERATE VAR(a.val);

dump variance;
-- (6.666666666666668)

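The README file was wrong. 6.67 is the correct variance: VAR computes the population variance, dividing by n.
(7.5 is the sample variance, i.e. the unbiased estimate of the variance from a sample, which divides by n - 1.)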
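Working through the arithmetic for the input 1 through 9 shows where both numbers come from. The two values share the same sum of squared deviations and differ only in the divisor:

```latex
\begin{aligned}
\bar{x} &= \tfrac{1}{9}\textstyle\sum_{i=1}^{9} x_i = \tfrac{45}{9} = 5 \\
\textstyle\sum_{i=1}^{9} (x_i - \bar{x})^2 &= 16 + 9 + 4 + 1 + 0 + 1 + 4 + 9 + 16 = 60 \\
\sigma^2 &= \tfrac{1}{n}\textstyle\sum_{i=1}^{n} (x_i - \bar{x})^2 = \tfrac{60}{9} \approx 6.667 \quad \text{(population variance, what VAR returned)} \\
s^2 &= \tfrac{1}{n-1}\textstyle\sum_{i=1}^{n} (x_i - \bar{x})^2 = \tfrac{60}{8} = 7.5 \quad \text{(sample variance, the README's figure)}
\end{aligned}
```

If the sample variance is what you need, it can be recovered from the population value by multiplying by n/(n - 1): here 60/9 × 9/8 = 60/8 = 7.5.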