dkogan/feedgnuplot

Histogram Support

eschulte opened this issue · 10 comments

This is a feature request.

I'd love to see support for plotting histograms with feedgnuplot. I often want to see a distribution of command-line data. Would it be difficult to add a --hist option which defaults to breaking the input data into, say, 100 boxes, and which takes an optional argument to customize the number of boxes?

If I find the time I'll take a look at this, and if my Perl still works I'll submit a patch (but those are both big IFs).

Thanks for providing this fantastic tool!

That sounds like a good idea. I'll take a look.

Done. This is now supported in several ways (see the docs). I also made terminals more flexible, and comma-separated lists are now allowed instead of passing an option multiple times.

Main histogram test I was using:

seq 1000 | perl -ne 'my $x = 0; $x += $_ for( map {rand(1000)} 1..500); print "$x " . $x*$x/100000 . "\n";' | feedgnuplot --points --histo 0,1 --curvestyleall 'with boxes' --bin 2000

Pass in '--histstyle cum' as an example of other flavors available.

Thanks for putting this new feature together so quickly.

I've noticed what appears to be an error in the results, although it is possible I'm just misunderstanding the usage.

When running the following

$ cat <<EOF|feedgnuplot --histogram 1 --curvestyleall 'with boxes'
1
2
3
EOF

I would expect the output to be three boxes at 1, 2, and 3, all of height 1, because each number appears in the input once, and I would expect the order of the input not to matter. That is to say, I would expect to pass in data and let gnuplot do the binning. This is how regular gnuplot works; e.g., the following gives the desired result.

gnuplot> binwidth=1
gnuplot> bin(x,width)=width*floor(x/width)
gnuplot> plot '-' using (bin($1,binwidth)):(1.0) smooth freq with boxes
input data ('e' ends) > 1
input data ('e' ends) > 2
input data ('e' ends) > 3
input data ('e' ends) > e
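For readers without gnuplot at hand, the binning that the session above performs (floor each value to the start of its bin, then count the values per bin) can be sketched in plain shell with awk. This is only an illustration of the arithmetic, not how feedgnuplot or gnuplot implement it; the function name `bin_counts` is made up for this example.

```shell
# bin_counts WIDTH (hypothetical helper): read one number per line on stdin
# and print "bin_start count" for every non-empty bin, sorted by bin start.
bin_counts() {
  awk -v w="$1" '
    NF {
      # Emulate floor($1 / w): int() truncates toward zero,
      # so step down by one when truncation rounded a negative value up.
      q = $1 / w
      f = int(q)
      if (f > q) f--
      counts[f * w]++
    }
    END { for (b in counts) print b, counts[b] }
  ' | sort -n
}

# Same data as the gnuplot session above, with binwidth=1
printf '1\n2\n3\n' | bin_counts 1
# prints:
# 1 1
# 2 1
# 3 1
```

Because the counts are accumulated per bin, the order of the input lines does not affect the result.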

However, feedgnuplot with the --histogram option appears to expect that I pass in the literal heights of the bins rather than the data values.

Thanks

No, it's OK. The problem is that you passed in '--histogram 1', but the curves index from 0. '--histogram 0' works as expected. In your command if the data had a second column, THAT would be plotted as a histogram.

Ah, confirmed. I should have thought to try a 0-index. Thanks again for this useful feature.

FYI, the version of feedgnuplot that added the histograms (1.21) broke some other, fundamental things. The regression is fixed in 1.22, so you should upgrade. I really need to add some unit tests...

I need to know there is somebody supporting this tool. I have some questions about histograms.

Generally, you ask the question, and somebody replies, if they know something. Please make a new issue.

I spent hours trying and couldn't produce a simple histogram with a command like this:
feedgnuplot --points --histo 0,1 --curvestyleall 'with boxes' --bin 2000 --histstyle cum --terminal 'dumb 80,40'
How do you do this in a plain terminal or for HTML? I actually need to show it in HTML as an image, a table, etc.

To include it in a webpage, output SVG (for vector images) or PNG (for rasterized images). For instance:

seq 1000 | awk '{print rand()}' | feedgnuplot --histo 0 --binwidth 0.1 --hardcopy /tmp/tst.png


For your specific case, send me the data if you want me to look at it. But in a new issue. I will not reply in this issue anymore.