sagemath/sage

plot_histogram improvments

Closed this issue · 19 comments

Since R's histogram plotting is not working for me, I added some functionality to dft.py's plot_histogram method for IndexedSequences.
Hope this is useful to others.

CC: @cswiercz

Component: graphics

Keywords: plot histogram

Author: David Joyner

Reviewer: Chris Swierczewski

Merged: sage-4.1.alpha2

Issue created by migration from https://trac.sagemath.org/ticket/6162

Attachment: trac_6162-histogram.patch.gz

based on 4.0.rc1

comment:1

This passes sage -testall on an amd64 ubuntu 9.04 machine.

Description changed:

--- 
+++ 
@@ -1,2 +1,2 @@
 Since R's histogram plotting is not working for me, I added some functionality to dft.py's plot_histogram method for IndexedSequences.
-Hope this is useful to other.
+Hope this is useful to others.
comment:4

Original patch was not great; this one is more configurable, fixes a bug in the plot command, and fixes some of the zillion typos throughout.

wdj, ispell-comments-and-strings in emacs is your friend. That and flyspell-mode.

Changed keywords from none to plot histogram

Author: Nick Alexander

comment:5

Attachment: trac_6162-plotting.patch.gz

Thanks Nick!

Should I review your patches or does someone other than the 2 of us review both patches?

Reviewer: cswiercz

comment:7

Nick,

I have very bad luck with your patches. :( Is trac_6162-plotting.patch also based on 4.0.rc1? I tried applying it to 4.0.2 after successfully applying trac_6162-histogram.patch and I received the following error:

sage: hg_sage.apply("trac_6162-plotting.patch")
cd "/home/cswiercz/sage/devel/sage" && hg status
cd "/home/cswiercz/sage/devel/sage" && hg status
cd "/home/cswiercz/sage/devel/sage" && hg import   "/home/cswiercz/trac_6162-plotting.patch"
applying /home/cswiercz/trac_6162-plotting.patch
patching file sage/gsl/dft.py
Hunk #2 FAILED at 52
Hunk #3 FAILED at 156
2 out of 4 hunks FAILED -- saving rejects to file sage/gsl/dft.py.rej
abort: patch failed to apply

This was done on a local sage.math install. I wonder if there were changes to sage/gsl/dft.py between the two versions...

I don't have a 4.0.rc1 install anywhere so I'll have to get that up and running in order to review.

comment:8

Replying to @cswiercz:

Nick,

I have very bad luck with your patches. :( Is trac_6162-plotting.patch also based on 4.0.rc1? I tried applying it to 4.0.2 after successfully applying trac_6162-histogram.patch and I received the following error:

Sorry, I wasn't clear. Apply only -histogram.patch.

comment:9

Replying to @ncalexan:

Sorry, I wasn't clear. Apply only -histogram.patch.

Perhaps I don't understand the difference between a histogram of a set of values and a histogram of an _indexed_ set of values but it seems to me that the following should output the familiar "bell-shaped" distribution:

sage: J = [ZZ(n) for n in range(10^3)]
sage: A = [RR(gauss(0,1)) for n in range(10^3)]
sage: s = IndexedSequence(A,J)
sage: P = s.plot_histogram()

However, this looks pretty much the same as

sage: Q = s.plot()

Also, the following is closer to the bell curve but it still doesn't capture what's going on: (swapping the IndexedSequence inputs)

sage: t = IndexedSequence(J,A)
sage: R = t.plot_histogram()

I'm just wondering if I'm missing something.

Finally, finance.timeseries.TimeSeries has a plot_histogram method that seems to work pretty well. Maybe you can use it somehow? Anyway, those are just my thoughts. Again, if I'm missing something I'll be happy to take another look.

comment:10

I don't know why you expect your example to look like a bell curve.
In any case, there are too many points for plot_histogram to work with your example. Try

sage: J = [ZZ(n) for n in range(10)]
sage: A = [RR(gauss(0,1)) for n in range(10)]
sage: s = IndexedSequence(A,J)
sage: P = s.plot_histogram()
sage: show(P)

Regarding your suggestion to use , I get:

sage: import finance.timeseries.TimeSeries
---------------------------------------------------------------------------
ImportError                               Traceback (most recent call last)

....

Can you be more specific?

comment:11

Now I see what you meant.

I think you meant to compare the following plots:

sage: J = [ZZ(n) for n in range(10)]
sage: A = [RR(gauss(0,1)) for n in range(10)]
sage: s = IndexedSequence(A,J)
sage: s.plot_histogram()
sage: t = finance.TimeSeries(A); t
[0.6636, 0.7983, -0.1451, 0.0838, -0.4355, -0.5719, 0.2572, 1.2802, -1.2696, -0.0642]
sage: t.plot_histogram()

If so, you can see that they are completely different. I don't see how to use the sage/finance/time_series.pyx
for this purpose.

comment:12

Using your example, I'm just curious is there's any difference between the following two plots: (aside from the default red indexing below each bar in the plot)

sage: J = [ZZ(n) for n in range(10)]
sage: A = [RR(gauss(0,1)) for n in range(10)]
sage: s = IndexedSequence(A,J)
sage: s.plot_histogram()
sage: bar_chart(s.list())

The only difference I can see is bar placement. (The indexing set easily allows you to place the bars in IndexedSequence().plot_histogram() wherever you like.) I'm sure that there's a lot of application for kind of plot with an indexed set.

I tested the functionality of your code and, from my observations, it works great! Therefore, I give it a positive review. I just wanted a little bit of clarification on what kind of information the plot_histogram() method was actually presenting.

Sorry if my questions caused too much of a problem.

comment:13

Thanks for the link to {{{bar_chart}}. It is similar, except that plot_histogram has an option which allows you to vary the thickness of the bars.

These "improvements" to the plot_histogram function are used to present a histogram of the aggregate totals in a course, especially for a multiple-choice machine-graded exam. Boring but the specific format was required for what I was trying to do.

Merged: sage-4.1.alpha2

Changed reviewer from cswiercz to Chris Swierczewski

Changed author from Nick Alexander to David Joyner

comment:16

Merged trac_6162-histogram.patch in sage-4.1.alpha2. That is David Joyner's patch, and it is now changeset 12600:c169b5109084.