Marsilea-viz/marsilea

hsplit, vsplit inconsistent with numpy?

Closed this issue · 3 comments

Description

Hey! While going over scverse/scanpy-tutorials#97 I noticed a couple things and thought I would follow them up here.

Marsilea's definitions of hsplit and vsplit seem inconsistent with what I'd expect coming from numpy. They seem to act along the opposite axes.

Example

import numpy as np
import marsilea as mars

X = np.arange(12).reshape(4, 3)
display(X)
mars.Heatmap(X).render()
array([[ 0,  1,  2],
       [ 3,  4,  5],
       [ 6,  7,  8],
       [ 9, 10, 11]])
display(np.hsplit(X, [2]))
m = mars.Heatmap(X)
m.hsplit([2], spacing=0.1)
m.render()
[array([[ 0,  1],
        [ 3,  4],
        [ 6,  7],
        [ 9, 10]]),
 array([[ 2],
        [ 5],
        [ 8],
        [11]])]
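
For reference, here is a minimal sketch of the numpy convention being compared against. It only exercises numpy (no Marsilea calls) and checks which axis each function divides; the variable names are just for illustration.

```python
import numpy as np

X = np.arange(12).reshape(4, 3)

# np.hsplit cuts along axis 1: each piece keeps all 4 rows, the columns are divided.
left, right = np.hsplit(X, [2])
h_shapes = [left.shape, right.shape]

# np.vsplit cuts along axis 0: each piece keeps all 3 columns, the rows are divided.
top, bottom = np.vsplit(X, [2])
v_shapes = [top.shape, bottom.shape]

print(h_shapes)  # [(4, 2), (4, 1)]
print(v_shapes)  # [(2, 3), (2, 3)]
```

In other words, np.hsplit(X, [2]) is the same as np.split(X, [2], axis=1), and np.vsplit(X, [2]) is the same as np.split(X, [2], axis=0).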

Suggestion

Personally, I think the numpy versions make more sense. However, I also mess this up frequently. Since Marsilea only needs to deal with two-dimensional grids, I would suggest moving to one of:

  • group_rows / group_columns, since this is quite like a group-by operation without the aggregation
  • split_rows and split_columns (a bit like ComplexHeatmap)

Yes, it's different. I originally wanted the API names to be visually intuitive, but it looks like hsplit/vsplit may confuse others.

Thanks for the suggestions, I think it's a good idea to divide the current split into group_* and split_*. But Marsilea can split non-matrix plots like barplot or violin plot, so the endings in rows and columns may be confusing in these cases.

What would the difference between group_* and split_* be to you? My preference at the moment would be if there was only a group, especially since you can pass essentially the same argument here as you would pass to DataFrame.groupby.

To clarify, previously I was suggesting either split or group. I'm not sure I like "both" since they basically do the same thing, and it's nicer if there's only one way to do it.

But Marsilea can split non-matrix plots like barplot or violin plot, so the endings in rows and columns may be confusing in these cases.

I think rows and columns still make sense for those plots. I believe you still end up with rows and columns, it's just that each entry can be a violin. Admittedly, I don't think grouping by columns makes sense in a plot like this bar chart: https://marsilea.readthedocs.io/en/stable/auto_examples/plot_oil_well.html#sphx-glr-auto-examples-plot-oil-well-py

The signature for hsplit/vsplit is hsplit(cut=None, labels=None, order=None, spacing=0.01). Users can either specify cut to cut the plot at positions given as indices into the data, or specify labels to group the plot. split_* could handle the cut parameter and group_* could handle the labels parameter.
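
To make the cut vs. labels distinction concrete, here is a rough pure-Python sketch of the two modes applied to row positions. The helper names split_by_cut and group_by_labels are hypothetical and are not part of Marsilea; the fallback to first-appearance order when order is omitted is also an assumption for illustration, not a statement of what Marsilea does.

```python
def split_by_cut(n, cut):
    """Hypothetical helper: split positions 0..n-1 into consecutive chunks at the cut indices."""
    bounds = [0, *sorted(cut), n]
    return [list(range(start, stop)) for start, stop in zip(bounds[:-1], bounds[1:])]


def group_by_labels(labels, order=None):
    """Hypothetical helper: collect positions that share a label.

    Chunks follow `order` if given, otherwise first appearance (an assumption).
    """
    groups = {}
    for position, label in enumerate(labels):
        groups.setdefault(label, []).append(position)
    keys = order if order is not None else list(groups)
    return {key: groups[key] for key in keys}


# "cut" mode: positional, chunks stay contiguous.
rows_by_cut = split_by_cut(4, [2])
print(rows_by_cut)  # [[0, 1], [2, 3]]

# "labels" mode: like a group-by without the aggregate; rows may be reordered.
rows_by_label = group_by_labels(["a", "b", "a", "b"], order=["b", "a"])
print(rows_by_label)  # {'b': [1, 3], 'a': [0, 2]}
```

The sketch shows why the two modes might deserve separate names: a cut never reorders the data, while grouping by labels can pull non-adjacent rows together.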

rows and columns are indeed clearer than horizontal or vertical.