ricklupton/floweaver

Greying of small flows (by threshold)?

owoodhansen opened this issue · 10 comments

Dear Rick

Good work and beautiful package. Thank you.

I have made a nice but messy Sankey, and I'd like to clean it up a bit. So far I've done that by filtering out small flows altogether, but I'd prefer to grey them out. Ideally: imagine five flows coming out of source X; I'd like to specify some of them to be grey and the rest colour X.

I've seen the example where all flows are coloured with continuous grading, but I'm looking to colour based on a threshold value.

Tiny side note: originally I had a ton of trouble getting floweaver up and running, because I didn't realise that I had to rename the columns in my dataset to "source", "target" and "value"; instead I was trying to set them as method arguments, e.g. source="column1". I might be the problem of course, but I think it would be more user-friendly if this was mentioned in the text.
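For anyone hitting the same column-naming problem, here is a minimal sketch of the renaming step in plain Python. The input column names ("column1", "column2", "amount") and the helper rename_columns are made up for illustration and are not part of floweaver; with a pandas DataFrame, DataFrame.rename(columns=...) does the same job.

```python
def rename_columns(rows, mapping):
    """Return copies of the rows with keys renamed according to mapping.

    Hypothetical helper for illustration; keys not in mapping are kept as-is.
    """
    return [{mapping.get(key, key): val for key, val in row.items()} for row in rows]


# Made-up input data whose column names differ from what floweaver expects.
raw = [
    {"column1": "farm", "column2": "shop", "amount": 4},
    {"column1": "shop", "column2": "home", "amount": 3},
]

# Map the original names onto the expected "source", "target" and "value".
flows = rename_columns(raw, {"column1": "source", "column2": "target", "amount": "value"})
```

The original rows are left untouched, so the raw data can still be inspected afterwards.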

Hi @owlonewolf, glad you're finding it useful!

Does this example help?

https://floweaver.readthedocs.io/en/latest/tutorials/colour-scales.html#More-customisation

You can define a custom colour scale class with a custom get_color method, which returns a grey colour if the link value is below some threshold, otherwise returns super().get_color(link, value) to return the default colour.
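A rough sketch of that idea follows. It does not import floweaver: BaseScale is a hypothetical stand-in for a colour scale class, and the link objects, the per-source palette dict, the threshold parameter and the grey value are all assumptions made for illustration. Only the shape of the override (return grey below the threshold, otherwise defer to super().get_color(link, value)) comes from the description above.

```python
from types import SimpleNamespace


class BaseScale:
    """Hypothetical stand-in for a colour scale class (not floweaver's own)."""

    def __init__(self, attr, palette=None):
        self.attr = attr
        self.palette = palette or {}

    def get_color(self, link, value):
        # Default behaviour in this sketch: one colour per source, else black.
        return self.palette.get(link.source, "#000000")


class ThresholdScale(BaseScale):
    """Greys out links whose value is below a threshold."""

    GREY = "#cccccc"  # assumed grey; any colour string would do

    def __init__(self, attr, threshold, **kwargs):
        super().__init__(attr, **kwargs)
        self.threshold = threshold

    def get_color(self, link, value):
        if value < self.threshold:
            return self.GREY
        # Otherwise fall back to the default colour.
        return super().get_color(link, value)


scale = ThresholdScale("value", threshold=5, palette={"X": "#1f77b4"})
link = SimpleNamespace(source="X")  # made-up link object
small = scale.get_color(link, 2)    # below the threshold
large = scale.get_color(link, 8)    # at or above the threshold
```

With the real library the subclass would extend one of floweaver's scale classes instead of BaseScale; the tutorial linked above shows the actual base classes and how the scale is passed to the diagram.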

Thanks, mhmm so you suggest changing this line in def get_palette:
name = 'Greens_9' if link.type == 'Student' else 'Blues_9'
to what exactly? if link.value <= 5?

Sorry if I'm missing something basic.

Thank you for the effort Rick. I'm grateful, but I must admit I still can't get where I want.

Your code works and I can colour flows under a threshold. The issue is that I can't seem to keep my manual palette, which gives one colour per source.

Here is the colouring I have, which I'd like to combine with a threshold grey.
[image: CDR]

Right now, when I apply the threshold function, this is what I get (obviously with a different dataset and different flows):
[image: wrong]

I've also tried replacing QuantitativeScale with CategoricalScale to no avail.

PS. In the colour-intensity tutorial, what is the idea behind the data structure at the beginning? I might be misunderstanding something, but the first two columns confuse me because they repeat the same value in every row and therefore seem unnecessary.

I've added another example with a CategoricalScale. Does that do what you are trying to do?

The source and target columns are in the data structure because floweaver expects the data to describe a set of flows between points. In this example, all the flows are in parallel to each other, so the columns seem redundant. If you work with this kind of data a lot, where the source and target are not relevant, I guess there could be a convenient wrapper function which accepted data without these columns and created a simple SankeyDefinition so you could just specify the partitions and other options.
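As a very rough sketch of the first half of such a wrapper, the data-preparation step could look something like this. The function name add_placeholder_endpoints and the "start"/"end" labels are hypothetical, nothing here exists in floweaver, and building the SankeyDefinition itself is left out.

```python
def add_placeholder_endpoints(rows, source="start", target="end"):
    """Return copies of rows with constant source/target entries added.

    Hypothetical helper: every flow gets the same source and target, so the
    flows run in parallel. A row's own source/target, if present, wins.
    """
    return [{"source": source, "target": target, **row} for row in rows]


# Made-up data without source/target columns.
rows = [
    {"type": "Student", "value": 3},
    {"type": "Staff", "value": 7},
]

flows = add_placeholder_endpoints(rows)
```

Every returned row now has the same source and target, which matches the repeated columns at the start of the tutorial's dataset.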

Code additions welcome!

It does, woohoo! Except it ignores the specified palette, both my custom palette and the one in your example ('blues9').

Ahh I think I understand now re the columns.

I ventured into the darkness (source code) and it seems the CategoricalScale and prep_qualitative_palette are where it goes wrong.

It is defaulting to "pastel9" as coded in line 62 of color_scales.py.

I'm beyond my expertise here, but for some reason prep_qualitative_palette doesn't receive the palette input, so it doesn't matter whether the palette is specified as a dict or a string, because it just defaults to None.

I think I made a mistake in my examples -- the lines that say

super().__init__(attr)

should say

super().__init__(attr, **kwargs)

so that extra options like palette are passed through.
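To make the effect of that one-line change concrete, here is a small self-contained sketch. Scale is a hypothetical stand-in base class with a made-up "default" palette, not floweaver's CategoricalScale; the point is only how dropping **kwargs silently discards the palette argument.

```python
class Scale:
    """Hypothetical stand-in base class that falls back to a default palette."""

    def __init__(self, attr, palette=None):
        self.attr = attr
        self.palette = palette if palette is not None else "default"


class BrokenScale(Scale):
    def __init__(self, attr, **kwargs):
        # kwargs is accepted but never forwarded, so palette is lost.
        super().__init__(attr)


class FixedScale(Scale):
    def __init__(self, attr, **kwargs):
        # Forward extra options such as palette to the base class.
        super().__init__(attr, **kwargs)


broken = BrokenScale("type", palette="Blues_9")
fixed = FixedScale("type", palette="Blues_9")
# broken.palette is "default"; fixed.palette is "Blues_9"
```

No error is raised in the broken version, which is why the symptom shows up only as the wrong colours.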

That did it! Thank you Rick. I'll msg you if the work gets published!

As promised, here is a link to the published work (see Fig. 4). Floweaver was used for the main graph (and cited in the supplementary material along with the Extended Methods section). Thank you for the great package and the support here.