Summary of state of antialiased lines in Datashader
This is a summary of the current state of antialiased lines in Datashader, and possible future changes.
Datashader supports antialiasing in all `Canvas.line` calls regardless of the type of data source, for all reductions except `std` and `var`, on CPU but not GPU, with and without Dask. There are some good examples of the output in issue #1148. Since then the antialiased `mean` reduction has improved (#1300) so that the output is now better.
It is necessary to understand some of the implementation details to understand the limitations and impact of possible future changes. When using antialiasing, all aggregations are floating point, as this is necessary to support the fractional antialiased edges. For example, a non-antialiased `count` reduction uses `uint32`, whereas an antialiased `count` reduction uses `float32`.
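As a rough illustration of why floating point is required (a hypothetical toy sketch, not Datashader's actual aggregation code), an antialiased count deposits fractional coverage at edge pixels, which an integer dtype cannot represent:

```python
import numpy as np

# Non-antialiased count: every pixel the line covers gets a whole increment,
# so an unsigned integer aggregation is sufficient.
agg_int = np.zeros(5, dtype=np.uint32)
agg_int[1:4] += np.uint32(1)  # line fully covers pixels 1-3

# Antialiased count: edge pixels receive the line's fractional coverage,
# which an integer dtype cannot represent, hence a float32 aggregation.
agg_aa = np.zeros(5, dtype=np.float32)
agg_aa[1:4] += np.float32(1.0)
agg_aa[0] += np.float32(0.3)  # left edge pixel only 30% covered
agg_aa[4] += np.float32(0.3)  # right edge pixel only 30% covered
```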
Some reductions can be calculated in a single pass, just like non-antialiased reductions, but some need two passes. This is referred to internally as the "2nd stage aggregation". It can have a large performance impact but is necessary to produce accurate results. This is closely related to the concept of self-intersection, and indeed some reductions (`count` and `sum`) have a `self_intersect` boolean kwarg to control whether self-intersection should be considered or ignored.
Consider drawing a 10-pixel wide, 50% alpha blue line (RGBA of `#00f8`) in Bokeh where the line crosses over itself. The pixels displayed are blue wherever the line goes, but the alpha value varies from 0 (no line at all) to 50%. Even where the line crosses over itself, the alpha is not additive; it is the maximum of the alphas of the contributing line segments, so it is never more than 50% anywhere. This also applies to the antialiased line edges: where two edges coincide we don't add the alphas, we take the max. So the Datashader equivalent of this 2D graphics renderer approach for an `any` or `count` reduction (i.e. the really simple ones that don't use some other `value`) is actually a `max` reduction. This is the essence of why we need two-stage aggregations. For `any` and `max` a single stage will always suffice. For a `min` we need two stages: the first is a `max` stage and the second is a `min`. A `count` or `sum` ignoring self-intersections also uses two stages, otherwise we double-count those intersection locations. A `count` or `sum` with self-intersections can be done in a single stage, but we have to be careful: at a join between adjacent line segments, the finite width of the line means that we can touch a particular pixel twice, e.g. once in the edge of the end of the first segment and a second time in the middle of the second segment.
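A toy sketch of the max-versus-sum distinction described above (hypothetical, not Datashader internals): combining the coverage of two overlapping 50%-alpha segments with max keeps the alpha at 50%, whereas a naive additive combine double-counts the self-intersection.

```python
import numpy as np

# Alpha contributed by two overlapping segments of a 50%-alpha line,
# over a 1-D row of pixels (hypothetical coverage values).
seg_a = np.array([0.0, 0.5, 0.5, 0.5, 0.1], dtype=np.float32)
seg_b = np.array([0.1, 0.5, 0.5, 0.0, 0.0], dtype=np.float32)

# 2D-renderer behaviour: take the max, so alpha never exceeds 0.5
# even where the segments overlap.
max_combined = np.maximum(seg_a, seg_b)

# Naive additive combine double-counts the overlap, pushing alpha to 1.0.
sum_combined = seg_a + seg_b
```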
In a conventional 2D graphics renderer, pixels where `alpha` is less than one mix in some of the color below. We fundamentally cannot do this: there is no color below. DataFrame rows are processed in a different order depending on whether we are using Dask and, if so, how the DataFrame is partitioned. So we can only ever use our `alpha` to mean the fraction of the value we would be using if we weren't antialiasing; we cannot mix this with `(1-alpha)` of the color/value beneath. This means that the edges of antialiased lines that go over each other have to be jagged as we transition from a very small `alpha` of the best value to a very high `alpha` of the second-best value.
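A hypothetical numeric sketch of this limitation: at a pixel where two lines overlap, a 2D renderer blends `alpha * top + (1 - alpha) * below`, but an aggregation can only keep one value scaled by its alpha, so neighbouring pixels can jump abruptly between contributions. The values and the "max"-style winner below are invented for illustration.

```python
# Pixel where a line with value 10 (alpha 0.1, i.e. a faint antialiased
# edge) overlaps a line with value 2 (alpha 0.9), for a hypothetical
# max-style reduction in which the value 10 line wins.
top_value, top_alpha = 10.0, 0.1
below_value, below_alpha = 2.0, 0.9

# A conventional 2D renderer mixes the two contributions together.
blended = top_alpha * top_value + (1 - top_alpha) * below_value

# An aggregation keeps only the winning value scaled by its own alpha,
# so adjacent pixels can jump abruptly between 0.1*10 and 0.9*2.
aggregated = top_alpha * top_value
```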
That is a brief summary of the complexity of the internals. Let's now take a step back and identify what we would like to do better.
1. Up until recently the `mean` reduction gave bad results. It is implemented as a `sum` divided by a `count` reduction, so the antialiased edges of both would cancel out, giving a result that is not antialiased. This has now been fixed by adding an extra private reduction that is a count that ignores antialiasing.
2. Antialiased `where` reductions can give non-antialiased results. The `selector` reduction of a `where` returns a row index, which is an integer, so it is not antialiased. If this row index is returned to the user then it is fine. But if the row index is used to look up a different column to return to the user, it cannot be antialiased, as we have already thrown away that information. This has a wider impact than is immediately obvious, as `first` and `last` reductions using Dask are implemented internally as `where` reductions, so they can give non-antialiased results if using Dask but antialiased results if not.
3. It would be great to get rid of the second-stage aggregation as it is both complicated code and potentially slow.
4. Using antialiasing with colorful colormaps can give unusual results. If you use a monochrome or nearly monochrome colormap the results are fine, but consider e.g. a `fire` colormap used with an `any` reduction. The middle of lines will be yellow, but the edges will be half of the value, meaning they are displayed red. Really we want to display them still as yellow but with half alpha.
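The colormap problem in item 4 can be sketched with a hypothetical three-entry colormap standing in for `fire` (this is not Datashader's shading code, just an illustration): looking up the half-value edge pixel in the colormap changes its hue, whereas what we want is the full-value hue at half alpha.

```python
import numpy as np

# Hypothetical 3-entry "fire"-like colormap: dark red -> red -> yellow (RGB).
cmap = np.array([[64, 0, 0], [255, 0, 0], [255, 255, 0]], dtype=np.float64)

def shade(value):
    """Map a value in [0, 1] to an RGB colour by nearest colormap lookup."""
    idx = int(round(value * (len(cmap) - 1)))
    return cmap[idx]

# Current behaviour: the antialiased edge carries half the value, so the
# colormap lookup turns it red instead of yellow.
edge_rgb = shade(0.5)

# Desired behaviour: the full-value colour (yellow) at the edge's alpha.
desired_rgb, desired_alpha = shade(1.0), 0.5
```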
Let me say now that I have no answer for these! But I do have some thoughts on possibilities.
To deal with item 2 we could switch the internal workings of an antialiased reduction to using two separate aggregations: the original non-antialiased one, and a second floating-point one that is effectively the corresponding `alpha`. This fixes item 2 straightaway, as the `selector` for the `where` would calculate both the row-index agg and the alpha agg, and we would use the row-index agg to look up the required column to return to the user and multiply it by the `alpha`. There would be a performance impact here, as each `append` function would be updating two separate agg arrays, but we would still only be looping through the data source once.
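The two-aggregation idea could look something like this (a hypothetical sketch under invented names, not the real Datashader append machinery): each append updates both a value aggregation and an alpha aggregation in a single pass over the data.

```python
import numpy as np

def append_max(x, y, value, alpha, value_agg, alpha_agg):
    """Hypothetical single-pass append for a max-style antialiased reduction:
    keep the non-antialiased value in one agg and its coverage (alpha) in
    another, so the value itself is never scaled during aggregation."""
    if value > value_agg[y, x] or (
        value == value_agg[y, x] and alpha > alpha_agg[y, x]
    ):
        value_agg[y, x] = value
        alpha_agg[y, x] = alpha

value_agg = np.full((2, 2), -np.inf)
alpha_agg = np.zeros((2, 2))

# Three contributions to the same pixel: the second raises the alpha of the
# winning value, the third loses because its value is smaller.
append_max(0, 0, value=5.0, alpha=0.4, value_agg=value_agg, alpha_agg=alpha_agg)
append_max(0, 0, value=5.0, alpha=0.9, value_agg=value_agg, alpha_agg=alpha_agg)
append_max(0, 0, value=3.0, alpha=1.0, value_agg=value_agg, alpha_agg=alpha_agg)
```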
It starts to look like we can address some of item 3 with this. Take a `min` reduction, which is currently 2-stage. Now we can store the min value in the first agg, and the alpha in the second agg is the maximum of all alphas that visited this pixel with the min value. I am not yet sure if this works for more complex aggregating (rather than selecting) reductions. Dropping the second-stage aggregation here means we lose the ability to have non-self-intersecting reductions, which might be fine but is a big change. It is also not clear how this affects 3D aggregations like `max_n`.
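Following that idea for `min` (again a hypothetical sketch with invented names): a single pass keeps the smallest value seen, while the alpha agg keeps the largest alpha among the contributions carrying that smallest value.

```python
import numpy as np

def append_min(x, y, value, alpha, value_agg, alpha_agg):
    """Hypothetical single-pass antialiased min: store the smallest value,
    and the maximum alpha among contributions with that smallest value."""
    if value < value_agg[y, x]:
        value_agg[y, x] = value
        alpha_agg[y, x] = alpha
    elif value == value_agg[y, x] and alpha > alpha_agg[y, x]:
        alpha_agg[y, x] = alpha

value_agg = np.full((1, 1), np.inf)
alpha_agg = np.zeros((1, 1))

# Four contributions to one pixel: the min value is 2.0, and the alpha kept
# is the largest alpha seen alongside that min value (0.9, not 1.0).
for v, a in [(4.0, 0.2), (2.0, 0.6), (2.0, 0.9), (7.0, 1.0)]:
    append_min(0, 0, v, a, value_agg, alpha_agg)
```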
So far this new extra `alpha` agg has stayed internal to each individual `Canvas.line` call. For item 4, if this `alpha` could be kept around and reused for e.g. `transfer_functions.shade`, then we would be able to color pixels by their non-antialiased value and then apply the alpha after that. The edges where we transition from one line to another will be abruptly jagged, and I don't see how this would interact with the current alpha mixing of categorical aggregations. For this to work, the `xr.DataArray` or `xr.Dataset` returned from the `Canvas.line` call would have to be different, which is a pretty major API change.