Summary of state of antialiased lines in Datashader
This is a summary of the current state of antialiased lines in Datashader, and possible future changes.
Datashader supports antialiasing in all `Canvas.line` calls regardless of the type of data source, for all reductions except `std` and `var`, on CPU but not GPU, with and without Dask. There are some good examples of the output in issue #1148. Since then the antialiased `mean` reduction has improved (#1300) so that the output is now better.
It is necessary to understand some of the implementation details to understand the limitations and impact of possible future changes. When using antialiasing, all aggregations are floating point, as this is necessary to support the fractional antialiased edges. For example, a non-antialiased `count` reduction uses `uint32`, whereas an antialiased `count` reduction uses `float32`.
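As a rough illustration of why floating point is required (a hypothetical toy sketch, not Datashader's actual aggregation code), an antialiased count deposits fractional coverage at edge pixels, which an integer dtype cannot represent:

```python
import numpy as np

# Non-antialiased count: every pixel the line covers gets a whole increment,
# so an unsigned integer aggregation is sufficient.
agg_int = np.zeros(5, dtype=np.uint32)
agg_int[1:4] += np.uint32(1)  # line fully covers pixels 1-3

# Antialiased count: edge pixels receive the line's fractional coverage,
# which an integer dtype cannot represent, hence a float32 aggregation.
agg_aa = np.zeros(5, dtype=np.float32)
agg_aa[1:4] += np.float32(1.0)
agg_aa[0] += np.float32(0.3)  # left edge pixel only 30% covered
agg_aa[4] += np.float32(0.3)  # right edge pixel only 30% covered
```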
Some reductions can be calculated in a single pass, just like non-antialiased reductions, but some need two passes. This is referred to internally as the "2nd stage aggregation". It can have a large performance impact but is necessary to produce accurate results. This is closely related to the concept of self-intersection, and indeed some reductions (`count` and `sum`) have a `self_intersect` boolean kwarg to control whether self-intersection should be considered or ignored.
Consider drawing a 10-pixel wide, 50% alpha blue line (RGBA of `#00f8`) in Bokeh where the line crosses over itself. The pixels displayed are blue wherever the line goes, but the alpha value varies from 0 (no line at all) to 50%. Even where the line crosses over itself, the alpha is not additive; it is the maximum of the alphas of the contributing line segments, so it is never more than 50% anywhere. This also applies to the antialiased line edges: where two edges coincide we don't add the alphas, we take the max. So the Datashader equivalent of this 2D graphics renderer approach for an `any` or `count` reduction (i.e. the really simple ones that don't use some other `value`) is actually a `max` reduction. This is the essence of why we need two-stage aggregations. For `any` and `max` a single stage will always suffice. For a `min` we need two stages: the first is a `max` stage and the second is a `min`. A `count` or `sum` ignoring self-intersections also uses two stages, otherwise we double-count those intersection locations. A `count` or `sum` with self-intersections can be done in a single stage, but we have to be careful: at a join between adjacent line segments, the finite width of the line means that we can touch a particular pixel twice, e.g. once in the edge of the end of the first segment and a second time in the middle of the second segment.
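A toy sketch of the max-versus-sum distinction described above (hypothetical, not Datashader internals): combining the coverage of two overlapping 50%-alpha segments with max keeps the alpha at 50%, whereas a naive additive combine double-counts the self-intersection.

```python
import numpy as np

# Alpha contributed by two overlapping segments of a 50%-alpha line,
# over a 1-D row of pixels (hypothetical coverage values).
seg_a = np.array([0.0, 0.5, 0.5, 0.5, 0.1], dtype=np.float32)
seg_b = np.array([0.1, 0.5, 0.5, 0.0, 0.0], dtype=np.float32)

# 2D-renderer behaviour: take the max, so alpha never exceeds 0.5
# even where the segments overlap.
max_combined = np.maximum(seg_a, seg_b)

# Naive additive combine double-counts the overlap, pushing alpha to 1.0.
sum_combined = seg_a + seg_b
```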
In a conventional 2D graphics renderer, pixels where `alpha` is less than one mix in some of the color below. We fundamentally cannot do this: there is no color below. DataFrame rows are processed in a different order depending on whether we are using Dask and, if so, how the DataFrame is partitioned. So we can only ever use our `alpha` to mean the fraction of the value we would be using if we weren't antialiasing; we cannot mix this with `(1-alpha)` of the color/value beneath. This means that the edges of antialiased lines that go over each other have to be jagged as we transition from a very small `alpha` of the best value to a very high `alpha` of the second-best value.
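A hypothetical numeric sketch of this limitation: at a pixel where two lines overlap, a 2D renderer blends `alpha * top + (1 - alpha) * below`, but an aggregation can only keep one value scaled by its alpha, so neighbouring pixels can jump abruptly between contributions. The values and the "max"-style winner below are invented for illustration.

```python
# Pixel where a line with value 10 (alpha 0.1, i.e. a faint antialiased
# edge) overlaps a line with value 2 (alpha 0.9), for a hypothetical
# max-style reduction in which the value 10 line wins.
top_value, top_alpha = 10.0, 0.1
below_value, below_alpha = 2.0, 0.9

# A conventional 2D renderer mixes the two contributions together.
blended = top_alpha * top_value + (1 - top_alpha) * below_value

# An aggregation keeps only the winning value scaled by its own alpha,
# so adjacent pixels can jump abruptly between 0.1*10 and 0.9*2.
aggregated = top_alpha * top_value
```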
That is a brief summary of the complexity of the internals. Let's now take a step back and identify what we would like to do better.
1. Up until recently the `mean` reduction gave bad results. It is implemented as a `sum` divided by a `count` reduction, so the antialiased edges of both would cancel out, giving a result that is not antialiased. This has now been fixed by adding an extra private reduction that is a count that ignores antialiasing.
2. Antialiased `where` reductions can give non-antialiased results. The `selector` reduction of a `where` returns a row index, which is an integer, so it is not antialiased. If this row index is returned to the user then it is fine. But if the row index is used to look up a different column to return to the user, it cannot be antialiased, as we have already thrown away that information. This has a wider impact than is immediately obvious, as `first` and `last` reductions using Dask are implemented internally as `where` reductions, so they can give non-antialiased results if using Dask but antialiased results if not.
3. It would be great to get rid of the second-stage aggregation as it is both complicated code and potentially slow.
4. Using antialiasing with colorful colormaps can give unusual results. If you use a monochrome or nearly monochrome colormap the results are fine, but consider e.g. a `fire` colormap used with an `any` reduction. The middle of lines will be yellow, but the edges will be half of the value, meaning they are displayed red. Really we want to display them still as yellow but with half alpha.
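The colormap problem in item 4 can be sketched with a hypothetical three-entry colormap standing in for `fire` (this is not Datashader's shading code, just an illustration): looking up the half-value edge pixel in the colormap changes its hue, whereas what we want is the full-value hue at half alpha.

```python
import numpy as np

# Hypothetical 3-entry "fire"-like colormap: dark red -> red -> yellow (RGB).
cmap = np.array([[64, 0, 0], [255, 0, 0], [255, 255, 0]], dtype=np.float64)

def shade(value):
    """Map a value in [0, 1] to an RGB colour by nearest colormap lookup."""
    idx = int(round(value * (len(cmap) - 1)))
    return cmap[idx]

# Current behaviour: the antialiased edge carries half the value, so the
# colormap lookup turns it red instead of yellow.
edge_rgb = shade(0.5)

# Desired behaviour: the full-value colour (yellow) at the edge's alpha.
desired_rgb, desired_alpha = shade(1.0), 0.5
```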
Let me say now that I have no answer for these! But I do have some thoughts on possibilities.
To deal with item 2 we could switch the internal workings of an antialiased reduction to using two separate aggregations: the original non-antialiased one, and a second floating-point one that is effectively the corresponding `alpha`. This fixes item 2 straightaway, as the `selector` for the `where` would calculate both the row-index agg and the alpha agg, and we would use the row-index agg to look up the required column to return to the user and multiply it by the `alpha`. There would be a performance impact here, as each `append` function would be updating two separate agg arrays, but we would still only be looping through the data source once.
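The two-aggregation idea could look something like this (a hypothetical sketch under invented names, not the real Datashader append machinery): each append updates both a value aggregation and an alpha aggregation in a single pass over the data.

```python
import numpy as np

def append_max(x, y, value, alpha, value_agg, alpha_agg):
    """Hypothetical single-pass append for a max-style antialiased reduction:
    keep the non-antialiased value in one agg and its coverage (alpha) in
    another, so the value itself is never scaled during aggregation."""
    if value > value_agg[y, x] or (
        value == value_agg[y, x] and alpha > alpha_agg[y, x]
    ):
        value_agg[y, x] = value
        alpha_agg[y, x] = alpha

value_agg = np.full((2, 2), -np.inf)
alpha_agg = np.zeros((2, 2))

# Three contributions to the same pixel: the second raises the alpha of the
# winning value, the third loses because its value is smaller.
append_max(0, 0, value=5.0, alpha=0.4, value_agg=value_agg, alpha_agg=alpha_agg)
append_max(0, 0, value=5.0, alpha=0.9, value_agg=value_agg, alpha_agg=alpha_agg)
append_max(0, 0, value=3.0, alpha=1.0, value_agg=value_agg, alpha_agg=alpha_agg)
```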
It starts to look like we can address some of item 3 with this. Take a `min` reduction, which is currently 2-stage. Now we can store the min value in the first agg, and the alpha in the second agg is the maximum of all alphas that visited this pixel with the min value. I am not yet sure if this works for more complex aggregating (rather than selecting) reductions. Dropping the second-stage aggregation here means we lose the ability to have non-self-intersecting reductions, which might be fine but is a big change. It is also not clear how this affects 3D aggregations like `max_n`.
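Following that idea for `min` (again a hypothetical sketch with invented names): a single pass keeps the smallest value seen, while the alpha agg keeps the largest alpha among the contributions carrying that smallest value.

```python
import numpy as np

def append_min(x, y, value, alpha, value_agg, alpha_agg):
    """Hypothetical single-pass antialiased min: store the smallest value,
    and the maximum alpha among contributions with that smallest value."""
    if value < value_agg[y, x]:
        value_agg[y, x] = value
        alpha_agg[y, x] = alpha
    elif value == value_agg[y, x] and alpha > alpha_agg[y, x]:
        alpha_agg[y, x] = alpha

value_agg = np.full((1, 1), np.inf)
alpha_agg = np.zeros((1, 1))

# Four contributions to one pixel: the min value is 2.0, and the alpha kept
# is the largest alpha seen alongside that min value (0.9, not 1.0).
for v, a in [(4.0, 0.2), (2.0, 0.6), (2.0, 0.9), (7.0, 1.0)]:
    append_min(0, 0, v, a, value_agg, alpha_agg)
```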
So far this new extra `alpha` agg has stayed internal to each individual `Canvas.line` call. For item 4, if this `alpha` could be kept around and reused for e.g. `transfer_functions.shade`, then we would be able to color pixels by their non-antialiased value and then apply the alpha after that. The edges where we transition from one line to another will be abruptly jagged, and I don't see how this would interact with the current alpha mixing of categorical aggregations. For this to work, the `xr.DataArray` or `xr.Dataset` returned from the `Canvas.line` call would have to be different, which is a pretty major API change.