linebender/vello

Flicking problem

Closed this issue · 11 comments

I tested an SVG file, which is not that big, not as big as the CIA map case.

When I load the file and zoom in, the screen flickers and some small parts are not rendered correctly.

Is this because of a float precision problem?

Quite sure there is no clipping in this file.

test

I found that this relates to config.n_drawobj in coarse.wgsl. When it exceeds 65535, workgroup synchronization causes conflicts between the data before 65535 and the data after it, resulting in problems with the graphics display. Is there a better solution to the 65535 draw-object limit? I hope it can be resolved.

Yes, a limit of 64k draw objects is a known problem, and has a straightforward solution. This issue can serve as the tracking bug for that. Thanks for the analysis!

Can you tell us which bug or issue link to follow, so we know when it's fixed or there is any progress?
Thanks a lot.

Has this bug been fixed?

Not yet. The stroke rework is taking a lot longer than expected, though there is progress. This will be a high priority after that, and is also one of the items tracked in #302.

Is this the same issue?

warning: flashing images

Screen.Recording.2023-12-23.at.17.22.19.mov

No, that issue is caused by overflow of internal buffers (related to #366), which is in turn provoked by not culling lines and tiles that land outside the viewport. We do plan to work on all that.

I plan on addressing the 64k draw object problem shortly. There are three approaches that can be taken.

One is to conditionally apply a 3-level dispatch when the (workgroup size)^2 limit is crossed. This is what's done with pathtags, and I find it ugly. Among other things, it requires more permutations of shaders to be compiled, and there's also some complex conditional logic for which shaders to dispatch. I do have a local patch which is almost done, so it is perhaps the path of least resistance.
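For concreteness, the multi-level dispatch logic can be sketched host-side. This is a hypothetical illustration (the function names and structure are mine, not Vello's): with a workgroup size of 256, one reduce/scan level covers 256 elements, two levels cover 256^2 = 65,536, and a third level is needed beyond that.

```rust
// Hypothetical sketch of choosing 2-level vs 3-level scan dispatch
// for n draw objects (illustrative, not the actual Vello code).
const WG: u32 = 256; // the workgroup size the WebGPU baseline guarantees

/// Number of reduce/scan levels needed to prefix-sum `n` elements.
fn scan_levels(n: u32) -> u32 {
    if n <= WG {
        1
    } else if n <= WG * WG {
        2 // covers up to 65,536 elements: the limit hit in this issue
    } else {
        3 // WG^3 is ~16.7M elements, ample for draw objects
    }
}

/// Workgroup counts to dispatch at each reduction level, finest first.
fn dispatch_counts(n: u32) -> Vec<u32> {
    let mut counts = vec![n.div_ceil(WG)];
    while *counts.last().unwrap() > 1 {
        let next = counts.last().unwrap().div_ceil(WG);
        counts.push(next);
    }
    counts
}

fn main() {
    assert_eq!(scan_levels(65_536), 2);
    assert_eq!(scan_levels(65_537), 3); // past the limit: third level needed
    println!("{:?}", dispatch_counts(70_000)); // [274, 2, 1]
}
```

The "ugly" part the comment refers to is that each extra level means another shader permutation to compile and more conditional host logic around which of these dispatches to issue.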

The second approach is inspired by a technique I saw in FidelityFX sort, and is implemented in my recent sorting exploration. In that approach, each workgroup iterates over num_blocks_per_wg blocks, where each block is the amount of data currently handled by a single workgroup (256 draw objects). In that way, the size of the sequence is not inherently bounded by workgroup sizes.
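The index math for that approach can be modelled on the CPU. Everything below is an illustrative sketch under stated assumptions (constant names like NUM_WG and both helpers are mine, not Vello identifiers): a fixed number of workgroups is dispatched, and each one loops over num_blocks_per_wg blocks of 256 elements, so the input size is no longer capped by workgroup dimensions.

```rust
// CPU model of the fixed-workgroup-count scan (illustrative names;
// NUM_WG and the helpers below are assumptions, not Vello code).
const WG: u32 = 256;     // threads per workgroup = block size
const NUM_WG: u32 = 256; // fixed number of workgroups dispatched

/// How many 256-element blocks each workgroup iterates over.
fn num_blocks_per_wg(n: u32) -> u32 {
    n.div_ceil(WG).div_ceil(NUM_WG)
}

/// Element range a given workgroup would process, clamped to `n`.
fn wg_range(n: u32, wg_id: u32) -> (u32, u32) {
    let per_wg = num_blocks_per_wg(n) * WG;
    let start = (wg_id * per_wg).min(n);
    (start, (start + per_wg).min(n))
}

fn main() {
    // Up to 65,536 draw objects, each workgroup handles one block...
    assert_eq!(num_blocks_per_wg(65_536), 1);
    // ...and larger inputs mean more loop iterations, not more dispatches.
    assert_eq!(num_blocks_per_wg(1_000_000), 16);
    assert_eq!(wg_range(1_000_000, 0), (0, 4_096));
}
```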

A drawback to the latter approach is that it may limit the amount of addressable parallelism. Doing a quick calculation: for very large inputs it will dispatch 64k threads regardless of the size of the input. That is more threads than any existing hardware runs concurrently (an RTX 4090 has 16k), though it may limit opportunities for latency hiding.
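Spelling out that quick calculation (the fixed workgroup count of 256 is my assumption; the text only states the 64k total):

```rust
// The fixed workgroup count of 256 is an illustrative assumption;
// the 64k thread total matches the figure in the discussion.
const WG_SIZE: u32 = 256;
const NUM_WG: u32 = 256;

fn dispatched_threads() -> u32 {
    WG_SIZE * NUM_WG
}

fn main() {
    let threads = dispatched_threads();
    assert_eq!(threads, 65_536); // "64k threads", independent of input size
    // An RTX 4090 exposes 16,384 lanes (~16k threads in flight), so the
    // dispatch oversubscribes the GPU by only 4x, leaving limited
    // headroom for latency hiding on very large inputs.
    let rtx_4090_lanes = 16_384;
    println!("oversubscription: {}x", threads / rtx_4090_lanes);
}
```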

An advantage to the latter approach is that it's two fewer dispatches.

As a future potential optimization, we may want more shader permutations (specialization by pipeline override) to (a) allow larger workgroups when the hardware supports them (the WebGPU spec only requires 256, which informs the choices we've made), and (b) support iteration over multiple elements per thread. The former is probably the best way to improve opportunities to exploit parallelism on powerful GPUs (1M threads should be plenty for at least a while), and has no real downside other than wiring up the plumbing. The latter is more of a tradeoff: it improves bandwidth for large problems but limits parallelism for small ones, and switching between the two adaptively requires potentially compiling both variants (affecting cold-start time, including shader compilation) as well as more complex dispatch logic.
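The adaptive switching described above could look roughly like this on the host side. The variant names, the threshold, and the per-thread factor are all hypothetical, chosen only to illustrate the tradeoff:

```rust
// Hypothetical sketch of picking a shader variant adaptively
// (variant names, threshold, and per-thread factor are assumptions).
#[derive(Debug, PartialEq)]
enum ScanVariant {
    /// One element per thread: maximum parallelism, best for small inputs.
    OnePerThread { wg_size: u32 },
    /// Several elements per thread: better bandwidth on large inputs.
    MultiPerThread { wg_size: u32, per_thread: u32 },
}

/// `max_wg_size` is what the adapter reports; the WebGPU spec only
/// guarantees 256, but powerful GPUs typically allow 1024.
fn choose_variant(n: u32, max_wg_size: u32) -> ScanVariant {
    if n <= max_wg_size * max_wg_size {
        ScanVariant::OnePerThread { wg_size: max_wg_size }
    } else {
        ScanVariant::MultiPerThread { wg_size: max_wg_size, per_thread: 4 }
    }
}

fn main() {
    // Option (a) alone: a 1024-wide workgroup pushes the one-per-thread
    // limit from 64k to ~1M elements.
    assert_eq!(
        choose_variant(100_000, 1024),
        ScanVariant::OnePerThread { wg_size: 1024 }
    );
    // On baseline hardware the same input needs the multi-element variant,
    // which is where the both-variants compilation cost comes in.
    assert_eq!(
        choose_variant(100_000, 256),
        ScanVariant::MultiPerThread { wg_size: 256, per_thread: 4 }
    );
}
```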

The third approach is to go back to single pass scan techniques, as was done in piet-gpu. We now know how to do this in WebGPU (see Zulip thread) but the performance implications are mixed; in particular it would be a performance regression on Apple Silicon.

I'm most inclined to go with the second approach, as I think it's the best set of tradeoffs and admits additional optimization that would address the biggest shortcoming. I'll start on a PR, and if that goes well, probably apply the same technique to path tags.

Should the README be updated now that this is closed?

Thanks for the reminder!

We intend to go through the list of issues in the README before publishing version 0.2.0, but a PR to remove the outdated items now would be welcome.