diwi/PixelFlow

fluid.getVelocity(float[] dir,int x,int y,int w ,int h), buffer too small

ohnoitsaninja opened this issue · 5 comments

I'm trying to lower the CPU load per frame by transferring only one vertical strip of the velocity map each game frame with arrayCopy. With 16 sections it would take 16 frames to refresh the map completely, but no part of the CPU-side map would fall too far behind the GPU map, and the load per frame would be significantly lower.

fluidMapSection= fluid.getVelocity(fluidMapSection,0, 0, width, height);
Works fine for full frames, but if width or height are changed to anything less than the fluid frame size, I get the error

DwGLTexture.getData_GL2GL3: buffer to small 512000 < 2048000,

where 512000 is the number of pixels in the strip section * 2 and 2048000 is the fluid's width * height * 2. So even if I ask for a section smaller than the total size, it resizes my buffer to the correct size but then errors while trying to fill it with an entire frame's worth of data.

It is also not clear to me why the buffer needs to be both passed in and assigned from the return value, instead of just one or the other.

Also is there a way to perhaps resize on the gpu before returning the velocity float list, so it could be smaller, faster, lower resolution, without affecting the fluid simulation with fluid.resize?

diwi commented

thx for pointing this out!
bug is fixed.

I'll do a new release tomorrow after some further testing and will then also address your second request.

diwi commented

new release PixelFlow v0.55 (25) is available now.
The bug you found is fixed now.

Also, there is an additional option for transferring data by specifying a buffer offset.
The data array can now have the size of the original texture (or bigger; if it is smaller, it will automatically get resized), and the defined sub-region (x, y, w, h) is copied into data starting at the declared offset position.
Note that the layout is row major.

Here is a small code snippet:

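      // float[] data1 = null;                      // declared elsewhere; resized automatically if too small
      int region_x = 0;
      int region_y = 0;
      int region_w = fluid.fluid_w;                 // each strip spans the full width
      int region_h = 200;                           // strip height in rows
      int region_size = region_w * region_h * 2;    // 2 floats (x and y velocity) per pixel
      int data1_offset = 0;
      int num_strips = fluid.fluid_h / region_h;    // integer division: leftover rows are not transferred

      for(int i = 0; i < num_strips; i++){
        region_y = i * region_h;
        data1_offset = i * region_size;             // row major: strip i starts after i full strips
        data1 = fluid.getVelocity(data1, region_x, region_y, region_w, region_h, data1_offset);
      }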
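
The loop above reads every strip in a single pass. For the one-strip-per-frame idea from the original question, the offset arithmetic could look roughly like the sketch below. It is plain Java with no PixelFlow dependency: the `gpu` array only stands in for the velocity texture, and `StripRefresh`, `rowOffset`, `copyStrip` and `refresh` are hypothetical names. In a real sketch, `copyStrip` would presumably be replaced by the `fluid.getVelocity(data1, x, y, w, h, offset)` call shown above.

```java
import java.util.Arrays;

/** Host-side mirror of a velocity map, refreshed one horizontal strip per frame. */
final class StripRefresh {
  static final int CHANNELS = 2; // x and y velocity per pixel

  /** Index of the first float of row y in a row-major array with 2 floats per pixel. */
  static int rowOffset(int y, int texW) {
    return y * texW * CHANNELS;
  }

  /** Stand-in for the GPU read-back: copies rows [y, y+h) from gpu into host at the same offset. */
  static void copyStrip(float[] gpu, float[] host, int texW, int y, int h) {
    int offset = rowOffset(y, texW);
    System.arraycopy(gpu, offset, host, offset, h * texW * CHANNELS);
  }

  /** Refreshes the strip that belongs to this frame and returns its index. */
  static int refresh(float[] gpu, float[] host, int texW, int texH, int stripH, int frame) {
    int numStrips = (texH + stripH - 1) / stripH; // round up so no rows are skipped
    int strip = frame % numStrips;                // cycle through the strips
    int y = strip * stripH;
    int h = Math.min(stripH, texH - y);           // the last strip may be shorter
    copyStrip(gpu, host, texW, y, h);
    return strip;
  }

  public static void main(String[] args) {
    int w = 8, h = 10, stripH = 4;
    float[] gpu = new float[w * h * CHANNELS];
    for (int i = 0; i < gpu.length; i++) gpu[i] = i;
    float[] host = new float[gpu.length];
    for (int frame = 0; frame < 3; frame++) {
      refresh(gpu, host, w, h, stripH, frame);
    }
    System.out.println(Arrays.equals(gpu, host)); // all three strips copied
  }
}
```

Rounding the strip count up and clamping the last strip's height avoids silently dropping rows when the texture height is not a multiple of the strip height.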

> Also is there a way to perhaps resize on the gpu before returning the velocity float list, so it could be smaller, faster, lower resolution, without affecting the fluid simulation with fluid.resize?

yes, you can use the copy-shader for this:
https://github.com/diwi/PixelFlow/blob/master/src/com/thomasdiewald/pixelflow/java/imageprocessing/filter/Copy.java

example:

    // DwGLTexture tex_velocity_downsampled;
    if(tex_velocity_downsampled == null){
      tex_velocity_downsampled = fluid.tex_velocity.src.createEmtpyCopy();
      int w_new = ceil(tex_velocity_downsampled.w/2f);
      int h_new = ceil(tex_velocity_downsampled.h/2f);
      tex_velocity_downsampled.resize(context, w_new, h_new);
    }
    
    DwFilter.get(context).copy.apply(fluid.tex_velocity.src, tex_velocity_downsampled);

The filtering of fluid.tex_velocity.src is set to GL_LINEAR, so the above snippet downsamples the original texture by 2 with bilinear filtering. Of course, you can write your own shader for better downsampling.
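
For intuition, here is a CPU-side sketch of what a 2x bilinear downsample should amount to when both texture dimensions are even: each destination pixel is sampled exactly between four source texels, so the result is the plain average of a 2x2 block. This is only an illustration under that assumption, not PixelFlow code; `BoxDownsample` and `downsampleBy2` are hypothetical names, and the array layout (row major, 2 floats per pixel) follows the description earlier in the thread.

```java
/** CPU reference for halving a 2-channel, row-major velocity map by averaging 2x2 blocks. */
final class BoxDownsample {
  static final int CHANNELS = 2; // x and y velocity per pixel

  static float[] downsampleBy2(float[] src, int w, int h) {
    if (w % 2 != 0 || h % 2 != 0) {
      throw new IllegalArgumentException("this sketch assumes even dimensions");
    }
    int w2 = w / 2;
    int h2 = h / 2;
    float[] dst = new float[w2 * h2 * CHANNELS];
    for (int y = 0; y < h2; y++) {
      for (int x = 0; x < w2; x++) {
        for (int c = 0; c < CHANNELS; c++) {
          int i00 = ((2 * y) * w + 2 * x) * CHANNELS + c; // top-left texel of the block
          int i10 = i00 + CHANNELS;                       // right neighbour
          int i01 = i00 + w * CHANNELS;                   // texel one row below
          int i11 = i01 + CHANNELS;                       // diagonal neighbour
          dst[(y * w2 + x) * CHANNELS + c] =
              0.25f * (src[i00] + src[i10] + src[i01] + src[i11]);
        }
      }
    }
    return dst;
  }
}
```

Comparing a GPU-downsampled read-back against such a reference on a small texture is one way to check that the downsampled data is laid out as expected.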

To take this further, in case the texture data transfer is seriously slowing down your program, you could encode the velocity from 2 x 32-bit float to, let's say, 1 x 32-bit integer (2 x 16-bit int) in an integer texture (GL_R32I, GL_RED_INTEGER) and do the data transfer OpenGL -> host then.
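
A minimal host-side sketch of such a packing, in plain Java: each velocity component is quantized to a signed 16-bit value and both are stored in one 32-bit int. On the GPU the encoding would have to happen in a shader; only the matching encode and decode arithmetic is shown here. The class name `VelocityPacking` and the value range `MAX_ABS` are assumptions made for illustration; components outside the range are clamped, and precision is limited to `MAX_ABS / 32767` per step.

```java
/** Packs two velocity components into one 32-bit int as two signed 16-bit fixed-point values. */
final class VelocityPacking {
  static final float MAX_ABS = 64f; // assumed maximum |velocity| per component

  /** Maps v from [-MAX_ABS, MAX_ABS] to an integer in [-32767, 32767]; out-of-range input is clamped. */
  static int quantize(float v) {
    float clamped = Math.max(-MAX_ABS, Math.min(MAX_ABS, v));
    return Math.round(clamped / MAX_ABS * 32767f);
  }

  /** x goes into the high 16 bits, y into the low 16 bits. */
  static int pack(float vx, float vy) {
    return (quantize(vx) << 16) | (quantize(vy) & 0xFFFF);
  }

  static float unpackX(int packed) {
    return (packed >> 16) / 32767f * MAX_ABS;  // arithmetic shift keeps the sign
  }

  static float unpackY(int packed) {
    return ((short) packed) / 32767f * MAX_ABS; // cast sign-extends the low 16 bits
  }
}
```

This halves the number of bytes per pixel that cross the bus, at the cost of a fixed value range and reduced precision.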

ohnoitsaninja commented

I feel like it's me, but I can't get any of your updates to work, or even "show up". I'm pretty certain I found the correct location to install the new lib, and I manually confirmed that the files contain your updates. Yet Processing says the function createEmptyCopy() doesn't exist when I try to use tex_velocity_downsampled = fluid.tex_velocity.createEmptyCopy();, even though I can open the file and see that it's there.

Also, tex_velocity_downsampled.resize(context, w_new, h_new); doesn't work. Processing says the method resize(DwPixelFlow, DwGLTexture) in the type DwGLTexture is not applicable for the arguments (DwPixelFlow, int, int), even though I went to the file and confirmed it has public boolean resize(DwPixelFlow context, int w, int h). When I delete the folder ( C:\Users\Desktop\Documents\Processing\libraries\PixelFlow ), Processing says the library is gone, so I'm pretty sure I'm in the right place.

diwi commented

I built 3 small examples that you can run in Processing.

example 1 transfers a whole frame
example 2 transfers horizontal strips (width x 100)
example 3 downsamples the velocity to a texture of size 20x20 and transfers only this very small texture

In each example, the velocity is visualized as short lines at grid locations (20x20).

Fluid_GetStarted_TexDataTransfer1.zip
Fluid_GetStarted_TexDataTransfer2.zip
Fluid_GetStarted_TexDataTransfer3.zip

Example 3 is by far the most efficient one, since only the velocity values that are actually used for the visualisation are transferred.

I suggest going through those examples and picking the parts that fit the needs of your own project.
As you can see, there are numerous ways to achieve more or less the same goal.

btw. you can print the currently installed library version:

    context = new DwPixelFlow(this);
    context.print();

And imo the simplest way to update the library is via the contribution manager in the PDE. Sometimes, when the contributions script isn't up to date, you need to uninstall first and then reinstall to make sure the latest version of a library is installed.

ohnoitsaninja commented

Thanks so much, updating Processing was the easy fix.
I played with ex2 and 3. I would have thought 3 would be faster, but 2, the strip method, is working out better for me. I'm making a game that is locked at 75fps. With entire frames the transfer takes an average of 90000 units of time, and with 8 strips it averages 40000 units of time, so a little over twice as fast, but with quite a bit of overhead. Still, it seems to significantly increase the FPS of my program, on the order of 20+fps, so well worth it.
Would doing this on another thread help?