VRAM usage is very high
Nielsbishere opened this issue · 1 comments
Nielsbishere commented
Is your request related to a problem? Please describe.
My 970M only has 2 GiB of VRAM, so it can't (properly) run emibl at 1080p, only at 720p, although it can run viknell at 1080p. It only runs at all because I have 8 GiB of shared RAM, so it is swapping memory in and out all the time. This means that at 1k it runs at 10-15 fps, and 4k is probably far worse.
Describe the solution you'd like
Much of this is taken from my old branch: https://github.com/TeamWisp/Procedural-Ray-Tracing/tree/feature_optimization
- Frame buffers:
  - G-Buffer
    - Use RG8 instead of RG16f for roughness (saves 2 Bpp)
    - Use normal compression so normals fit in RG16f instead of RGBA16f (saves 4 Bpp)
    - Get rid of the unused buffers (saves 24 Bpp)
    - Total: 30 Bpp saved (out of 76.75 Bpp)
- Textures:
  - Load HDR textures as RGBA16f instead of RGBA32f
  - Internally bake them to dds or (always) load them with compression
- Models:
  - uv & pos can use ushort instead; 10 Bpv (saving 10 Bpv)
  - normal, tangent, bitangent can be stored as a unorm quat instead; 4 Bpv (saving 32 Bpv)
  - color can use uint instead; 4 Bpv (saving 8 Bpv)
  - Total: 18 Bpv instead of 68 Bpv
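To sanity check the Bpv numbers in the Models list, here is a minimal Python sketch of the two layouts. The field order, the `to_unorm` and `pack_vertex` helpers, and the choice to store color as RGBA8 are assumptions for illustration only, not the engine's actual vertex format. Building the quaternion from the normal/tangent/bitangent basis is not shown.

```python
import struct

# Uncompressed layout assumed from the 68 Bpv figure: 17 float32 values
# (pos 3, uv 2, normal 3, tangent 3, bitangent 3, color 3).
FULL_FORMAT = "<17f"
# Proposed layout: pos + uv as 5 x uint16, tangent frame as a 4 x unorm8
# quaternion, color as 4 x unorm8 (one uint). Field order is an assumption.
PACKED_FORMAT = "<5H4B4B"


def to_unorm(value, lo, hi, bits):
    """Map value from [lo, hi] to an unsigned integer with the given bit count."""
    max_int = (1 << bits) - 1
    t = (value - lo) / (hi - lo) if hi > lo else 0.0
    t = min(max(t, 0.0), 1.0)  # clamp values outside the range
    return int(round(t * max_int))


def pack_vertex(pos, uv, frame_quat, color, pos_min, pos_max):
    """Pack one vertex into 18 bytes.

    frame_quat is a unit quaternion (x, y, z, w) assumed to encode the
    normal/tangent/bitangent basis. uv and color are assumed to be in [0, 1].
    """
    p = [to_unorm(pos[i], pos_min[i], pos_max[i], 16) for i in range(3)]
    t = [to_unorm(c, 0.0, 1.0, 16) for c in uv]
    q = [to_unorm(c, -1.0, 1.0, 8) for c in frame_quat]
    c = [to_unorm(ch, 0.0, 1.0, 8) for ch in color]
    return struct.pack(PACKED_FORMAT, *p, *t, *q, *c)


full_size = struct.calcsize(FULL_FORMAT)
packed_size = struct.calcsize(PACKED_FORMAT)
vertex = pack_vertex(
    pos=(0.5, -1.0, 2.0), uv=(0.25, 0.75),
    frame_quat=(0.0, 0.0, 0.0, 1.0), color=(1.0, 0.5, 0.0, 1.0),
    pos_min=(-1.0, -1.0, -2.0), pos_max=(1.0, 1.0, 2.0),
)
print(full_size, packed_size, len(vertex))
```

Note that ushort positions need the mesh bounds (`pos_min`, `pos_max`) to be stored alongside the mesh so the shader can expand them again.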
Describe alternatives you've considered
N/A; this is an optimization.
Additional context
Note: the following was measured after ad0a90b, which means we already use a lot less VRAM (roughly 40 Bpp less).
Constant usage:
- 128 - 256 MiB are taken by models (64 MiB per pool)
  - Vertex size is 56 bytes, so approx. 1.2 million vertices fit in a pool (keep fragmentation in mind: it gets worse if vertices aren't compressed, since fewer vertices fit).
    - Use vertex compression; this can reduce it to approx. 24 Bpv instead of 56, giving us 2.3x the vertices (might be important for big scenes)
    - Compression might have to use a compute shader, since there is A LOT of work to be done; one compute shader should get the max values, and a second one should use those to compress the vertices. Doing it on 1 CPU thread or 8 (multi threaded) makes a lot less sense.
- Textures take up A LOT in Emibl
  - One image already takes up 128 MiB (reason: DirectXTexHDR reads it as RGBA32f, while RGBA16f should be used)
  - Use texture compression (internally bake it to dds or process it)
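The two compression passes described above (first find the bounds, then quantize against them) can be sketched on the CPU as follows. This is only a Python reference of the idea, not the compute shaders themselves. The function names are hypothetical, and 16-bit positions are an assumption carried over from the ushort proposal.

```python
POOL_BYTES = 64 * 1024 * 1024  # 64 MiB per pool, as stated above


def vertices_per_pool(bytes_per_vertex, pool_bytes=POOL_BYTES):
    """How many whole vertices of the given size fit in one pool."""
    return pool_bytes // bytes_per_vertex


def find_bounds(positions):
    """Pass 1: per-axis minimum and maximum over all positions."""
    lo = tuple(min(p[axis] for p in positions) for axis in range(3))
    hi = tuple(max(p[axis] for p in positions) for axis in range(3))
    return lo, hi


def quantize(positions, lo, hi):
    """Pass 2: map every coordinate into 0..65535 using the bounds."""
    out = []
    for p in positions:
        q = []
        for axis in range(3):
            extent = hi[axis] - lo[axis]
            # A flat axis (extent 0) maps to 0 to avoid dividing by zero.
            t = (p[axis] - lo[axis]) / extent if extent > 0 else 0.0
            q.append(int(round(t * 65535)))
        out.append(tuple(q))
    return out


def dequantize(q, lo, hi):
    """Inverse mapping, as a vertex shader would do with the stored bounds."""
    return tuple(lo[a] + (q[a] / 65535) * (hi[a] - lo[a]) for a in range(3))


positions = [(-2.0, 0.0, 1.0), (2.0, 4.0, 1.0), (0.0, 1.0, 1.0)]
lo, hi = find_bounds(positions)
quantized = quantize(positions, lo, hi)
restored = [dequantize(q, lo, hi) for q in quantized]

print(vertices_per_pool(56), vertices_per_pool(24))
print(quantized)
print(restored)
```

On the GPU the first pass would be a parallel min/max reduction and the second a simple per-vertex dispatch, which is why two separate shaders are a natural split.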
Variable usage:
- Frame buffers
  - Individual
    - Deferred main: 20 Bpp (4 depth, 6 normal, 6 color, 2 roughness, 2 metallic)
    - Convolution & Cubemap: 24 Bpp (12 each; unused)
    - Deferred composition: 8 Bpp (color)
    - Post processing: 4 - 8 Bpp (color; depending on whether you render to an HDR backbuffer)
    - DoF CoC: 4 Bpp (circle of confusion)
    - DoF down scale: 4 Bpp (16 Bpp: 8 near color, 8 far color, but half res)
    - DoF dilate near: .25 Bpp (4 Bpp, but quarter res)
    - DoF dilate flatten: .25 Bpp (4 Bpp, but quarter res)
    - DoF dilate flatten 2: .25 Bpp (4 Bpp, but quarter res)
    - Bokeh: 4 Bpp (16 Bpp, but half res)
    - Bokeh filter: 4 Bpp (16 Bpp, but half res)
    - DoF post filter: 4 - 8 Bpp (color; depending on whether you render to an HDR backbuffer)
  - Per task
    - Deferred main: 20 Bpp
    - Deferred composition: 8 Bpp
    - Post processing: 4 - 8 Bpp
    - DoF: 12.75 - 16.75 Bpp
    - Bokeh: 8 Bpp
    - Unused: 24 Bpp
  - Total:
    - Non HDR: 76.75 Bpp
    - HDR: 84.75 Bpp
  - Meaning that (non HDR):
    - 1280x720 = 67 MiB
    - 1920x1080 = 152 MiB
    - 3840x2160 = 607 MiB
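For reference, the MiB figures follow directly from the per-task Bpp numbers. A small Python sketch that reproduces them (the dictionary keys and the `frame_buffer_mib` helper are made up for this example):

```python
# Bytes per pixel per task, non-HDR figures from the "Per task" list.
BPP_NON_HDR = {
    "deferred_main": 20.0,
    "deferred_composition": 8.0,
    "post_processing": 4.0,
    "dof": 12.75,
    "bokeh": 8.0,
    "unused": 24.0,
}
# With an HDR backbuffer, post processing and the DoF post filter
# each grow from 4 to 8 Bpp, so 8 Bpp extra in total.
HDR_EXTRA = 8.0


def frame_buffer_mib(width, height, hdr=False):
    """Total frame buffer memory in MiB at the given resolution."""
    bpp = sum(BPP_NON_HDR.values()) + (HDR_EXTRA if hdr else 0.0)
    return width * height * bpp / (1024 * 1024)


for w, h in [(1280, 720), (1920, 1080), (3840, 2160)]:
    print(f"{w}x{h}: {frame_buffer_mib(w, h):.0f} MiB")
```

Since everything scales linearly with the pixel count, every Bpp removed saves about 2 MiB at 1080p and about 8 MiB at 4k.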