VRAM usage is very high

Question

Nielsbishere opened this issue 6 years ago · 1 comments

Is your request related to a problem? Please describe.
My 970M has only 2 GiB of VRAM, so it can't (properly) run Emibl at 1080p, only at 720p, although it can run Viknell at 1080p. It only runs at all because I have 8 GiB of shared RAM, so it swaps memory in and out all the time. This means that on 1k it runs at 10-15 fps, and 4k is probably far worse.

Describe the solution you'd like
Much of this is taken from my old branch: https://github.com/TeamWisp/Procedural-Ray-Tracing/tree/feature_optimization

Frame buffers:
- G-Buffer
  - Use RG8 instead of RG16f for roughness (saves 2 Bpp)
  - Use normal compression so normals fit in RG16f instead of RGBA16f (saves 4 Bpp)
- Remove the unused convolution and cubemap buffers (saves 24 Bpp)
- Total: 30 Bpp saved (30 of 76.75 Bpp)
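
The normal compression mentioned above is not specified further. One common choice is octahedral encoding, which maps a unit vector to two values in [-1, 1] so it fits in a two-channel target. The sketch below is a CPU reference of that technique under that assumption, not necessarily what the old branch did; `Vec2`, `Vec3`, `OctEncode` and `OctDecode` are hypothetical names.

```cpp
#include <cassert>
#include <cmath>

struct Vec2 { float x, y; };
struct Vec3 { float x, y, z; };

inline float SignNotZero(float v) { return v >= 0.0f ? 1.0f : -1.0f; }

// Project the unit vector onto the octahedron |x| + |y| + |z| = 1,
// then fold the lower hemisphere over the diagonals of the square.
inline Vec2 OctEncode(Vec3 n) {
    float l1 = std::fabs(n.x) + std::fabs(n.y) + std::fabs(n.z);
    assert(l1 > 0.0f);  // the input must not be the zero vector
    float px = n.x / l1;
    float py = n.y / l1;
    if (n.z < 0.0f) {
        float fx = (1.0f - std::fabs(py)) * SignNotZero(px);
        float fy = (1.0f - std::fabs(px)) * SignNotZero(py);
        px = fx;
        py = fy;
    }
    return Vec2{px, py};
}

// Inverse mapping: rebuild z, unfold the lower hemisphere, renormalize.
inline Vec3 OctDecode(Vec2 e) {
    Vec3 v{e.x, e.y, 1.0f - std::fabs(e.x) - std::fabs(e.y)};
    if (v.z < 0.0f) {
        float fx = (1.0f - std::fabs(v.y)) * SignNotZero(v.x);
        float fy = (1.0f - std::fabs(v.x)) * SignNotZero(v.y);
        v.x = fx;
        v.y = fy;
    }
    float len = std::sqrt(v.x * v.x + v.y * v.y + v.z * v.z);
    return Vec3{v.x / len, v.y / len, v.z / len};
}
```

In a shader the same math would run when writing and reading the G-Buffer; the two encoded values are what would be stored in the RG16f target.
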
Textures:
- Load HDR textures using RGBA16f instead of RGBA32f
- Always bake textures to DDS internally, or compress them on load
Models:
- uv and pos can use ushort instead of float: 10 Bpv (saves 10 Bpv)
- normal, tangent and bitangent can be stored as one unorm quaternion instead: 4 Bpv (saves 32 Bpv)
- color can use a uint instead: 4 Bpv (saves 8 Bpv)
- Total: 18 Bpv instead of 68 Bpv
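
To make the 68 vs 18 Bpv numbers concrete, here is a minimal sketch of the two layouts. The struct names, field order and helper functions are assumptions for illustration only; the color is written as four bytes (the same size as one uint) so the struct needs no packing pragma.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>

// Uncompressed layout: 17 floats = 68 bytes.
struct FatVertex {
    float pos[3];
    float uv[2];
    float normal[3];
    float tangent[3];
    float bitangent[3];
    float color[3];
};
static_assert(sizeof(FatVertex) == 68, "expected 68 bytes per vertex");

// Hypothetical packed layout: 6 + 4 + 4 + 4 = 18 bytes.
struct PackedVertex {
    std::uint16_t pos[3];    // quantized against the mesh bounding box
    std::uint16_t uv[2];     // quantized against the UV range
    std::uint8_t  frame[4];  // tangent frame as a unorm quaternion
    std::uint8_t  color[4];  // RGBA8, the same 4 bytes as one uint
};
static_assert(sizeof(PackedVertex) == 18, "expected 18 bytes per vertex");

// Convert a float in [0, 1] to an 8-bit unorm, clamping out-of-range input.
inline std::uint8_t PackUnorm8(float v) {
    assert(!std::isnan(v));
    float t = std::min(1.0f, std::max(0.0f, v));
    return static_cast<std::uint8_t>(std::lround(t * 255.0f));
}

// Store one quaternion component from [-1, 1] as an 8-bit unorm.
inline std::uint8_t PackSnormAsUnorm8(float v) {
    return PackUnorm8(v * 0.5f + 0.5f);
}
```

The shader would have to undo both mappings (scale and bias for the quaternion, bounding-box scale for pos and uv), so the exact encoding has to match whatever the vertex shader expects.
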

Describe alternatives you've considered
N/A; this is an optimization.

Additional context
Note: the following was measured after ad0a90b, which already reduced VRAM usage a lot (roughly 40 Bpp less).
Constant usage:

Models take 128 - 256 MiB (64 MiB per pool)
- Vertex size is 56 bytes, so each pool holds approx. 1.2 million vertices (fragmentation also hurts more when vertices aren't compressed, since fewer vertices fit in a pool).
  - Use vertex compression; this can reduce it to approx. 24 Bpv instead of 56, giving us 2.3x the vertices (might be important for big scenes)
  - Compression probably belongs in compute shaders, since there is A LOT of work to do: one compute shader finds the max values and a second one uses those to compress the vertices. Doing this on the CPU with 1 thread or 8 (multithreaded) makes a lot less sense.
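
The two passes described above can be sketched on the CPU as a reference for what the compute shaders would do. This is a sketch under assumptions: it quantizes positions only, to 16-bit unorm against a min/max bounding box, and all type and function names are hypothetical.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cmath>
#include <cstdint>
#include <vector>

struct Float3 { float x, y, z; };
struct Bounds { Float3 lo, hi; };
using Ushort3 = std::array<std::uint16_t, 3>;

// Pass 1: reduce all positions to a bounding box
// (a parallel min/max reduction on the GPU).
inline Bounds ComputeBounds(const std::vector<Float3>& positions) {
    assert(!positions.empty());
    Bounds b{positions[0], positions[0]};
    for (const Float3& p : positions) {
        b.lo = Float3{std::min(b.lo.x, p.x), std::min(b.lo.y, p.y), std::min(b.lo.z, p.z)};
        b.hi = Float3{std::max(b.hi.x, p.x), std::max(b.hi.y, p.y), std::max(b.hi.z, p.z)};
    }
    return b;
}

// Map v from [lo, hi] to a 16-bit unorm; a flat axis (hi == lo) maps to 0.
inline std::uint16_t Quantize16(float v, float lo, float hi) {
    if (hi <= lo) return 0;
    float t = std::min(1.0f, std::max(0.0f, (v - lo) / (hi - lo)));
    return static_cast<std::uint16_t>(std::lround(t * 65535.0f));
}

// Inverse of Quantize16, as the vertex shader would apply it.
inline float Dequantize16(std::uint16_t q, float lo, float hi) {
    return lo + (static_cast<float>(q) / 65535.0f) * (hi - lo);
}

// Pass 2: quantize every position against the bounds from pass 1
// (one thread per vertex on the GPU).
inline std::vector<Ushort3> QuantizePositions(const std::vector<Float3>& positions,
                                              const Bounds& b) {
    std::vector<Ushort3> out;
    out.reserve(positions.size());
    for (const Float3& p : positions) {
        out.push_back(Ushort3{{Quantize16(p.x, b.lo.x, b.hi.x),
                               Quantize16(p.y, b.lo.y, b.hi.y),
                               Quantize16(p.z, b.lo.z, b.hi.z)}});
    }
    return out;
}
```

The bounds have to be kept per mesh so the shader can dequantize. For scale, a 64 MiB pool holds about 1.2 million 56-byte vertices and about 2.8 million 24-byte ones.
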
Textures take up A LOT in Emibl
- One image already takes up 128 MiB (reason: DirectXTexHDR reads it as RGBA32f, while RGBA16f should be used)
- Use texture compression (internally bake it to dds or process it)
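
A rough size comparison for a single HDR image, top mip level only. The 4096x2048 resolution used in the note below is an assumption (it is one size that gives exactly 128 MiB at 16 bytes per pixel); BC6H is the block-compressed HDR format, which stores each 4x4 block in 16 bytes. The enum and function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

enum class HdrFormat { RGBA32F, RGBA16F, BC6H };

// Size in bytes of the top mip level for the given format.
inline std::uint64_t TextureBytes(std::uint32_t width, std::uint32_t height,
                                  HdrFormat format) {
    assert(width > 0 && height > 0);
    switch (format) {
        case HdrFormat::RGBA32F:
            return std::uint64_t{width} * height * 16;  // 4 channels x 4 bytes
        case HdrFormat::RGBA16F:
            return std::uint64_t{width} * height * 8;   // 4 channels x 2 bytes
        case HdrFormat::BC6H: {
            // Dimensions are rounded up to whole 4x4 blocks of 16 bytes each.
            std::uint64_t blocksX = (width + 3) / 4;
            std::uint64_t blocksY = (height + 3) / 4;
            return blocksX * blocksY * 16;
        }
    }
    return 0;  // unreachable for valid enum values
}
```

For the assumed 4096x2048 image this gives 128 MiB as RGBA32f, 64 MiB as RGBA16f and 8 MiB as BC6H.
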
Variable usage:
Frame buffers
- Individual
  - Deferred main: 20 Bpp (4 depth, 6 normal, 6 color, 2 roughness, 2 metallic)
  - Convolution & Cubemap take up: 24 Bpp (12 each; unused)
  - Deferred composition: 8 Bpp (color)
  - Post processing: 4 - 8 Bpp (color; depending on if you render to HDR backbuffer)
  - DoF CoC: 4 Bpp (circle of confusion)
  - DoF downscale: 4 Bpp (16 Bpp: 8 near color + 8 far color, but half res)
  - DoF dilate near: .25 Bpp (4 Bpp, but quarter res)
  - DoF dilate flatten: .25 Bpp (4 Bpp, but quarter res)
  - DoF dilate flatten 2: .25 Bpp (4 Bpp, but quarter res)
  - Bokeh: 4 Bpp (16 Bpp, but half res)
  - Bokeh filter: 4 Bpp (16 Bpp, but half res)
  - DoF post filter: 4 - 8 Bpp (color; depending on if you render to HDR backbuffer)
- Per task
  - Deferred main: 20 Bpp
  - Deferred composition: 8 Bpp
  - Post processing: 4 - 8 Bpp
  - DoF: 12.75 - 16.75 Bpp
  - Bokeh: 8 Bpp
  - Unused: 24 Bpp
- Total:
  - Non HDR: 76.75 Bpp
  - HDR: 84.75 Bpp
- Meaning that (non HDR):
  - 1280x720 = 67 MiB
  - 1920x1080 = 152 MiB
  - 3840x2160 = 607 MiB
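
The resolution numbers above are just width x height x Bpp. A small helper to reproduce them, or to check the effect of a change, could look like this (the function name is hypothetical):

```cpp
#include <cassert>
#include <cstdint>

// Frame buffer memory in MiB for a resolution and a summed bytes-per-pixel cost.
inline double FrameBufferMiB(std::uint32_t width, std::uint32_t height,
                             double bytesPerPixel) {
    assert(bytesPerPixel >= 0.0);
    return static_cast<double>(width) * height * bytesPerPixel / (1024.0 * 1024.0);
}
```

`FrameBufferMiB(1920, 1080, 76.75)` gives about 152 MiB. Removing the proposed 30 Bpp (46.75 Bpp left) would bring 1080p down to about 92 MiB.
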

Answer 1 · 2019-06-04T09:40:50.000Z

https://github.com/microsoft/DirectXTex/blob/master/DirectXTex/DirectXTexCompress.cpp#L70