feat(bench): memory usage depending on `k_table_size`
cyphersnake opened this issue · 21 comments
We need to do the same thing as in #249, but for memory usage.
I ran it on the server with dhat, for all the same `k` values as in #249.

For `PRIMARY_K_TABLE_SIZE=17`:

If this information is not enough for you, let me know what you need and I will itemize it accordingly.
| k | repeat_count | total_bytes | total_blocks | t_gmax | t_gmax_blocks | t_end | t_end_blocks |
| --- | -------------- | ------------- | -------------- | -------- | --------------- | ------- | --------------- |
| 4.58 | 1 | 21,103,360,561 | 8,379,587 | 1,294,194,541 | 30,297 | 958,609 | 1,563 |
| 14.55 | 1000 | 21,149,395,860 | 8,607,137 | 1,294,194,541 | 30,297 | 958,609 | 1,563 |
| 15.55 | 2000 | 21,193,222,448 | 8,806,406 | 1,294,194,541 | 30,297 | 958,609 | 1,563 |
| 16.14 | 3000 | 21,242,455,990 | 9,068,519 | 1,294,194,541 | 30,297 | 958,609 | 1,563 |
| 16.55 | 4000 | 21,290,778,894 | 9,315,603 | 1,294,194,541 | 30,297 | 958,609 | 1,563 |
| 16.87 | 5000 | 21,335,068,150 | 9,518,476 | 1,294,194,541 | 30,297 | 958,609 | 1,563 |
- k: The log2 of the number of rows filled by the poseidon step-circuit.
- repeat_count: The number of poseidon hash repetitions.
- total_bytes: The total number of bytes allocated over the entire execution.
- total_blocks: The total number of heap blocks allocated over the entire execution.
- t_gmax: The number of bytes alive at the time when the heap size reached its global maximum.
- t_gmax_blocks: The number of heap blocks alive at the time when the heap size reached its global maximum.
- t_end: The number of bytes alive at the end of execution (i.e., bytes not explicitly freed).
- t_end_blocks: The number of heap blocks alive at the end of execution (i.e., blocks not explicitly freed).
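The fractional `k` values above can be reproduced as the log2 of the filled row count. A minimal sketch (`filled_rows` is an illustrative name, not taken from the codebase):

```rust
// Illustrative helper: `k` is the log2 of the number of rows the poseidon
// step-circuit actually occupies.
fn filled_k(filled_rows: u64) -> f64 {
    (filled_rows as f64).log2()
}

fn main() {
    println!("{:.2}", filled_k(24));     // ≈ 4.58, the repeat_count = 1 row
    println!("{:.2}", filled_k(24_000)); // ≈ 14.55, the repeat_count = 1000 row
}
```

So each poseidon repetition appears to fill roughly 24 rows, which matches the progression of the `k` column.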
mem_profiling.tar.gz
Raw data, including the dhat logs, which can also be analyzed with https://nnethercote.github.io/dh_view/dh_view.html

If you want, I can double-check the correctness of the result some other way; the utility I used is not production-ready.
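For anyone reproducing this: a minimal sketch of the standard `dhat` crate setup that produces such logs (the actual harness here may differ):

```rust
// Cargo.toml: dhat = "0.3"

// Route all heap allocations through dhat's instrumented allocator.
#[global_allocator]
static ALLOC: dhat::Alloc = dhat::Alloc;

fn main() {
    // Writes dhat-heap.json when dropped; open it with dh_view.html.
    let _profiler = dhat::Profiler::new_heap();

    // ... run the fold steps under measurement here ...
}
```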
I observed 21 GB usage for all settings. Does that mean the same memory usage for different circuits, step sizes, etc.? I was expecting that a larger circuit would need more memory.

I also ran a benchmark where the primary & secondary `k` (circuit size) grows from 17 to 21.
> I observed 21 GB usage for all settings. Does that mean the same memory usage for different circuits, step sizes, etc.? I was expecting that a larger circuit would need more memory.
I ran with different numbers of filled step-circuit rows, as I did earlier with the time benchmark. Now I have also added a test with different step-circuit table sizes; as soon as the result is ready, I will post it here.
In this table `{PRIMARY,SECONDARY}_CIRCUIT_K_TABLE_SIZE` is equal to `k`; otherwise the algorithm is identical: fold 1 step. I used `COMMITMENT_KEY_SIZE=27` so that I wouldn't have to change it between cases (it would have affected the size). For 20 and 21 the runs are still in progress (with a custom allocator, it will take a long time).
| k | total_bytes | total_blocks | t_gmax | t_gmax_blocks | t_end | t_end_blocks |
| --- | ------------- | -------------- | -------- | --------------- | ------- | --------------- |
| 17 | 46,470,334,145 | 8,384,081 | 26,415,304,923 | 15,860 | 958,633 | 1,566 |
| 18 | 66,329,954,655 | 13,391,175 | 27,059,130,587 | 15,860 | 958,633 | 1,566 |
| 19 | 103,941,909,225 | 22,422,684 | 28,346,781,915 | 15,860 | 958,633 | 1,566 |
I am a little confused about these two.

(1)

| k | repeat_count | total_bytes | total_blocks | t_gmax | t_gmax_blocks | t_end | t_end_blocks |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 4.58 | 1 | 21,103,360,561 | 8,379,587 | 1,294,194,541 | 30,297 | 958,609 | 1,563 |
| 14.55 | 1000 | 21,149,395,860 | 8,607,137 | 1,294,194,541 | 30,297 | 958,609 | 1,563 |
| 15.55 | 2000 | 21,193,222,448 | 8,806,406 | 1,294,194,541 | 30,297 | 958,609 | 1,563 |
| 16.14 | 3000 | 21,242,455,990 | 9,068,519 | 1,294,194,541 | 30,297 | 958,609 | 1,563 |
| 16.55 | 4000 | 21,290,778,894 | 9,315,603 | 1,294,194,541 | 30,297 | 958,609 | 1,563 |
| 16.87 | 5000 | 21,335,068,150 | 9,518,476 | 1,294,194,541 | 30,297 | 958,609 | 1,563 |

And (2)

| k | total_bytes | total_blocks | t_gmax | t_gmax_blocks | t_end | t_end_blocks |
| --- | --- | --- | --- | --- | --- | --- |
| 17 | 46,470,334,145 | 8,384,081 | 26,415,304,923 | 15,860 | 958,633 | 1,566 |
| 18 | 66,329,954,655 | 13,391,175 | 27,059,130,587 | 15,860 | 958,633 | 1,566 |
| 19 | 103,941,909,225 | 22,422,684 | 28,346,781,915 | 15,860 | 958,633 | 1,566 |

When k=17, the total_bytes in (2) are much larger than in table (1), but table (1) also corresponds to table size 17. Is that correct?
I did a rough memory estimate: for k=17, memory usage should be around 2 GB. From your data above, it seems the memory usage of other programs was not subtracted? @cyphersnake
> When k=17, the total_bytes in (2) are much larger than in table (1), but table (1) also corresponds to table size 17. Is that correct?
You may have missed that, to measure different circuit sizes, I chose `COMMITMENT_KEY_SIZE=27` up front, which means about 16 GB for the commitment key alone.

Also, these are different `k`s: in the first table it is the filled row count (log2), in the second it is the absolute table size. I'll correct the names now so it's not confusing.
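For reference, the key-size arithmetic, assuming 2^k points of 64 bytes (uncompressed affine) per curve and one key each for bn256 and grumpkin:

```rust
// 2^k points, assumed 64 bytes each (uncompressed affine representation).
fn key_bytes(k: u32) -> u64 {
    (1u64 << k) * 64
}

fn main() {
    // COMMITMENT_KEY_SIZE = 27: 8 GiB per curve, ~16 GiB for bn256 + grumpkin.
    println!("k=27: {} GiB per curve", key_bytes(27) >> 30);
    // COMMITMENT_KEY_SIZE = 21: 128 MiB per curve, ~256 MiB for both.
    println!("k=21: {} MiB per curve", key_bytes(21) >> 20);
}
```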
> I did a rough memory estimate: for k=17, memory usage should be around 2 GB. From your data above, it seems the memory usage of other programs was not subtracted?
For k-table-size = 17 and commitment-key = 21:

- Total: 19,449,044,966 bytes in 6,417,946 blocks
- At t-gmax: 1,293,397,125 bytes in 28,956 blocks
- At t-end: 161,193 bytes in 222 blocks
If the commitment key size scaled together with the circuit size, its influence on the result would be too strong to estimate the difference in memory behaviour, so I made the measurements with `COMMITMENT_KEY_SIZE` fixed at the maximum required value, and noted that.

I will now summarize the results and describe them separately, explaining the design of each experiment and the motivation behind it. If you want measurements with specific parameters, just describe them. The key dimensions are:
- primary_circuit_k_table_size
- secondary_circuit_k_table_size
- commitment_key_size - by the way, it could even differ between the curves, but we use the same one
- number of filled primary-circuit rows
- number of filled secondary-circuit rows

(the last two don't affect the result much; a sketch of how these knobs can be wired up follows this list)
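If it helps, a minimal sketch of how such knobs can be read in a bench harness (the variable names come from this thread; the harness itself is hypothetical):

```rust
use std::env;

// Read a numeric knob from the environment, falling back to a default.
fn knob(name: &str, default: u32) -> u32 {
    env::var(name).ok().and_then(|v| v.parse().ok()).unwrap_or(default)
}

fn main() {
    let primary_k = knob("PRIMARY_CIRCUIT_K_TABLE_SIZE", 17);
    let secondary_k = knob("SECONDARY_CIRCUIT_K_TABLE_SIZE", 17);
    let commitment_key = knob("COMMITMENT_KEY_SIZE", 21);
    println!("primary_k={primary_k} secondary_k={secondary_k} key={commitment_key}");
}
```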
Common terms
- repeat_count: The number of poseidon hash repetitions.
- total_bytes: The total number of bytes allocated over the entire execution.
- total_blocks: The total number of heap blocks allocated over the entire execution.
- t_gmax: The number of bytes alive at the time when the heap size reached its global maximum.
- t_gmax_blocks: The number of heap blocks alive at the time when the heap size reached its global maximum.
- t_end: The number of bytes alive at the end of execution (i.e., bytes not explicitly freed).
- t_end_blocks: The number of heap blocks alive at the end of execution (i.e., blocks not explicitly freed).
Test 1

- primary circuit: poseidon
- secondary circuit: trivial
- primary_circuit_k_table_size = 17
- secondary_circuit_k_table_size = 17
- commitment_key_size = 21 (64 MB * 2)
- filled primary-circuit rows: varies for the experiment (reported as filled_k)
- filled secondary-circuit rows: 0
| filled_k | repeat_count | total_bytes | total_blocks | t_gmax | t_gmax_blocks | t_end | t_end_blocks |
| -------- | -------------- | ------------- | -------------- | -------- | --------------- | ------- | --------------- |
| 4.58 | 1 | 21,103,360,561 | 8,379,587 | 1,294,194,541 | 30,297 | 958,609 | 1,563 |
| 14.55 | 1000 | 21,149,395,860 | 8,607,137 | 1,294,194,541 | 30,297 | 958,609 | 1,563 |
| 15.55 | 2000 | 21,193,222,448 | 8,806,406 | 1,294,194,541 | 30,297 | 958,609 | 1,563 |
| 16.14 | 3000 | 21,242,455,990 | 9,068,519 | 1,294,194,541 | 30,297 | 958,609 | 1,563 |
| 16.55 | 4000 | 21,290,778,894 | 9,315,603 | 1,294,194,541 | 30,297 | 958,609 | 1,563 |
| 16.87 | 5000 | 21,335,068,150 | 9,518,476 | 1,294,194,541 | 30,297 | 958,609 | 1,563 |
- filled_k: The log2 of the number of rows filled by the poseidon step-circuit.
Test 2

- primary circuit: poseidon
- secondary circuit: trivial
- primary_circuit_k_table_size = k; varies for the experiment
- secondary_circuit_k_table_size = k; varies for the experiment
- commitment_key_size = 27 (8 GB * 2)
- filled primary-circuit rows: 24
- filled secondary-circuit rows: 0
| k | total_bytes | total_blocks | t_gmax | t_gmax_blocks | t_end | t_end_blocks |
| --- | ------------- | -------------- | -------- | --------------- | ------- | --------------- |
| 17 | 46,470,334,145 | 8,384,081 | 26,415,304,923 | 15,860 | 958,633 | 1,566 |
| 18 | 66,329,954,655 | 13,391,175 | 27,059,130,587 | 15,860 | 958,633 | 1,566 |
| 19 | 103,941,909,225 | 22,422,684 | 28,346,781,915 | 15,860 | 958,633 | 1,566 |
| 20 | 177,592,778,430 | 39,557,793 | 30,922,084,571 | 15,860 | 958,633 | 1,566 |
| 21 | 323,176,881,971 | 71,974,399 | 36,072,689,883 | 15,860 | 958,633 | 1,566 |
NOTE: In this test it is not the absolute values that matter, since they are skewed by the large COMMITMENT_KEY, but the growth pattern of the memory usage.
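To make that growth pattern concrete, a quick check over the t_gmax column above (plain arithmetic on the table data): each increment of k roughly doubles the previous delta, which is consistent with a component linear in 2^k sitting on top of the fixed ~16 GB commitment-key baseline.

```rust
fn main() {
    // t_gmax values for k = 17..=21, copied from the table above.
    let t_gmax: [u64; 5] = [
        26_415_304_923,
        27_059_130_587,
        28_346_781_915,
        30_922_084_571,
        36_072_689_883,
    ];
    for (i, w) in t_gmax.windows(2).enumerate() {
        let delta = w[1] - w[0];
        // Prints increments of ~0.60, 1.20, 2.40, 4.80 GiB — doubling each step.
        println!("k={} -> k={}: +{:.2} GiB", 17 + i, 18 + i, delta as f64 / (1u64 << 30) as f64);
    }
}
```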
Test 3
- primary_circuit_k_table_size = 17
- secondary_circuit_k_table_size = 17
- commitment_key_size = 21
- filled primary-circuit rows: 24
- filled secondary-circuit rows: 0
| k | total_bytes | total_blocks | t_gmax | t_gmax_blocks | t_end | t_end_blocks |
| --- | ------------- | -------------- | -------- | --------------- | ------- | --------------- |
| 17 | 19,449,044,966 | 6,417,946 | 1,293,397,125 | 28,956 | 161,193 | 222 |
When the table size k = 17 and the commitment key size = 21, how do you get 19,449,044,966 bytes, which is ~20 GB? The commitment key is just 256 MB.
> When the table size k = 17 and the commitment key size = 21, how do you get 19,449,044,966 bytes, which is ~20 GB? The commitment key is just 256 MB.
> total_bytes: The total number of bytes allocated over the entire execution.
>
> t_gmax: The number of bytes alive at the time when the heap size reached its global maximum.

Let me draw your attention to the difference. `total_bytes` is the cumulative footprint: all the memory that was ever allocated, not what was held at any one moment. `t_gmax` is how much memory was occupied at one time, at the peak.
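A toy illustration of the difference (illustrative code, not from the benchmark): allocating and freeing 1 MiB a hundred times yields total_bytes ≈ 100 MiB but t_gmax ≈ 1 MiB, because only one buffer is alive at any moment.

```rust
fn main() {
    for _ in 0..100 {
        // 1 MiB allocated on each iteration...
        let buf = vec![0u8; 1 << 20];
        // ...and freed before the next one, so at most ~1 MiB is live at a time.
        drop(buf);
    }
    // A heap profiler such as dhat would report:
    //   total_bytes ≈ 100 MiB (cumulative), t_gmax ≈ 1 MiB (peak live).
}
```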
Test 1: k=17, key size 21. t_gmax = 1,294,194,541, which is ~1.2 GB; that makes sense.

Test 2: k=17, key size 27 (16 GB). t_gmax = 26,415,304,923, which is ~24.6 GB; after subtracting the commitment key held in memory, that leaves 8.6 GB.

In both tests we have the same circuit size but different commitment keys. After subtracting the commitment key size from the memory, I would expect them to be of similar size, but they are not. I don't understand why. Any idea about it? @cyphersnake
> Any idea about it?

Don't forget about the serialization of the commitment key in `PublicParams`.
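To illustrate what I mean (a sketch under assumptions, not the actual `PublicParams` code): serializing a key typically materializes a second in-memory copy while the buffer is being built, so handling one 8 GB key at a time would add roughly one key's worth to the peak.

```rust
// Hypothetical: serialize a commitment key of 2^k 64-byte points into one buffer.
// While `out` is being built, the original key and the copy are both alive,
// so the peak grows by about the size of one key (~8 GiB for k = 27).
fn serialize_key(points: &[[u8; 64]]) -> Vec<u8> {
    let mut out = Vec::with_capacity(points.len() * 64);
    for p in points {
        out.extend_from_slice(p);
    }
    out
}

fn main() {
    // Tiny demo key: 4 points instead of 2^27.
    let key = vec![[0u8; 64]; 4];
    let bytes = serialize_key(&key);
    assert_eq!(bytes.len(), 4 * 64);
}
```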
> > Any idea about it?
>
> Don't forget about the serialization of the commitment key in `PublicParams`.
The bn256 and grumpkin keys total 8 GB + 8 GB = 16 GB. Why is the memory increase 8 GB, and not 0 GB or 16 GB? How does the serialization of the commitment keys work internally?