zack-bitcoin/amoveo

more efficient merkle tree database

Closed this issue · 2 comments

summary of efficiency improvements

generating a merkle proof or storing updates will take about the same amount of time as before.
the merkle proof will be about 1/2 as long, meaning we can store 2x more txs per block.
This will make our hard drive requirements 1/2 as big.

changes

update the merkle tree database to be an order-2 radix tree, and group elements into pages for speed and efficiency.

generating proofs and storing updates

  • old number of pages to read: 2 * log(#accounts)/log(16) (the 2 is because we eventually garbage collect the trees)
  • new number of pages to read: log(#accounts)/log(page_size/32)
  • the ratio is log(16) / (2 * log(4000/32)), which is about 1/3 as many reads.
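As a sanity check on that ratio, here is a quick back-of-the-envelope in Python. The 4000-byte page size is taken from the benchmark later in this issue; the variable names are mine, not Amoveo's.

```python
import math

HASH_SIZE = 32        # bytes per hash
PAGE_SIZE = 4000      # bytes per page, near the measured sweet spot below
accounts = 1_000_000

# old: order-16 trie, read twice because old trees get garbage collected
old_reads = 2 * math.log(accounts) / math.log(16)

# new: one page holds PAGE_SIZE/32 hashes, so each read covers more of the trie
new_reads = math.log(accounts) / math.log(PAGE_SIZE / HASH_SIZE)

ratio = new_reads / old_reads   # = log(16) / (2 * log(4000/32))
print(round(ratio, 2))          # about 0.29, i.e. roughly 1/3 as many reads
```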

size of merkle proofs

The length of an individual merkle proof will be
2 * (108 + 32 * 2 * log2(number of users))
instead of the current 2 * (108 + 32 * 16 * log16(number of users)).
Per level the hash data shrinks by a factor of 16/2 = 8, but a binary tree has log(16)/log(2) = 4 times as many levels, so the net shrinkage is 8 * log(2)/log(16), which is a factor of 2.
So merkle proofs will be 1/2 as big.
This means we can store twice as many txs per block
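To see the factor of 2 concretely, here is a sketch that plugs both branching factors into the proof-size formula from this issue (the function name and defaults are mine, not Amoveo's):

```python
import math

def proof_bytes(branching, users, overhead=108, hash_size=32):
    # per this issue's formulas: 108-byte fixed part, plus
    # hash_size * branching bytes for every level of the trie
    levels = math.log(users, branching)
    return 2 * (overhead + hash_size * branching * levels)

old = proof_bytes(16, 1_000_000)   # order-16 merkle proof
new = proof_bytes(2, 1_000_000)    # binary merkle proof
print(old / new)                   # just under 2; approaches 2 as users grow
```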

Hard drive requirement

We are currently using 2 * (108 + (32 * 16 * log16(number of users))) bytes of hard drive per spend tx, and more for other types.
With this update we could get it down to 2 * (108 + 32 * 2 * log2(number of users)) bytes per spend tx.

An improvement factor of (108 + (32 * 16 * log16(#users))) / (108 + (32 * 2 * log2(#users)))

At 1 million users, we will be using about 1/2 as much hard drive space to store the merkle proofs.
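Because of the 108-byte fixed part, the improvement factor depends on the number of users. Plugging a few scales into the improvement-factor formula above (a sketch with my own helper name, not Amoveo code):

```python
import math

def improvement(users):
    # (108 + 32*16*log16(users)) / (108 + 32*2*log2(users)), from this issue
    return (108 + 32 * 16 * math.log(users, 16)) / \
           (108 + 32 * 2 * math.log(users, 2))

for n in (1_000, 1_000_000, 1_000_000_000):
    print(n, round(improvement(n), 2))
# the factor starts below 2 and creeps toward 2 as the user count grows
```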

I did a test on my hard drive to determine what would be the optimal page size, and I got a very interesting result:
{page size in bytes, estimate of how long it takes to look up or store one element in the merkle tree}
[{32,8.453334439183424},
{64,8.415022028805112},
{128,8.403737498044052},
{256,8.419414968497533},
{512,8.380814893055621},
{1024,8.325510455185333},
{2048,8.312204157560018},
{4000,8.327605298630237},
{4004,8.335311537850668},
{4006,8.375737628174566},
{4008,8.347227485731853},
{4010,8.409663658473571},
{4016,8.346456767072695},
{4096,8.46699140637872},
{4097,8.453567586776897},
{4200,8.435250763750128},
{6000,8.470501137174027},
{8000,8.490136693298721},
{12000,8.557693134083362},
{16000,8.648457171653046}]

64 bytes is the smallest page size we could use.
Everything from 64 bytes to 16 kilobytes takes about the same amount of time on my hardware.
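The flatness of that table is easy to check; taking a subset of the measured points (units as reported above), the total spread across a 500x range of page sizes is only a few percent:

```python
# a subset of the measured (page size, time per operation) pairs from above
data = [(32, 8.4533), (512, 8.3808), (2048, 8.3122),
        (4000, 8.3276), (8000, 8.4901), (16000, 8.6485)]

fastest = min(data, key=lambda p: p[1])
spread = max(t for _, t in data) / min(t for _, t in data)
print(fastest[0], round(100 * (spread - 1), 1))  # 2048-byte pages win, by ~4%
```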

This means that we can't change the speed by changing the page size.

So it looks like we should abandon the goal of having variable page size.
And we already showed that, at least for the short term, moving the proofs from the blocks into the merkle tree makes it more expensive, not less.

The only update we should consider for the short term:

  1. internally we should calculate hashes so it is a binary merkle tree, not an order-16 merkle tree. This way merkle proofs are 1/2 as long, and we can fit twice as many txs per block.
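A minimal sketch of point 1, assuming SHA-256 and a 16-child node (all helper names are mine, not Amoveo's): hashing the 16 children pairwise gives a binary tree internally, so a proof through one node carries 4 sibling hashes instead of the node's 15 other children.

```python
import hashlib

def h(*parts):
    return hashlib.sha256(b"".join(parts)).digest()

def binary_root(children):
    """Fold 16 child hashes pairwise up to a single binary-tree root."""
    level = list(children)
    while len(level) > 1:
        level = [h(level[i], level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]

def binary_proof(children, index):
    """Sibling hashes needed to recompute the root from children[index]."""
    proof, level, i = [], list(children), index
    while len(level) > 1:
        proof.append(level[i ^ 1])   # the one sibling at this binary level
        level = [h(level[j], level[j + 1]) for j in range(0, len(level), 2)]
        i //= 2
    return proof

leaves = [h(bytes([n])) for n in range(16)]
proof = binary_proof(leaves, 5)
assert len(proof) == 4               # 4 hashes, not 15

# verify by folding the proof back up to the root
acc, i = leaves[5], 5
for sib in proof:
    acc = h(acc, sib) if i % 2 == 0 else h(sib, acc)
    i //= 2
assert acc == binary_root(leaves)
```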