Mining protocol changes to combat pool centralization
tevador opened this issue · 16 comments
Monero mining has been centralized to 2-3 large pools for a long time (recently, the largest pool even briefly exceeded 50% of the hashrate). This unhealthy situation is not going to get fixed by itself. It needs a protocol-level solution.
I don't think we should implement radical changes against pool mining like Wownero did, but I'm proposing a relatively small change to make pooled mining more difficult.
The RandomX dataset is constructed from a 256-MiB cache. This cache is expanded from a 32-byte block ID using Argon2 every 2048 blocks (~3 days).
I'm proposing the following changes:
- Construct the cache by selecting random parts of the blockchain instead of using Argon2.
- Reseed the cache every 64 blocks instead of every 2048 blocks.
The portion of the blockchain that can be used is roughly 10 GiB (everything except prunable data: ring signatures and range proofs). The parts used to construct the cache could be selected pseudorandomly based on the seed block ID.
The effect of this change on the network nodes would be negligible. They already have all the data needed to construct the cache.
However, this would greatly increase the bandwidth requirements for centralized pools and their miners. Miners would either have to run their own nodes, or pools would have to provide the 256 MiB cache to every miner every 2 hours. Interestingly, both of these solutions have roughly the same bandwidth requirements of ~3 GB/day (compared to the ~1 MB/day that miners use currently).
If miners are forced to run their own nodes, they may find it more convenient to mine on p2pool rather than use a centralized pool.
For what it is worth, I fully support this proposal to mitigate pool centralization. Anything that helps us boost the overall full node count is a net positive for decentralization. If it also dampens the sometimes crazy network hashrate swings and prevents any given pool from attaining more than 51% of the total network hashrate, that is just the icing on the cake.
Keen to hear others' feedback and to discuss it further. Take my support at face value.
Sounds like a good idea.
It would probably kill off quite a few botnets, plus all those who don't feel like maintaining a two-node mining setup.
I wonder how much hashrate that amounts to.
I want to push p2pool use, though I would prefer to see guides for p2pool improve further before something like this is implemented. To point a miner at a mining pool, all one needs to do is find a pool and run a program.
In particular, I would like a reliable program where one click/command sets up the p2pool server and mines (and sets up a Monero node or uses an existing one), so that the UX is exactly the same as running a mining program alone. I would like this both as a separate program and in the GUI.
Many users find Monero because it is easy to mine. Should we add roadblocks in the way of that, I want to make sure they are appropriately addressed. Monero should remain one of the most accessible mining routes for novices.
@SamsungGalaxyPlayer , it seems that the GUI implementation of p2pool has happened.
Many users find Monero because it is easy to mine. Should we add roadblocks in the way of that, I want to make sure they are appropriately addressed.
Monero will still be easy to mine. Mining pools would still exist. They would just have higher fees to compensate for the higher data usage.
And yeah, I'm a fan of this proposal. I've always been a fan of this boolberry-type thing. I think they called it wild-keccak?
awesome awesome.
I guess pools will enforce stricter policies on proxy usage to minimize the effect of this change (if I understood everything correctly here, of course).
This was discussed at a recent MRL meeting (well, mostly after the meeting). It seems the bandwidth costs for pools are negligible, so this would really just be a nuisance for larger pools, and could end up preventing smaller pools from surviving.
The estimate is that minexmr pulls in about $12k. So if they have to spend ~$10k on bandwidth, they'll just raise their fee to 1.5% and pull in $18k in fees.
Also, I would imagine DigitalOcean is on the high end for bandwidth costs:
https://www.leaseweb.com/dedicated-servers/high-bandwidth-server#NL
Additionally, the larger pools could somehow enforce or encourage the use of proxies.
Maybe it's enough?
I hope this goes through. The only issue with Monero right now is the power minexmr has.
I have implemented a proof-of-concept blockchain data selector. It turns out that selecting random blockchain data is more complicated than I thought when writing this proposal.
Firstly, we should clarify the performance targets. If we want to limit the cache lifespan to just 64 blocks, the generation time must definitely be under 1 second. Since a RandomX hash in light mode takes about 15 ms to calculate, a cache generation time of 1000 ms amortized over 64 blocks would add an additional 15-16 ms per hash, doubling the PoW cost of the initial blockchain download (IBD). Since the cost of IBD is dominated by non-PoW calculations, this might still be acceptable. The good news is that it would not affect the PoW cost of block verification for a synchronized node.
Looking at the blockchain database schema, I came to the conclusion that only two tables can be used: `blocks` and `txs_pruned`. The rest are either lookup tables or tables containing data that is not the same for all network nodes. Additionally, `blocks` and `txs_pruned` both use a sequential integer key, which makes it easy to select data at random.
To make the I/O cost of selecting data as favorable as possible, the data is read mostly sequentially in batches of 64 blocks or 64*N transactions, where N equals the mean number of transactions per block. Most of the database accesses use the `MDB_NEXT` cursor operation, with an occasional `MDB_SET` operation at the start of each batch. Block IDs and transaction IDs are selected pseudorandomly from the set starting with the genesis block and ending with the seed block. The seed block hash is used to seed the PRNG. The selection algorithm has a small bounded bias towards selecting data from recent blocks, to make it unlikely that a cache can be constructed with stale blockchain data.
I tested several different selection techniques and different sets of parameters, but the method I just described seems to be close to optimal from a performance standpoint, while still providing significant randomization.
To further boost performance, the cache can be split into several segments and each segment can be initialized independently by a different thread. This can speed up the cache construction significantly when using an NVMe SSD to store the blockchain.
Finally, here are the performance numbers. I used a blockchain database that was obtained by running a node with `--prune-blockchain` and syncing the blockchain. All tests are for the worst-case scenario of a cold disk cache.
| storage hardware | 1 thread | 8 threads |
|---|---|---|
| NVMe SSD | 15s | 2.4s |
| 7200 RPM HDD | 11min 30s | 13min 27s |
None of the hardware configurations came close to the 1 second performance target. The HDD numbers are especially brutal. But why is it so slow? We are only reading ~256 MB from the disk, aren't we?
Actually, the total amount of data read from the disk is over 2 GB! This is due to the way the data is stored in the LMDB database file. Normally, the data from all tables is interleaved as the database grows, which leads to significant fragmentation of the `blocks` and `txs_pruned` tables.
We can confirm this theory by using the command `./monero-blockchain-prune --copy-pruned-database`. This builds a copy of the pruned database with the data from each db table moved together.
With this "defragmented" database file, the performance is much better:
| storage hardware | 1 thread | 8 threads |
|---|---|---|
| NVMe SSD | 3s | 0.5s |
| 7200 RPM HDD | 51s | 57s |
So the only configuration that can meet the initial performance target is using an NVMe SSD, 8 threads and a defragmented database file. These are not realistic conditions for a typical network node.
The conclusion is that without having a dedicated database file just for the cache selector (which would increase the required disk space to run a pruned node by ~25%), this proposal is probably not viable.
After some discussion with @hyc on IRC, it seems that the low performance of the data selector is not caused by the database file fragmentation per se. The problem might be due to a mismatch between the LMDB page size and the page size of the storage device.
LMDB uses a page size equal to the virtual memory page, which is 4K on most systems. However, modern SSDs use larger pages of 8K-64K depending on the type of flash memory. Since the page is the smallest unit an SSD can read, a request to read a 4K LMDB page will actually load 8 pages into memory. The other 7 pages will most likely contain data from other tables, which results in the "read amplification" effect that I observed with the original blockchain database.
To confirm this theory, I applied the following patch to LMDB to increase the page size to 32K:
```diff
diff --git a/external/db_drivers/liblmdb/mdb.c b/external/db_drivers/liblmdb/mdb.c
index bf60c7013..d41c08d79 100644
--- a/external/db_drivers/liblmdb/mdb.c
+++ b/external/db_drivers/liblmdb/mdb.c
@@ -362,7 +362,7 @@ typedef HANDLE mdb_mutex_t, mdb_mutexref_t;
 #define MDB_FDATASYNC(fd) (!FlushFileBuffers(fd))
 #define MDB_MSYNC(addr,len,flags) (!FlushViewOfFile(addr,len))
 #define ErrCode() GetLastError()
-#define GET_PAGESIZE(x) {SYSTEM_INFO si; GetSystemInfo(&si); (x) = si.dwPageSize;}
+#define GET_PAGESIZE(x) ((x) = 32768)
 #define close(fd) (CloseHandle(fd) ? 0 : -1)
 #define munmap(ptr,len) UnmapViewOfFile(ptr)
 #ifdef PROCESS_QUERY_LIMITED_INFORMATION
@@ -468,7 +468,7 @@ typedef pthread_mutex_t *mdb_mutexref_t;
  * This is the basic size that the platform's memory manager uses, and is
  * fundamental to the use of memory-mapped files.
  */
-#define GET_PAGESIZE(x) ((x) = sysconf(_SC_PAGE_SIZE))
+#define GET_PAGESIZE(x) ((x) = 32768)
 #endif
 #define Z MDB_FMT_Z /**< printf/scanf format modifier for size_t */
```
I synced the blockchain again from the network and ran the same benchmarks as before:
| storage hardware | 1 thread | 8 threads |
|---|---|---|
| NVMe SSD | 2.7s | 0.44s |
| 7200 RPM HDD | 2min 13s | 2min 18s |
The performance of the SSD matches the results with the defragmented database, so I estimate that the hardware page size is 32K. However, I cannot explain why the HDD also got faster (HDDs usually use a sector size of 4K).
Hi! I propose a different and (I hope) much more effective approach to the Monero mining centralization problem. I believe mining protocol changes hurt honest miners, while large centralized pools will find a workaround cheap enough, given their financial resources.
As you remember, when the largest centralized Monero mining pool, MineXMR.com, disappeared, it recommended that all its miners migrate to the decentralized p2pool.
Nevertheless, today we observe exactly the same picture as when the MineXMR pool was on the scene: the centralized Nanopool and SupportXMR pools have taken over MineXMR's share (nearly 1 GH/s) of the overall pool hashrate table (just see https://miningpoolstats.stream/monero). The decentralized p2pool has grown a little, but is still under 200 MH/s.
I watched that sad picture for a while, and finally... I have implemented the MineXMR2 pool:
MineXMR2 is a completely open-source, old-school Monero mining pool resembling its predecessor MineXMR. No registration, no spying cookies, no personal data collection (not even an email address).
At the same time, MineXMR2 is brand new in its idea of using the decentralized p2pool as a hashrate-liquidity provider.
Indeed, the MineXMR2 implementation (https://github.com/minexmr2/minexmr2, https://gitlab.com/minexmr/minexmr2) is based on https://github.com/jtgrassie/monero-pool, but I have rewritten or added more than 50% of the code to redirect all of the miners' hashrate to an internal p2pool instance. As a result, MineXMR2 performs all the PPLNS billing and makes payouts in useful 0.01 XMR chunks, while p2pool generates all the jobs for the miners and receives the typical p2pool "dust" payouts in 0.0003-0.0005 XMR chunks.
Thus, MineXMR2 appears to be a centralized pool in some respects, but its hashrate is actually completely redirected and contributed to the decentralized p2pool. Newbies who want to support p2pool can connect to MineXMR2 and get the power of p2pool immediately, without having to build, deploy, and maintain their own p2pool instance.
I ask you to popularize this open-source solution among all pool owners, as they CAN use my open-source pool implementation utilizing the decentralized p2pool.
Also, I believe there are ways I can prove that MineXMR2 is indeed running the open-source code. Please give me links on how to establish and deploy that evidence. Currently, one can check that the stratum protocol outputs are the same as for p2pool v2.3.
P.S. I am NOT affiliated in any way with the original MineXMR pool's owner. I am a completely different person, known on bitcointalk.org as florida.haunted for many years.
Your "solution" doesn't remove the pool operator's ability to use the pool hashrate for malicious purposes. So it doesn't actually solve anything.
Your "solution" doesn't remove the pool operator's ability to use the pool hashrate for malicious purposes. So it doesn't actually solve anything.
I believe there are ways I can prove that MineXMR2 is indeed running the open-source code. Please give me links on how to establish and deploy that evidence. Currently, one can check that the stratum protocol outputs are the same as for p2pool v2.3.
prove MineXMR2 is indeed running open source code
Such a proof would be of little value. Since the pool admin is in control of the server, they can deploy malicious code at any time to hijack the miners connected to the pool. All centralized pools have this problem. Some mitigations might be possible in mining software to check that the block template given by the pool references the tip of the public chain.