[Bug] Deploying programs with a certain set of constraints causes OOM errors on validators
Closed this issue · 3 comments
https://hackerone.com/reports/2300725
Summary:
I wrote two programs that cause a client node to go offline due to memory pressure and OOM. They affected two different client nodes, as confirmed by Harukama. I reported this bug earlier, but now I can reliably reproduce it. zk_sra_encryption took the api.explorer.aleo.org client node, the Aleo faucet client node, and https://explorer.hamp.app/ offline. I am not sure whether validators were affected, as the deployment was accepted after about 10 minutes. The attack is asymmetric because my system did not struggle to broadcast the deployments, except for the second program, which occasionally timed out.
Hypothetically, a malicious user could fund a wallet and schedule a cron job that uses snarkos to repeatedly deploy the less expensive program, giving it a new name each time with sed or a similar tool.
The second program, zk_deck_shuffle_v0_0_1.aleo, was understandably at the limit of the constraints for the network. It ultimately was not accepted by validators despite having sufficient funds for storage. I was able to deploy it by not using u128 in my data structures and using field instead. Regardless, both programs caused multiple downtime incidents and required node operators to restart their services.
I think the attack boils down to this: with arrays, users can combine u128 values and ternary conditional statements to generate 10-14k Aleo opcodes, which the network struggles to handle, with about 500k constraints consuming 1 GB of RAM. The deck shuffle program had 24,000,000 constraints.
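As a rough sanity check on these figures, the sketch below extrapolates the memory needed for the deck shuffle program. It assumes memory grows linearly with constraint count, which is only an assumption based on the single "about 500k constraints per 1 GB" observation above; the function and constant names are hypothetical and not part of snarkOS or snarkVM.

```python
# Back-of-the-envelope estimate only. Linear scaling is an ASSUMPTION;
# the report gives just one data point (about 500k constraints per 1 GB).
CONSTRAINTS_PER_GB = 500_000  # observed ratio quoted in the report


def estimated_ram_gb(constraints: int) -> float:
    """Extrapolate RAM usage in GB from a constraint count (hypothetical helper)."""
    return constraints / CONSTRAINTS_PER_GB


# Constraint count reported for the deck shuffle program.
deck_shuffle_constraints = 24_000_000
print(f"{estimated_ram_gb(deck_shuffle_constraints):.0f} GB")  # prints "48 GB"
```

Under that assumption, a single deployment of this size would need far more memory than a typical client node has available, which is consistent with the OOM behavior described.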
Both programs were valid and functionally correct.
To verify, change the name of the program and deploy it on testnet3.
Steps To Reproduce:
Each repository's README.md contains example usage, but it is not necessary for understanding the issue at hand.
I keep a shell function to deploy programs: aleo-deploy() { snarkos developer deploy "$1" --path "./build" --private-key "${PRIVATE_KEY}" --query "https://api.explorer.aleo.org/v1" --broadcast "https://api.explorer.aleo.org/v1/testnet3/transaction/broadcast" --priority-fee 0; }
zk_sra_encryption % aleo-deploy zk_sra_encryption_v0_0_2.aleo
📦 Creating deployment transaction for 'zk_sra_encryption_v0_0_2.aleo'...
✅ Created deployment transaction for 'zk_sra_encryption_v0_0_2.aleo'
✅ Successfully broadcast deployment at1395286dygpqqhlazadq5l2hn3x9jvtmfs78gv7jvknqwaapv5yzqmdvhk8 ('zk_sra_encryption_v0_0_2.aleo') to https://api.explorer.aleo.org/v1/testnet3/transaction/at1395286dygpqqhlazadq5l2hn3x9jvtmfs78gv7jvknqwaapv5yzqmdvhk8
And for zk_deck_shuffle_v0_0_1.aleo:
zk_deck_shuffle % snarkos developer deploy "zk_deck_shuffle_v0_0_1.aleo" --path "./build" --private-key "${PRIVATE_KEY}" --query "https://api.explorer.aleo.org/v1" --broadcast "https://api.explorer.aleo.org/v1/testnet3/transaction/broadcast" --priority-fee 0
📦 Creating deployment transaction for 'zk_deck_shuffle_v0_0_1.aleo'...
✅ Created deployment transaction for 'zk_deck_shuffle_v0_0_1.aleo'
✅ Successfully broadcast deployment at1rux429axv4h0rgu4r33299gl9udwgd4fv2572klnejezdmhe7sgsgcvw39 ('zk_deck_shuffle_v0_0_1.aleo') to https://api.explorer.aleo.org/v1/testnet3/transaction/at1rux429axv4h0rgu4r33299gl9udwgd4fv2572klnejezdmhe7sgsgcvw39
The latter was never accepted from the memoryPool. Also, when it was deployed, the memoryPool API endpoint went down.
Proof-of-Concept (PoC)
Either of these causes the denial of service independently.
https://github.com/zkCohort/zk_sra_encryption/blob/5c0e86ed9b177e51d30dac978f1b1dcd29148ac7/build/main.aleo 22 credits to deploy
https://github.com/zkCohort/zk_deck_shuffle/blob/c2ddf86f68ca476c539cc47b2ae4cb35e9d954fc/build/main.aleo 450 credits to deploy
Supporting Material/References:
Impact
An attacker can cause hardship for node operators, who are attempting to keep their ledgers up to date while syncing transactions.
The hacker also requested that anyone deploying these programs leave the namespace available for subsequent updates (e.g. do not use the format zkbitwise_stack*.aleo).
This is a known issue, and there are pending PRs targeting it, e.g. https://github.com/AleoHQ/snarkVM/pull/2271.
We are now limiting the deployment to 1 << 20 constraints. Closing with https://github.com/AleoHQ/snarkVM/pull/2271.
Please reopen if this remains an issue even with the deployment restrictions.
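For scale, the 1 << 20 limit mentioned above can be compared against the constraint count reported for the deck shuffle program. This is a minimal sketch: the constant and function names are hypothetical, and treating the limit as inclusive is an assumption; the actual check is in the snarkVM pull request linked above.

```python
# Illustrative comparison only; names are hypothetical and not taken from snarkVM.
MAX_DEPLOYMENT_CONSTRAINTS = 1 << 20  # 1,048,576, the limit quoted in the comment above


def exceeds_deployment_limit(constraints: int) -> bool:
    """Return True if a deployment has more constraints than the limit.

    Assumes the limit is inclusive (exactly 1 << 20 constraints is allowed).
    """
    return constraints > MAX_DEPLOYMENT_CONSTRAINTS


deck_shuffle_constraints = 24_000_000  # figure from the original report
print(exceeds_deployment_limit(deck_shuffle_constraints))  # True
print(round(deck_shuffle_constraints / MAX_DEPLOYMENT_CONSTRAINTS, 1))  # 22.9
```

The deck shuffle program as reported would be roughly 23 times over this limit.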