zack-bitcoin/amoveo

Transactions still dropping (intermittent)

Closed this issue ยท 31 comments

Tx sent to node via API spend, tx id is generated. Tx lost /not included in a block.

We had issues where broadcasting Amoveo transactions from our hot wallet, one immediately succeeded by the next one, resulted in only the last one being registered by the node.

All the transactions were responded with 200 status, but they were not included in the next blocks.

This has happened 3 times.

A workaround of waiting for 30 seconds between the broadcasts solved the issue.

Sykh commented

I can verify that it is happening again every now and then, i pause around 2s between every spend.

There was a bug in the "max spend" feature on the light node. I pushed a fix to github, but haven't updated my nodes yet. That should fix some cases of this problem.

Can I get more information about the other cases?
How badly are transaction filling up the mining pools? How many txs can fit in a block easily?

As @hantuzun stated we came across this issue at amoveo.exchange, this wasn't an issue up until recently.
Whenever we broadcast transactions one after another without any time delays between broadcasts, even though we get a 200 status from the node for every single transaction, 1 in x transactions actually reach the mempool where x isn't a number that would even come close to filling up an entire block.

PS: we are not using "max spend"

Yes, it seems this problem went into "remission" for some time, and has reared its ugly face again.

It reduces confidence in the network, exchanges, and mining pools, because users are unsure who to blame. It affects all of us, we should get this solved ASAP.

So is this only happening when you try to broadcast more than 1 tx?
If we switch to using multi-tx, does that fix it?

even if multi tx works, spends shouldnโ€™t have this behavior,

Zack - do you have enough information to recreate bug on your end?

I have not seen this bug happen at all.

When running tests to try to recreate, it may be good to have some test nodes not run 'stock'. Some nodes on our network may be modified, and the network should be robust if it's someone's modified node causing the issue.

One example (but not the only example) is this number in potential_block.erl https://github.com/zack-bitcoin/amoveo/blob/master/apps/amoveo_core/src/consensus/chain/potential_block.erl#L7 it should be assumed that some block-winning nodes have changed that number, and we should test to see if the bug can be reproduced in testing when it is changed

Sykh commented

It seems to be affecting my first broadcasted txs if it happens.

The pool always pays from big to small yet if i didnt mine the block myself so far only the bottom of my list got included even tho the other ones were added and thus (hopefully) broadcasted first.

But ofc it doesn't happen all the time :( plus i switched to a more irregular payout style now so we might not be able to track this anymore if it only happens to pools...

I second this bug be fixed first. I haven't been able to reproduce it constantly but I'll let you know if/when I do. Having this bug decreases trust in the system resulting in avoidance type behavior in sending txs.

This bug -> Less tx
This bug fixed -> More tx

What percentage of txs are being lost? Does this only happen if you make a tx immediately before a block is found?

In my experience, this bug happens whenever we broadcast transactions in batch. It's usually a batch of 4 to 6 transactions, and only the last one is being registered by the nodes.

Sykh commented

I have a 2s pause between my pool spends, they still do fail a few times per day.

On the other hand my node fails to spend anything 100%, not sure if it is related but it is something we can work with.

I tried to send VEO out via http://78.46.149.239:8080/wallet.html and it never happened, tried 3-4 times on different block heights but its perfectly in sync...

icook commented

Just wanted to chime in here with my experience from qTrade. This has given us plenty of headaches in our deposit sweeping and withdraw flows. I've encountered and manually fixed this problem 10-20 times since we added VEO. Symptoms are:

  • We generate several transactions in rapid succession spending from the same address, incrementing our own nonce counter (bulk processing 10 deposits at once and sweeping, for instance).
  • Submission to our own node, and several other public nodes via txs method succeeds.
  • In the next block a random subset of the sent transactions get included, and this frequently nonce-orphans some or all of the remaining transactions.
  • Our software now waits 2 blocks to ensure that the orphan-causing transaction has confirmed and re-initiates the sweep, effectively just re-sending the orphaned tx with a new higher nonce.

My best guess is the block assembler is randomly picking transactions and incidentally orphaning others in the process. I think this is why ethereum requires account nonce's to be consecutive, although there may be other ways to solve it. For now at least preferring the lowest known nonce for a given account would help us a lot, since I believe in most cases the pools have all our transactions in their mempool before block solve. Apologies if my assumptions are off, I haven't spend much time in the mining code.

I will spend some time trying to create a python replication script to demonstrate the problem.

@icook
It is a bad idea to make several txs at the same time.
It is better to use a multi-tx to bundle all your payments into one tx.
This way the network can't get mixed up and only include some of your txs.

Sykh commented

tbh this is a poor workaround no solution to the problem...consdering frequent discord messagers even single txs get dropped

you also save money on tx fees by grouping txs together with multi-tx.

Sykh commented

Unless you changed that later, no you dont.

Thats usually the main reason to use it and it was the main reason why i tested it on day 0, fee is very much the same.

governance fee is the same.
miner fee is less.

multi-tx uses less bytes per payment, so you can fit more payments in a block.

I made an api for generating multi-tx, there is no advantage in avoiding this nice feature.

Sykh commented

We tested it so we know the api...and you keep saying that its cheaper but it isnt
http://explorer.veopool.pw/?input=27399

If you don't use multi-tx, then a malicious miner could purposefully only include your high-nonce txs, to make all your low-nonce txs invalid.
It is in your interest to use multi-tx every time you are publishing more than one tx during a single block period.

@Sykh you are the one who decides how expensive miner fees are for including txs in your mining pool's blocks.
By default I think it is 1000 satoshis.

I heard that in ethereum all the nonces have to come sequentially, this it is impossible for this kind of error to happen.
Maybe we should do a hard update for this type of behavior.

Sykh commented

A malicous node / pool (not the miner) can drop your multi tx aswell, there is no connection here.

Stop dancing around the issue pls

1. The node IS dropping txs, it doesnt matter if you think multi tx is better or what could happen from malicous sources, its a bug, it shouldnt happen
2. multi-tx isnt cheaper
3. i can decide jack and 1000 satoshi mining fee isnt really the issue when you charge over 60k for every multi-tx entry

icook commented

@zack-bitcoin We would like to take advantage of multi-tx, but the implementation work required for us to use it is a really bad time/value trade off. Fundamentally operations between deposits and withdraws are isolated, and there's no good way to integrate them easily. You're correct that it would help alleviate this issue some, but it is impractical for us to implement.

We're also encountering this on the user side. A withdraw from amoveo.exchange got orphaned, and several of our users deposit transactions have as well, causing support headaches since this is confusing to users (they don't know why it failed, and think it's something broken on our end). I think it's quite common for users to send say, 10% of their VEO to each exchange. They rapidly send two deposits, but only one ever gets mined.

To say that to guarantee delivery of transactions you should only every send one per block is unworkably restrictive.

To shed some more light on our implementation, for sending transactions we currently:

  • Push the signed transaction to our local node, all public mining pools, and Zack's nodes.
  • Every 60 seconds we scan for a new block, and if it's not confirmed we re-push the transaction.

This means the average transaction we send is being pushed to multiple public nodes 10-30 times before its confirmed. It's honestly been on my todo list to scale it back because this is a bit more spammy than I'd like, but even still we get orphaned transactions. With bitcoin and similar currencies I've only ever needed to push it to a single node.

If you do decide to fork to consecutive nonces-only please try to give us at least 1 week to adapt our integration. There are certain changes we'll need to make and test, and they take time.

icook commented

Just put a bit more work into my reproduction script, figured I'd share the findings:

  • Sending 10 consecutive transactions (nonce 1-10) from a single node, then manually triggering tx exchange and mining a single block appears to work correctly every time, all 10 are mined in the next block.
  • Sending those same transactions in reverse order results in an empty mempool, and only the very lowest nonce transaction is mined into the next block.
  • Sending the same transactions in random order results in a subset of them being accepted.
Sending transactions in nonce order [6, 8, 4, 7, 2, 3, 1, 5, 9, 10]
Sending to node 0
Sending to node 1
Sending to node 2
All mempools ======
Node 0; [1, 5] nonces
Node 1; [] nonces
Node 2; [] nonces
2018-09-05 15:28:06,240 [node_0         ] [INFO ] Mined block, height at 11, balance at 110.02175686
2018-09-05 15:28:06,389 [network        ] [INFO ] Nodes consensus state is [10, 11], full blocks
2018-09-05 15:28:06,557 [network        ] [INFO ] Nodes consensus state is [11], full blocks
All mempools ======
Node 0; [] nonces
Node 1; [] nonces
Node 2; [] nonces
  • A two second pause between pushing sending txs doesn't seem to help, there's still only a subset being accepted when not pushed in ascending order.

My hunch is that sorting mempools in nonce order before sending in https://github.com/zack-bitcoin/amoveo/blob/master/apps/amoveo_core/src/consensus/sync.erl#L347 may reduce dropped transactions. You may also be able to create a nonce grace range, where nonces of up to account_nonce + 10 will be retained in the mempool.

Just some thoughts, I'm still exploring root cause.

The nonces are already being sent in order.

Sharing txs too fast could cause this problem.
If you share all your txs with Bob, up to tx n-1.
Then you make a new tx N, and you rapidly send it to everyone.
Alice hasn't received your old txs from Bob, if she receives your new tx first in the rapid fire, then she will ignore all your old txs.

This has been fixed right?
I haven't had a problem or had anyone complain about this in a long time.

Sykh commented

Not completly, it became much more rare tho.