nopara73/WasabiVsSamourai

Include JoinMarket transactions

MaxHillebrand opened this issue · 15 comments

Can you figure out which transactions are coordinated in JoinMarket, and include them in the volume analysis?

Yes, but I think I would need to check the input values too, and Bitcoin transactions don't contain them. Right now this code runs within 10 minutes; if I added an RPC call for every input, it'd take days.

Wasabi txs are easy to identify from the outputs based on the coordinator address. Samourai transactions are also easy, because they use very specific amounts and numbers of inputs and outputs. But CJ can be more varied, so in order not to create fake statistics, I'd need the inputs too.

I think this pattern could be tried for hunting down JM txs without checking input amounts:

  • "number of equally sized outputs" > 2 (could be 2 in theory too, but will give a lot of false positives likely and are rare)
  • "number of equally sized outputs" == ("number of other outputs" OR "number of other outputs" - 1)
  • equally sized output amount above or equal to 0.001 BTC (or even 0.01 BTC, which is current default in joinmarket.cfg)
  • "number of inputs" >= "number of outputs"
  • no more than one output with different address type than others and that must be one of equally sized ones and only following combinations then are allowed: n P2PKH, n P2PKH + 1 P2SH, n P2SH, n P2SH + 1 P2PKH, n P2SH + 1 bech32
  • also, number of equally sized outputs could be limited to 20 or 30, as more aren't practical, due to IRC server rate limits for privmsg's

But I'm not sure how many false positives it will give from, for example, batch payouts from exchanges.
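The output-side part of the pattern above can be sketched roughly as follows. This is only an illustration, not code from either project: the function name and parameters are hypothetical, amounts are assumed to be in satoshis, and the address-type rule is left out. Two readings are assumptions: the number of other outputs is taken to be equal to the number of equally sized outputs or one fewer, and inputs are compared against the number of equally sized outputs rather than all outputs, because that is the reading under which the (1, 2.1, 3.2) -> (1, 1, 1, 1.1, 2.2) example discussed in this thread passes.

```python
from collections import Counter

SATS_PER_BTC = 100_000_000


def looks_like_jm(output_values, n_inputs, min_equal=3,
                  min_denomination=SATS_PER_BTC // 1000, max_equal=30):
    """Hypothetical check of the output pattern; amounts are in satoshis."""
    if not output_values:
        return False
    # The most frequent output value is treated as the coinjoin amount.
    denomination, n_equal = Counter(output_values).most_common(1)[0]
    n_other = len(output_values) - n_equal
    return (
        min_equal <= n_equal <= max_equal
        and denomination >= min_denomination
        # Assumed reading: one change output per participant,
        # or one fewer when the taker sweeps.
        and n_other in (n_equal, n_equal - 1)
        # Assumed reading: every participant brings at least one input.
        and n_inputs >= n_equal
    )
```

With `min_equal=2` the check also accepts two-party coinjoins, at the cost of more false positives.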

"number of equally sized outputs" > 2 (could be 2 in theory too, but will give a lot of false positives likely and are rare)

I disagree. That was my most common tx type in 2016, because larger ones were failing all the time. Also, it would get into misleading territory if we didn't include them :/

equally sized output amount above or equal to 0.001 BTC (or even 0.01 BTC, which is current default in joinmarket.cfg)

What is this?

"number of inputs" >= "number of outputs"

Is this really correct? (1, 2.1, 3.2) -> (1, 1, 1, 1.1, 2.2)

Anyhow, this is still way too broad; it'd be full of false positives. It doesn't look at tx chains or tx inputs, so it should be easy to add, but IMO it'd mislead more than it'd help :/

equally sized output amount above or equal to 0.001 BTC (or even 0.01 BTC, which is current default in joinmarket.cfg)

What is this?

I meant, value of equally sized output should not be below that.

Is this really correct? (1, 2.1, 3.2) -> (1, 1, 1, 1.1, 2.2)

Yes, when the taker is doing a sweep (sendpayment.py with an amount of 0 specified, which means everything, so no change for himself). That example assumes zero cjfees; IRL 2.1/1.1 and 3.2/2.2 should not be exact matches.
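The arithmetic of that sweep example can be written out as a small sketch. It assumes, like the example itself, zero cjfees and also zero miner fee; the function name is made up for illustration. The taker's single input sets the coinjoin amount, every participant gets one output of that amount, and only the makers get change.

```python
from decimal import Decimal


def sweep_outputs(taker_input, maker_inputs):
    """Outputs of an idealized zero-fee sweep (illustrative only)."""
    amount = Decimal(taker_input)
    # One equally sized output per participant (taker + makers).
    equal = [amount] * (1 + len(maker_inputs))
    # Only makers receive change; the taker swept everything.
    change = [Decimal(m) - amount for m in maker_inputs]
    return equal + change


outputs = sweep_outputs("1", ["2.1", "3.2"])
# Three inputs produce five outputs: 1, 1, 1, 1.1, 2.2
```

In a real transaction the makers' change would also include their cjfee and the taker would pay the miner fee, so the change amounts would not line up this exactly.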

@kristapsk for the record, I've implemented your heuristics and ran them on a few thousand blocks from before the JoinMarket launch. They did not find a single false positive, even though I set the parameters to your weakest possible suggestions:

"number of equally sized outputs" > 2 (could be 2 in theory too, but will give a lot of false positives likely and are rare)

I used 2 here.

equally sized output amount above or equal to 0.001 BTC (or even 0.01 BTC, which is current default in joinmarket.cfg)

I used 0.001 here.

no more than one output with different address type than others and that must be one of equally sized ones and only following combinations then are allowed: n P2PKH, n P2PKH + 1 P2SH, n P2SH, n P2SH + 1 P2PKH, n P2SH + 1 bech32

I didn't bother with this.

also, number of equally sized outputs could be limited to 20 or 30, as more aren't practical, due to IRC server rate limits for privmsg's

I used 30 here.

Aaah, I forgot to turn off the flag that says to only look for txs after the JM launch, so my results above are not relevant. In fact, I get a bunch of false positives after turning off the flag.

2020-05-07 17:00:17 INFO        Scanner (173)   Block 390004, JM: 1, WW: 0, SW: 0
2020-05-07 17:00:18 INFO        Scanner (173)   Block 390005, JM: 1, WW: 0, SW: 0
2020-05-07 17:00:22 INFO        Scanner (173)   Block 390007, JM: 2, WW: 0, SW: 0
2020-05-07 17:00:28 INFO        Scanner (173)   Block 390011, JM: 1, WW: 0, SW: 0
2020-05-07 17:00:31 INFO        Scanner (173)   Block 390013, JM: 2, WW: 0, SW: 0
2020-05-07 17:00:41 INFO        Scanner (173)   Block 390022, JM: 1, WW: 0, SW: 0
2020-05-07 17:00:50 INFO        Scanner (173)   Block 390029, JM: 1, WW: 0, SW: 0
2020-05-07 17:00:56 INFO        Scanner (173)   Block 390033, JM: 1, WW: 0, SW: 0
2020-05-07 17:01:00 INFO        Scanner (173)   Block 390036, JM: 1, WW: 0, SW: 0
2020-05-07 17:01:10 INFO        Scanner (173)   Block 390043, JM: 1, WW: 0, SW: 0
2020-05-07 17:01:36 INFO        Scanner (173)   Block 390065, JM: 1, WW: 0, SW: 0
2020-05-07 17:01:55 INFO        Scanner (173)   Block 390082, JM: 1, WW: 0, SW: 0
2020-05-07 17:02:05 INFO        Scanner (173)   Block 390087, JM: 1, WW: 0, SW: 0

I have also written some cj tx detection code in bash recently, but it also gives some false positives. The biggest problem is the same as yours: I don't see amounts and types of inputs. But it's OK for me to monitor cj activity in recent blocks manually (there's a script that lets me run ./listpossiblecjtxids.sh $(bitcoin-cli getblockcount) for the most recent block, for example). https://github.com/kristapsk/bitcoin-scripts/blob/03f27cd1ae00813232ddc0f9d36a015c267fca33/inc.common.sh#L257

I don't see amounts and types of inputs

I do, I use Bitcoin Knots.

n P2PKH, n P2PKH + 1 P2SH, n P2SH, n P2SH + 1 P2PKH, n P2SH + 1 bech32

Forgive my ignorance, but why? Can't there be other combinations like n bech32, n witness script hash, n bech32 + 1 witness script hash and similar combinations?

Btw so far my progress is this:

https://github.com/nopara73/Dumplings/blob/b8fbbad0c9174a1cab2539669dc72047d3104546/Dumplings/Scanning/Scanner.cs#L173-L246

FTR everything after https://github.com/nopara73/Dumplings/blob/master/Dumplings/Scanning/Scanner.cs#L185 is this idea of yours:

n P2PKH, n P2PKH + 1 P2SH, n P2SH, n P2SH + 1 P2PKH, n P2SH + 1 bech32

As you can see, it's pretty complex, and I don't feel confident about it and especially don't think it's future proof. Anyhow I'll review your code and incorporate things if there is anything that you didn't already write in your first comment.

I don't see amounts and types of inputs

I do, I use Bitcoin Knots.

Without -txindex?

Can't there be other combinations like n bech32, n witness script hash, n bech32 + 1 witness script hash and similar combinations?

Not currently for JM, as it does not support equal value output coinjoins from native segwit wallets. So bech32 can only be a destination address of a taker. This may and likely will change in future.

Anyhow I'll review your code and incorporate things if there is anything that you didn't already write in your first comment.

First comment was about JoinMarket, my script tries to catch all coinjoins, not only JM. It's not directly related to this issue. :)

Without -txindex?

I don't know if it's needed. It's getblock with verbosity 3.
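As a rough sketch of how input amounts could be read from such a response: the snippet below assumes the JSON layout used by recent Bitcoin Core releases for `getblock <hash> 3`, where each non-coinbase `vin` entry carries a `prevout` object with a `value` field in BTC. Whether the Knots build used here exposes exactly the same fields is an assumption, and the function name is hypothetical.

```python
from decimal import Decimal


def input_values(block):
    """Map txid -> list of input values in BTC from a decoded block dict.

    Assumes each spent input has a "prevout" object with a "value" field.
    """
    result = {}
    for tx in block["tx"]:
        values = []
        for vin in tx["vin"]:
            prevout = vin.get("prevout")  # coinbase inputs have none
            if prevout is not None:
                # Go through str() so float noise does not leak into Decimal.
                values.append(Decimal(str(prevout["value"])))
        result[tx["txid"]] = values
    return result
```

Because the previous outputs arrive inline with the block, no separate lookup per input is needed.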

Btw, I just found this thing: https://content.sciendo.com/view/journals/popets/2018/4/article-p179.xml

It describes an algo for identifying JM txs in the appendix. I hope I find something interesting.

Now I'm even doing subset sum, but there are a bunch of txs for which you cannot tell even with your own eyes whether they're JM txs or not, like this: https://www.smartbit.com.au/tx/5282615a41ef480f87f04c2558c70f451b4675f75c482c45dc7efbc82d4a626b
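One way a subset-sum check could look, as a sketch only and not necessarily what the scanner does: for every change output, search for a set of inputs whose sum is close to the coinjoin amount plus that change. The names and the `tolerance` parameter (room for cjfees and miner fees) are illustrative assumptions; amounts are integers in satoshis. The brute-force search is exponential in the number of inputs, and it does not require the matched subsets to be disjoint.

```python
from itertools import combinations


def funding_subset(inputs, target, tolerance=0):
    """Return indices of the smallest input subset summing to about target."""
    for size in range(1, len(inputs) + 1):
        for idx in combinations(range(len(inputs)), size):
            total = sum(inputs[i] for i in idx)
            if abs(total - target) <= tolerance:
                return idx
    return None


def change_outputs_explained(inputs, denomination, change_outputs, tolerance=0):
    """True if every change output pairs with some subset of the inputs."""
    return all(
        funding_subset(inputs, denomination + change, tolerance) is not None
        for change in change_outputs
    )
```

A transaction like the one linked above can pass or fail such a check without that settling whether it really is a JM tx.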

(This obviously isn't as it happened in 2011.)

I'm afraid one would need to spend weeks to make sure to catch all the JM transactions (without monitoring the orderbook, of course).

A good way to detect JoinMarket coinjoins is to check that they are directly connected to other JoinMarket coinjoins. This is the method used in the 2016 paper "Join Me on a Market for Anonymity" by Malte Möser and Rainer Böhme.

This works because you'll only very very rarely see a JoinMarket coinjoin on its own not connected to any other coinjoin. The software creates coinjoins sequentially, and makers are generally incentivized to keep their bots running and repeatedly take part in many coinjoins.
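A minimal sketch of that connectivity idea, under stated assumptions: it is a simple one-hop filter, not the paper's algorithm, and the data shape (a dict from candidate txid to the set of txids whose outputs it spends) and function name are made up for illustration. A candidate is kept only if it spends from, or is spent by, at least one other candidate.

```python
def connected_candidates(candidates):
    """Return candidate txids that are linked to another candidate.

    candidates: dict mapping txid -> set of txids whose outputs it spends.
    """
    ids = set(candidates)
    linked = set()
    for txid, parents in candidates.items():
        for parent in parents & ids:
            if parent != txid:
                linked.add(txid)    # spends from another candidate
                linked.add(parent)  # is spent by another candidate
    return linked
```

Candidates that match an output pattern but sit alone, such as a one-off exchange batch payout, would be dropped by this filter.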