blockchain-etl/ethereum-etl

Invalid rows being written in transaction_*.csv files

chrismclennon opened this issue · 3 comments

I ran export_all.sh and am getting rows that have too many columns. My guess is that writes to the output files aren't being handled properly, so malformed rows end up in them.

An example I'm seeing of an invalid row:

0x39a53f3f5d5c973134eb9eaecfed1a9d629ca5050bcd34e3ef249a242e539ac0,14310,0x18ed5bb58d11bb6215c0fa05601328f005227d0ed58dead0a006fed1d7812185,500xad9e48cd56d7dd148ac56124db4f247d32d9b3c71d8b36c759b2388cd80e1c3c,14336,0x3199bc051a2c83c911e7a9b78435aba005a25ad1bd3221f2289ee0ca016de379,500487,12,0xf8b483dba2c3b7176a3da549ad41a48bb3121069,0x0a5d9a6fd4688469c78478805ac190a2c6366778,23008406140000000000,90000,50000000000,0x

It looks like it should be:

0x39a53f3f5d5c973134eb9eaecfed1a9d629ca5050bcd34e3ef249a242e539ac0,14310,0x18ed5bb58d11bb6215c0fa05601328f005227d0ed58dead0a006fed1d7812185,50

0xad9e48cd56d7dd148ac56124db4f247d32d9b3c71d8b36c759b2388cd80e1c3c,14336,0x3199bc051a2c83c911e7a9b78435aba005a25ad1bd3221f2289ee0ca016de379,500487,12,0xf8b483dba2c3b7176a3da549ad41a48bb3121069,0x0a5d9a6fd4688469c78478805ac190a2c6366778,23008406140000000000,90000,50000000000,0x

It seems that transaction 0x39a53f3f5d5c973134eb9eaecfed1a9d629ca5050bcd34e3ef249a242e539ac0 got cut off and the next row started writing too early.
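
For anyone who wants to scan their own output for rows like this, here is a minimal sketch. It assumes the first line of each CSV file is a header row and treats the header's width as the expected column count. The function name `find_bad_rows` is made up for this example and is not part of ethereum-etl.

```python
import csv


def find_bad_rows(path):
    """Return (line_number, field_count) for rows whose width differs from the header.

    Assumption: the first line of the file is a header row.
    """
    bad = []
    with open(path, newline="") as f:
        reader = csv.reader(f)
        header = next(reader, None)
        if header is None:
            return bad  # empty file, nothing to check
        expected = len(header)
        for row in reader:
            if len(row) != expected:
                # reader.line_num is the physical line the row ended on
                bad.append((reader.line_num, len(row)))
    return bad
```

Running it over each transactions file and printing the result should point at the merged or truncated lines, which can then be inspected by hand.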

Thanks for reporting this. I'll try to reproduce it:

  1. What are the versions of OS and python?
  2. What are the parameters that you pass to export_all.sh?
  3. Can you reproduce it consistently or does it happen from time to time?
  4. Is it possible that two export processes were somehow running in parallel on your machine? For example, if you killed one bash process and then started another, the old python process may still have been running in the background and writing to the same file as the new process.

Thanks for checking it out, I think your idea in number 4 may actually be it.

  1. Python 3.5.2 running on Ubuntu 16.04.4 LTS
  2. nohup bash export_all.sh -s 0 -e 5843000 -b 100000 -i ~/.ethereum/geth.ipc -o output
  3. I'm reproducing the error consistently with different rows.
  4. Super good thinking! It looks like other python3 processes are running. I'll kill those, rerun, and report back.

Looks like that fixed it. Thanks for the help!
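
One way to guard against a leftover exporter writing to the same output again is an advisory lock that each run takes before it starts. This is only a sketch under assumptions: it relies on `fcntl.flock`, which is available on Linux and other Unix systems, and the helper name `acquire_export_lock` and the lock file path are hypothetical. Nothing in this thread indicates that export_all.sh does this itself.

```python
import fcntl


def acquire_export_lock(lock_path):
    """Try to take an exclusive, non-blocking advisory lock.

    Returns the open lock file on success, or None if another
    holder (e.g. a leftover export process) already has the lock.
    """
    lock_file = open(lock_path, "w")
    try:
        fcntl.flock(lock_file, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except OSError:
        # Lock is held elsewhere; do not start a second writer.
        lock_file.close()
        return None
    return lock_file
```

A wrapper would keep the returned file object open for the whole export and exit early when it gets `None`. The kernel drops the lock when the holding process exits, so a killed run does not leave a stale lock behind. The lock is advisory, so it only helps if every export goes through the same check.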