Creating postgres indices fails with duplicate key error
Closed this issue · 2 comments
cdrini commented
Step 1b
tfmorris commented
If this came from me, the duplicate key was /books/OL9708313M, which zgrep showed a) occurs only once in the input file and b) falls on a shard boundary when importing with 6 parallel shards:
zgrep -n /books/OL9708313M ol_dump_2019-02-28.txt.gz
8681017:/type/edition /books/OL9708313M 3 2010-04-14T05:59:33.019423 [...]
Since the problem doesn't occur when importing using a single shard and since the parallel import doesn't have much of a benefit, I consider this to be low priority.
cdrini commented
Parallelization seems to have a bigger impact on OJF than it did on your system; easy enough to fix anyways :P Apparently tail -n +1 is 1-indexed, so it starts at the first line rather than skipping one. Fixed.
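For anyone hitting the same thing, here is a minimal sketch of how a 1-indexed tail -n +K can duplicate a row at a shard boundary. The import script itself is not shown in this thread, so the helper names shard_buggy and shard_fixed, the shard size of 3, and the six-line sample input are all assumptions made for illustration, not the project's actual code.

```shell
#!/bin/sh
# Hypothetical sharding helpers (illustration only, not the real import script).
# Both read lines on stdin and take: <0-based shard index> <shard size>.

# Buggy: treats +K as "skip K lines", so a later shard starts one line early
# and repeats the last line of the previous shard.
shard_buggy() {
  tail -n +"$(( $1 * $2 ))" | head -n "$2"
}

# Fixed: tail -n +K starts AT line K (1-indexed), so skipping N lines needs +(N+1).
shard_fixed() {
  tail -n +"$(( $1 * $2 + 1 ))" | head -n "$2"
}

# Six sample lines; with a shard size of 3, shard 0 owns lines 1-3
# and shard 1 should own lines 4-6.
sample() { printf '%s\n' 1 2 3 4 5 6; }

echo "buggy shard 1:"; sample | shard_buggy 1 3   # prints 3 4 5 (line 3 repeated)
echo "fixed shard 1:"; sample | shard_fixed 1 3   # prints 4 5 6
```

With the buggy offset, the row at the boundary gets loaded by two shards, which would match a key that appears only once in the dump but still trips the unique index.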