cdrini/openlibrary

Creating postgres indices fails with duplicate key error


Step 1b

If this came from me, the duplicate key was /books/OL9708313M. zgrep showed that it a) occurs only once in the input file and b) sits on a shard boundary when importing with 6 parallel shards.

zgrep -n /books/OL9708313M ol_dump_2019-02-28.txt.gz 
8681017:/type/edition	/books/OL9708313M	3	2010-04-14T05:59:33.019423 [...]
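Instead of running zgrep on one key at a time, you can check every key in the dump in a single pass to rule out genuine duplicates in the data. This is a minimal sketch that assumes the dump uses the tab-separated type/key/revision/timestamp layout shown above, with the key in column 2. The dup_keys helper and the sample data are hypothetical:

```shell
#!/bin/sh
set -eu

# dup_keys DUMP.gz: print each key (tab-separated column 2) that occurs
# more than once; empty output means every key in the dump is unique.
# LC_ALL=C makes sort/uniq compare raw bytes, which is faster on big input.
dup_keys() {
  gzip -dc "$1" | cut -f2 | LC_ALL=C sort | LC_ALL=C uniq -d
}

# Tiny synthetic dump in the same layout (hypothetical data, not from the
# real dump), just to exercise the helper.
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
printf '/type/edition\t/books/OL1M\t1\t2010-01-01T00:00:00\n/type/edition\t/books/OL2M\t1\t2010-01-01T00:00:00\n' \
  | gzip > "$workdir/sample.txt.gz"

dup_keys "$workdir/sample.txt.gz"
```

To run it against the real file, use dup_keys ol_dump_2019-02-28.txt.gz. If the output is empty, the duplicate must be introduced by the import pipeline and not by the dump itself.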

Since the problem doesn't occur when importing with a single shard, and the parallel import offers little benefit, I consider this low priority.

Parallelization seems to have a bigger impact on OJF than it did on your system; easy enough to fix anyway :P Apparently tail -n +1 is 1-indexed (output starts at line 1, not after it). Fixed.
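A small sharding helper shows how this off-by-one happens. It is a sketch only, not the project's actual import script, which isn't shown here. It assumes shards are cut with tail -n +START | head -n SIZE. The shard_file function and the shard_N.txt names are hypothetical:

```shell
#!/bin/sh
set -eu

# shard_file INPUT NSHARDS PREFIX: split INPUT into NSHARDS contiguous
# line ranges, written to PREFIX0.txt, PREFIX1.txt, ...
shard_file() {
  input=$1
  nshards=$2
  prefix=$3
  total=$(wc -l < "$input")
  # Ceiling division so the last partial shard still gets its lines.
  size=$(( (total + nshards - 1) / nshards ))
  i=0
  while [ "$i" -lt "$nshards" ]; do
    # tail -n +K starts printing AT line K (1-indexed), so shard i must
    # start at i*size + 1. Using i*size would re-emit the previous
    # shard's last line, i.e. a duplicate key on every shard boundary.
    start=$(( i * size + 1 ))
    tail -n +"$start" "$input" | head -n "$size" > "${prefix}${i}.txt"
    i=$(( i + 1 ))
  done
}

# Demo: 20 numbered lines split into 6 shards (hypothetical data).
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
seq 1 20 > "$workdir/input.txt"
shard_file "$workdir/input.txt" 6 "$workdir/shard_"
```

If you change start to i * size, every shard after the first begins one line early. The last line of shard i-1 then also becomes the first line of shard i, which produces exactly one duplicate key per boundary.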