norBIT/alkisimport

Weird error while running parallel import of Rheinland-Pfalz data

Closed this issue · 9 comments

Hey, I'm running import of the ALKIs data Rheinland-Pfalz in parallel, and I'm getting these errors.

Do you know what could be the cause of them?

Below you have config_file that I'm using:

epsg 25832
schema alkis_rheinland_pfalz
historie off
jobs 6
avoiddupes off
update
debug off
log /mnt/f/project_germany_100/rlp/alkis/ALKIS_Importer_Pipeline/import_logs/2024_06_06_085030/batch1_import_log_file.log
/mnt/f/project_germany_100/rlp/alkis/ALKIS_Importer_Pipeline/input_batched/batch1/bda_5750_1001_Huemmel.xml
…
…
…

And logs:

 /mnt/f/project_germany_100/rlp/alkis/ALKIS_Importer_Pipeline/input_batched/batch1/bda_5750_1227_Monreal.xml    ERROR: Ergebnis 1 bei /mnt/f/project_germany_100/rlp/alkis/ALKIS_Importer_Pipeline/input_batched/batch1/bda_5750_1227_Monreal.xml (bislang 1 Fehler)
 /mnt/f/project_germany_100/rlp/alkis/ALKIS_Importer_Pipeline/input_batched/batch1/bda_5750_1227_Monreal.xml    TIME: /mnt/f/project_germany_100/rlp/alkis/ALKIS_Importer_Pipeline/input_batched/batch1/bda_5750_1227_Monreal.xml mit 145MiB in 4m21s importiert (570kiB/s; Gesamt:1248kiB/s).
 /mnt/f/project_germany_100/rlp/alkis/ALKIS_Importer_Pipeline/input_batched/batch1/bda_5750_1227_Monreal.xml    REMAINING: 36GiB 71% 8h30m47s ETA:Thu Jun  6 20:42:12 CEST 2024
 /mnt/f/project_germany_100/rlp/alkis/ALKIS_Importer_Pipeline/input_batched/batch1/bda_5750_1227_Monreal.xml
 /mnt/f/project_germany_100/rlp/alkis/ALKIS_Importer_Pipeline/input_batched/batch1/bda_5750_1227_Monreal.xml    ERROR 1: CREATE TABLE "alkis_rheinland_pfalz"."ax_besonderergebaeudepunkt" ( "ogc_fid" SERIAL, PRIMARY KEY ("ogc_fid"), "gml_id" VARCHAR(16), "anlass" varchar[], "beginnt" VARCHAR, "endet" VARCHAR, "advstandardmodell" varchar[], "sonstigesmodell" varchar[], "quellobjektid" VARCHAR, "zeigtaufexternes_art" varchar[], "zeigtaufexternes_name" varchar[], "zeigtaufexternes_uri" varchar[], "art" INTEGER, "hoehe" FLOAT8[], "obererbezugspunkt" INTEGER[], "untererbezugspunkt" INTEGER[], "punktkennung" VARCHAR, "sonstigeeigenschaft" varchar[], "zustaendigestelle_land" VARCHAR, "zustaendigestelle_stelle" VARCHAR, "bestehtaus" varchar[], "istteilvon" varchar[] )
 /mnt/f/project_germany_100/rlp/alkis/ALKIS_Importer_Pipeline/input_batched/batch1/bda_5750_1227_Monreal.xml
 /mnt/f/project_germany_100/rlp/alkis/ALKIS_Importer_Pipeline/input_batched/batch1/bda_5750_1227_Monreal.xml    ERROR 1: Unable to write feature 1 from layer ax_besonderergebaeudepunkt.
 /mnt/f/project_germany_100/rlp/alkis/ALKIS_Importer_Pipeline/input_batched/batch1/bda_5750_1227_Monreal.xml    ERROR 1: Terminating translation prematurely after failed
 /mnt/f/project_germany_100/rlp/alkis/ALKIS_Importer_Pipeline/input_batched/batch1/bda_5750_1227_Monreal.xml    translation of layer ax_besonderergebaeudepunkt (use -skipfailures to skip errors)
 parallel: This job failed:
 import /mnt/f/project_germany_100/rlp/alkis/ALKIS_Importer_Pipeline/input_batched/batch1/bda_5750_1227_Monreal.xml
 parallel: Starting no more jobs. Waiting for 5 jobs to finish.
 /mnt/f/project_germany_100/rlp/alkis/ALKIS_Importer_Pipeline/input_batched/batch1/bda_5750_1226_Reudelsterz.xml
 /mnt/f/project_germany_100/rlp/alkis/ALKIS_Importer_Pipeline/input_batched/batch1/bda_5750_1226_Reudelsterz.xml
 /mnt/f/project_germany_100/rlp/alkis/ALKIS_Importer_Pipeline/input_batched/batch1/bda_5750_1226_Reudelsterz.xml
 /mnt/f/project_germany_100/rlp/alkis/ALKIS_Importer_Pipeline/input_batched/batch1/bda_5750_1226_Reudelsterz.xml
 /mnt/f/project_germany_100/rlp/alkis/ALKIS_Importer_Pipeline/input_batched/batch1/bda_5750_1226_Reudelsterz.xml       ERROR: Ergebnis 1 bei /mnt/f/project_germany_100/rlp/alkis/ALKIS_Importer_Pipeline/input_batched/batch1/bda_5750_1226_Reudelsterz.xml (bislang 2 Fehler)
 /mnt/f/project_germany_100/rlp/alkis/ALKIS_Importer_Pipeline/input_batched/batch1/bda_5750_1226_Reudelsterz.xml       TIME: /mnt/f/project_germany_100/rlp/alkis/ALKIS_Importer_Pipeline/input_batched/batch1/bda_5750_1226_Reudelsterz.xml mit 63MiB in 4m26s importiert (246kiB/s; Gesamt:1253kiB/s).
 /mnt/f/project_germany_100/rlp/alkis/ALKIS_Importer_Pipeline/input_batched/batch1/bda_5750_1226_Reudelsterz.xml       REMAINING: 36GiB 71% 8h27m54s ETA:Thu Jun  6 20:39:24 CEST 2024
 /mnt/f/project_germany_100/rlp/alkis/ALKIS_Importer_Pipeline/input_batched/batch1/bda_5750_1226_Reudelsterz.xml
 /mnt/f/project_germany_100/rlp/alkis/ALKIS_Importer_Pipeline/input_batched/batch1/bda_5750_1226_Reudelsterz.xml       ERROR 1: CREATE TABLE "alkis_rheinland_pfalz"."ax_besonderergebaeudepunkt" ( "ogc_fid" SERIAL, PRIMARY KEY ("ogc_fid"), "gml_id" VARCHAR(16), "anlass" varchar[], "beginnt" VARCHAR, "endet" VARCHAR, "advstandardmodell" varchar[], "sonstigesmodell" varchar[], "quellobjektid" VARCHAR, "zeigtaufexternes_art" varchar[], "zeigtaufexternes_name" varchar[], "zeigtaufexternes_uri" varchar[], "art" INTEGER, "hoehe" FLOAT8[], "obererbezugspunkt" INTEGER[], "untererbezugspunkt" INTEGER[], "punktkennung" VARCHAR, "sonstigeeigenschaft" varchar[], "zustaendigestelle_land" VARCHAR, "zustaendigestelle_stelle" VARCHAR, "bestehtaus" varchar[], "istteilvon" varchar[] )
 /mnt/f/project_germany_100/rlp/alkis/ALKIS_Importer_Pipeline/input_batched/batch1/bda_5750_1226_Reudelsterz.xml
 /mnt/f/project_germany_100/rlp/alkis/ALKIS_Importer_Pipeline/input_batched/batch1/bda_5750_1226_Reudelsterz.xml       ERROR 1: Unable to write feature 1 from layer ax_besonderergebaeudepunkt.
 /mnt/f/project_germany_100/rlp/alkis/ALKIS_Importer_Pipeline/input_batched/batch1/bda_5750_1226_Reudelsterz.xml       ERROR 1: Terminating translation prematurely after failed
 /mnt/f/project_germany_100/rlp/alkis/ALKIS_Importer_Pipeline/input_batched/batch1/bda_5750_1226_Reudelsterz.xml       translation of layer ax_besonderergebaeudepunkt (use -skipfailures to skip errors)
 parallel: This job failed:
 import /mnt/f/project_germany_100/rlp/alkis/ALKIS_Importer_Pipeline/input_batched/batch1/bda_5750_1226_Reudelsterz.xml
 parallel: Starting no more jobs. Waiting for 4 jobs to finish.
 /mnt/f/project_germany_100/rlp/alkis/ALKIS_Importer_Pipeline/input_batched/batch1/bda_5750_1236_Allenz.xml
 /mnt/f/project_germany_100/rlp/alkis/ALKIS_Importer_Pipeline/input_batched/batch1/bda_5750_1236_Allenz.xml
 /mnt/f/project_germany_100/rlp/alkis/ALKIS_Importer_Pipeline/input_batched/batch1/bda_5750_1236_Allenz.xml
 /mnt/f/project_germany_100/rlp/alkis/ALKIS_Importer_Pipeline/input_batched/batch1/bda_5750_1236_Allenz.xml
 /mnt/f/project_germany_100/rlp/alkis/ALKIS_Importer_Pipeline/input_batched/batch1/bda_5750_1228_Kehrig.xml     TIME: /mnt/f/project_germany_100/rlp/alkis/ALKIS_Importer_Pipeline/input_batched/batch1/bda_5750_1228_Kehrig.xml mit 144MiB in 7m5s importiert (349kiB/s; Gesamt:1246kiB/s).
 /mnt/f/project_germany_100/rlp/alkis/ALKIS_Importer_Pipeline/input_batched/batch1/bda_5750_1228_Kehrig.xml     REMAINING: 36GiB 71% 8h28m50s ETA:Thu Jun  6 20:43:28 CEST 2024
 /mnt/f/project_germany_100/rlp/alkis/ALKIS_Importer_Pipeline/input_batched/batch1/bda_5750_1236_Allenz.xml     ERROR: Ergebnis 1 bei /mnt/f/project_germany_100/rlp/alkis/ALKIS_Importer_Pipeline/input_batched/batch1/bda_5750_1236_Allenz.xml (bislang 3 Fehler)
 /mnt/f/project_germany_100/rlp/alkis/ALKIS_Importer_Pipeline/input_batched/batch1/bda_5750_1236_Allenz.xml     TIME: /mnt/f/project_germany_100/rlp/alkis/ALKIS_Importer_Pipeline/input_batched/batch1/bda_5750_1236_Allenz.xml mit 92MiB in 5m56s importiert (265kiB/s; Gesamt:1253kiB/s).
 /mnt/f/project_germany_100/rlp/alkis/ALKIS_Importer_Pipeline/input_batched/batch1/bda_5750_1236_Allenz.xml     REMAINING: 36GiB 71% 8h24m28s ETA:Thu Jun  6 20:39:07 CEST 2024
 /mnt/f/project_germany_100/rlp/alkis/ALKIS_Importer_Pipeline/input_batched/batch1/bda_5750_1236_Allenz.xml
 /mnt/f/project_germany_100/rlp/alkis/ALKIS_Importer_Pipeline/input_batched/batch1/bda_5750_1236_Allenz.xml     ERROR 1: CREATE TABLE "alkis_rheinland_pfalz"."ax_besonderergebaeudepunkt" ( "ogc_fid" SERIAL, PRIMARY KEY ("ogc_fid"), "gml_id" VARCHAR(16), "anlass" varchar[], "beginnt" VARCHAR, "endet" VARCHAR, "advstandardmodell" varchar[], "sonstigesmodell" varchar[], "quellobjektid" VARCHAR, "zeigtaufexternes_art" varchar[], "zeigtaufexternes_name" varchar[], "zeigtaufexternes_uri" varchar[], "art" INTEGER, "hoehe" FLOAT8[], "obererbezugspunkt" INTEGER[], "untererbezugspunkt" INTEGER[], "punktkennung" VARCHAR, "sonstigeeigenschaft" varchar[], "zustaendigestelle_land" VARCHAR, "zustaendigestelle_stelle" VARCHAR, "bestehtaus" varchar[], "istteilvon" varchar[] )
 /mnt/f/project_germany_100/rlp/alkis/ALKIS_Importer_Pipeline/input_batched/batch1/bda_5750_1236_Allenz.xml
 /mnt/f/project_germany_100/rlp/alkis/ALKIS_Importer_Pipeline/input_batched/batch1/bda_5750_1236_Allenz.xml     ERROR 1: Unable to write feature 1 from layer ax_besonderergebaeudepunkt.
 /mnt/f/project_germany_100/rlp/alkis/ALKIS_Importer_Pipeline/input_batched/batch1/bda_5750_1236_Allenz.xml     ERROR 1: Terminating translation prematurely after failed
 /mnt/f/project_germany_100/rlp/alkis/ALKIS_Importer_Pipeline/input_batched/batch1/bda_5750_1236_Allenz.xml     translation of layer ax_besonderergebaeudepunkt (use -skipfailures to skip errors)
 parallel: This job failed:
 import /mnt/f/project_germany_100/rlp/alkis/ALKIS_Importer_Pipeline/input_batched/batch1/bda_5750_1236_Allenz.xml
 parallel: Starting no more jobs. Waiting for 2 jobs to finish.
 /mnt/f/project_germany_100/rlp/alkis/ALKIS_Importer_Pipeline/input_batched/batch1/bda_5750_1238_Hausen.xml     TIME: /mnt/f/project_germany_100/rlp/alkis/ALKIS_Importer_Pipeline/input_batched/batch1/bda_5750_1238_Hausen.xml mit 133MiB in 8m28s importiert (269kiB/s; Gesamt:1233kiB/s).
 /mnt/f/project_germany_100/rlp/alkis/ALKIS_Importer_Pipeline/input_batched/batch1/bda_5750_1238_Hausen.xml     REMAINING: 36GiB 70% 8h30m55s ETA:Thu Jun  6 20:50:46 CEST 2024
 /mnt/f/project_germany_100/rlp/alkis/ALKIS_Importer_Pipeline/input_batched/batch1/bda_5750_1237_Berresheim.xml
 /mnt/f/project_germany_100/rlp/alkis/ALKIS_Importer_Pipeline/input_batched/batch1/bda_5750_1237_Berresheim.xml
 /mnt/f/project_germany_100/rlp/alkis/ALKIS_Importer_Pipeline/input_batched/batch1/bda_5750_1237_Berresheim.xml
 /mnt/f/project_germany_100/rlp/alkis/ALKIS_Importer_Pipeline/input_batched/batch1/bda_5750_1237_Berresheim.xml
 /mnt/f/project_germany_100/rlp/alkis/ALKIS_Importer_Pipeline/input_batched/batch1/bda_5750_1237_Berresheim.xml ERROR: Ergebnis 1 bei /mnt/f/project_germany_100/rlp/alkis/ALKIS_Importer_Pipeline/input_batched/batch1/bda_5750_1237_Berresheim.xml (bislang 4 Fehler)
 /mnt/f/project_germany_100/rlp/alkis/ALKIS_Importer_Pipeline/input_batched/batch1/bda_5750_1237_Berresheim.xml TIME: /mnt/f/project_germany_100/rlp/alkis/ALKIS_Importer_Pipeline/input_batched/batch1/bda_5750_1237_Berresheim.xml mit 56MiB in 9m12s importiert (104kiB/s; Gesamt:1238kiB/s).
 /mnt/f/project_germany_100/rlp/alkis/ALKIS_Importer_Pipeline/input_batched/batch1/bda_5750_1237_Berresheim.xml REMAINING: 36GiB 70% 8h28m17s ETA:Thu Jun  6 20:48:09 CEST 2024
 /mnt/f/project_germany_100/rlp/alkis/ALKIS_Importer_Pipeline/input_batched/batch1/bda_5750_1237_Berresheim.xml
 /mnt/f/project_germany_100/rlp/alkis/ALKIS_Importer_Pipeline/input_batched/batch1/bda_5750_1237_Berresheim.xml ERROR 1: CREATE TABLE "alkis_rheinland_pfalz"."ax_besonderergebaeudepunkt" ( "ogc_fid" SERIAL, PRIMARY KEY ("ogc_fid"), "gml_id" VARCHAR(16), "anlass" varchar[], "beginnt" VARCHAR, "endet" VARCHAR, "advstandardmodell" varchar[], "sonstigesmodell" varchar[], "quellobjektid" VARCHAR, "zeigtaufexternes_art" varchar[], "zeigtaufexternes_name" varchar[], "zeigtaufexternes_uri" varchar[], "art" INTEGER, "hoehe" FLOAT8[], "obererbezugspunkt" INTEGER[], "untererbezugspunkt" INTEGER[], "punktkennung" VARCHAR, "sonstigeeigenschaft" varchar[], "zustaendigestelle_land" VARCHAR, "zustaendigestelle_stelle" VARCHAR, "bestehtaus" varchar[], "istteilvon" varchar[] )
 /mnt/f/project_germany_100/rlp/alkis/ALKIS_Importer_Pipeline/input_batched/batch1/bda_5750_1237_Berresheim.xml
 /mnt/f/project_germany_100/rlp/alkis/ALKIS_Importer_Pipeline/input_batched/batch1/bda_5750_1237_Berresheim.xml ERROR 1: Unable to write feature 1 from layer ax_besonderergebaeudepunkt.
 /mnt/f/project_germany_100/rlp/alkis/ALKIS_Importer_Pipeline/input_batched/batch1/bda_5750_1237_Berresheim.xml ERROR 1: Terminating translation prematurely after failed
 /mnt/f/project_germany_100/rlp/alkis/ALKIS_Importer_Pipeline/input_batched/batch1/bda_5750_1237_Berresheim.xml translation of layer ax_besonderergebaeudepunkt (use -skipfailures to skip errors)
 parallel: This job failed:
 import /mnt/f/project_germany_100/rlp/alkis/ALKIS_Importer_Pipeline/input_batched/batch1/bda_5750_1237_Berresheim.xml

Why avoiddupes off? I suppose there's duplicate data in the sets (at least AX_Bundesland). BTW better diagnostics (ie. values of conflicting records) with usecopy off

AX_Bundesland is a parameter in config file?

AX_Bundesland is a parameter in config file?

No a conflicting record that probably appears in each set and will cause an error unless avoiddupes is on.

I don't know, actually I thought that avoiddupes off only raise warnings, and it will be nice to know if there are duplicates or not, but it's actually throwing errors and stopping the import.

So is it good practice to use avoiddupes on as I understood correctly? I mean is there any downsides of using it?

Depends, avoiddupes should not be necessary for normal deliveries, which shouldn't contain any duplicates - if they still contain duplicates then there should be a problem in the data.

In this case each zip is a separate delivery. Importing them into separate schemas should work without error and shouldn't need avoiddupes. But you want to import them into one schema and hence the objects that reappear in several datasets will produce errors without avoiddupes (eg. AX_Bundesland, AX_KreisRegion; maybe also other objects that intersects at the edges of the delivery). Hopefully in this case all data is from the same point in time, because otherwise you might find several versions of the same object in different deliveries (ie. same gml_id different beginnt), that would not violate any constraints, but might produce strange output later.

Thank you for your detailed explanation. However problem still exists, even when I have changed to avoiddupes to on. Error seems exactly the same.

Data got gid7.1, I'm using gid7 branch - is that fine? Maybe I can try usecopy off to have better diagnostics?

image

In addition to previous comment from above, I did run again and turn off PG_USE_COPY, so I can share here how the logs look like with usecopy off

logs_with_usecopy_off.log

avoiddupes only works for already inserted rows. If a the same record appears in parallel transactions, none will see the other, in turn no inserts will be skip it and the conflict will arise, when the second transaction is committed.

So I'm guessing only solution to avoid this situation is run with jobs 1?

Or is it possible to use parallel jobs and use skipfailures?