Lots of Shapefile SQL Errors
esb opened this issue · 14 comments
I've been trying to load this datbase today, but I'm getting odd errors that I'm not sure about.
This is running on Centos 7 with Python 3.8.6 with Psycopg2 2.8.4 and Postgres 13.4 on x86_64-pc-linux-gnu and PostGIS 3.1.3 (with GEOS 3.9.1-CAPI-1.14.2).
I've downloaded the Geoscape Admin Boundaries - ESRI Shapefile - GDA2020(ZIP) file, which I'm hoping is the correct one.
The first error is
Importing ACT_TOWN_POINT_shp.shp - Couldn't run Shapefile SQL
shp2pgsql result was: column "geom" of relation "aus_town_point" does not exist
This is true. The "geom" column does not exist in the table.
Then we launch into an awful lot of these errors.
root : WARNING Importing WA_TOWN_POINT_shp.dbf - Couldn't run Shapefile SQL
shp2pgsql result was: current transaction is aborted, commands ignored until end of transaction block
Is this because of that first error?
The (now updated) readme was missing a note to download the GDA94 Shapefile version of the Admin Bdys (not the GDA2020 ones)!
Please try again with the GDA94 files
I downloaded the GDA94 files, but the same errors are coming up.
Odd error! A couple of things to check:
- The PostGIS extension is enabled on your database (using the following SQL -
CREATE EXTENSION postgis
); and - The Postgres bin folder is in your $PATH (run shp2pgsql from the command line to confirm)
Yep. Both of those things have been done.
One more thing to double check - have you got the latest code? The August 2021 release PR was this morning, Sydney time, and there have been a couple of significant changes to the data (thanks Geoscape!)
Other than that - can you please share the full log with me and also the command line you're using to run it
Ok, my original hunch was right. The mass of errors is due to a locked transaction from the first failure.
I added the "-e" flag to the shp2pgsql command to treat each command individually, and not as part of a transaction.
After this was done most of the imports went smoothly, except for the one error for each state because of a missing column.
09/07/2021 07:04:49 PM Part 3 of 6 : Start raw admin boundary load : 2021-09-07 19:04:49.802582
09/07/2021 07:04:58 PM Importing ACT_TOWN_POINT_shp.shp - Couldn't run Shapefile SQL
shp2pgsql result was: column "geom" of relation "aus_town_point" does not exist
LINE 3: ..."town_point","date_creat","date_retir","town_pid",geom) VALU... ^
09/07/2021 07:08:32 PM Importing NSW_TOWN_POINT_shp.shp - Couldn't run Shapefile SQL
shp2pgsql result was: column "geom" of relation "aus_town_point" does not exist
LINE 3: ..."town_point","date_creat","date_retir","town_pid",geom) VALU...
09/07/2021 07:08:56 PM Importing NT_TOWN_POINT_shp.shp - Couldn't run Shapefile SQL
shp2pgsql result was: column "geom" of relation "aus_town_point" does not exist
LINE 3: ..."town_point","date_creat","date_retir","town_pid",geom) VALU...
09/07/2021 07:10:46 PM Importing QLD_TOWN_POINT_shp.shp - Couldn't run Shapefile SQL
shp2pgsql result was: column "geom" of relation "aus_town_point" does not exist
LINE 3: ..."town_point","date_creat","date_retir","town_pid",geom) VALU...
09/07/2021 07:11:44 PM Importing SA_TOWN_POINT_shp.shp - Couldn't run Shapefile SQL
shp2pgsql result was: column "geom" of relation "aus_town_point" does not exist
LINE 3: ..."town_point","date_creat","date_retir","town_pid",geom) VALU...
09/07/2021 07:13:09 PM Importing TAS_TOWN_POINT_shp.shp - Couldn't run Shapefile SQL
shp2pgsql result was: column "geom" of relation "aus_town_point" does not exist
LINE 3: ..."town_point","date_creat","date_retir","town_pid",geom) VALU...
09/07/2021 07:15:10 PM Importing VIC_TOWN_POINT_shp.shp - Couldn't run Shapefile SQL
shp2pgsql result was: column "geom" of relation "aus_town_point" does not exist
LINE 3: ..."town_point","date_creat","date_retir","town_pid",geom) VALU...
09/07/2021 07:16:32 PM Importing WA_TOWN_POINT_shp.shp - Couldn't run Shapefile SQL
shp2pgsql result was: column "geom" of relation "aus_town_point" does not exist
LINE 3: ..."town_point","date_creat","date_retir","town_pid",geom) VALU...
Interesting - the code should import SHP files of the same layer (e.g town points) sequentially, so PG shouldn't be locking the table. I'll go with it being a knock-on effect from the 1st failure... but could be the code isn't doing that...
A couple of more avenues to investigate re: the initial error:
- When a shapefile load fails at the psql stage (like the above error appears to have) - it should output a <failed_shp_file>.sql file in the same folder as the code - can you attach it to this issue
- Does this table exist in Postgres:
admin_bdys_202108.aus_town_point
and does it have the geom column?
The error file for error_debug_ACT_TOWN_POINT_shp.shp.sql is
SET CLIENT_ENCODING TO UTF8;
SET STANDARD_CONFORMING_STRINGS TO ON;
INSERT INTO "raw_admin_bdys_202108"."aus_town_point" ("town_point","date_creat","date_retir","town_pid",geom) VALUES ('TWP8373','20130731',NULL,'TWN8373','0101000020BB1000005DDB641F27A2624072956B2D33C141C0');
INSERT INTO "raw_admin_bdys_202108"."aus_town_point" ("town_point","date_creat","date_retir","town_pid",geom) VALUES ('TWP8374','20130731',NULL,'TWN8374','0101000020BB100000E707B0F32FA262407145A1E6BD9541C0');
INSERT INTO "raw_admin_bdys_202108"."aus_town_point" ("town_point","date_creat","date_retir","town_pid",geom) VALUES ('TWP8375','20130731',NULL,'TWN8375','0101000020BB1000009879612129A4624083647620CFA341C0');
ANALYZE "raw_admin_bdys_202108"."aus_town_point";
The table exists, but is missing the geom column.
Ok, so shp2pgsql worked, but the table missing the geom field is very strange...
I'm short of ideas right now as I've run this multiple times over the last few days on 2 Macs without that issue; and it hasn't come up in the last 3-4 years. I can't see why CentOS would be any different. Linux != Unix, but close enough...
A corrupt PostGIS install maybe?! Can you double check this SQL works in your DB:
select st_makepoint(151.0, -33.0, 4283);
Yep. That SQL works fine.
Here's the dump of the table schema.
CREATE TABLE IF NOT EXISTS raw_admin_bdys_202108.aus_town_point
(
gid integer NOT NULL DEFAULT nextval('raw_admin_bdys_202108.aus_town_point_gid_seq'::regclass),
town_point character varying(15) COLLATE pg_catalog."default",
date_creat date,
date_retir date,
town_pid character varying(15) COLLATE pg_catalog."default",
CONSTRAINT aus_town_point_pkey PRIMARY KEY (gid)
)
I'm not sure I understand where the table schemas come from.
The table is created by the first state to be loaded for each layer. So ACT_TOWN_POINT_shp.shp
should create the table aus_town_point
with that file's structure. The -d
flag in shp2pgsql forces a create table statement to be output with the SQL.
Another test! - uncomment line 379 in load-gnaf.py ([print(table) for table in create_list]
) and send through the full list of dicts. A working thought is that the code is behaving differently on CentOS and is putting ACT_TOWN_POINTS_shp.dbf
as the first aus_town_point
input...
There should be no .dbf files for layers that have a .shp file.
BTW - I had thought about refactoring the Shapefile load to use SHP > GeoPandas > PG, now I'm wishing I had :-)
I can see the problem....
This file is processed first -
shp2pgsql -e -d -I -G -n -i "/vagrant_data/geoscape_202108/AUG21_Admin_Boundaries_ESRIShapefileorDBFfile/TownPoints_AUG13_GDA94_SHP/Town Points/Town Points AUGUST 2013/Standard/TAS_TOWN_POINT_shp.dbf" raw_admin_bdys_202108.aus_town_point
Then this file comes afterwards -
shp2pgsql -e -a -s 4283 -i "/vagrant_data/geoscape_202108/AUG21_Admin_Boundaries_ESRIShapefileorDBFfile/TownPoints_AUG13_GDA94_SHP/Town Points/Town Points AUGUST 2013/Standard/TAS_TOWN_POINT_shp.shp" raw_admin_bdys_202108.aus_town_point
The dbf file creates the table, but this does not create the geom column. When the shp file is appended, it will crash.
So the problem is that a .dbf file exists for the aus_town_point table.
Yes. The order seems to be wrong.
TAS_GCCSA_2011_POLYGON_shp.dbf
TAS_GCCSA_2011_POLYGON_shp.shp
TAS_GCCSA_2011_shp.dbf
TAS_MB_2011_POLYGON_shp.dbf
TAS_MB_2011_POLYGON_shp.shp
TAS_MB_2011_shp.dbf
TAS_SA1_2011_POLYGON_shp.dbf
TAS_SA1_2011_POLYGON_shp.shp
TAS_SA1_2011_shp.dbf
TAS_SA2_2011_POLYGON_shp.dbf
TAS_SA2_2011_POLYGON_shp.shp
TAS_SA2_2011_shp.dbf
TAS_SA3_2011_POLYGON_shp.dbf
TAS_SA3_2011_POLYGON_shp.shp
TAS_SA3_2011_shp.dbf
TAS_SA4_2011_POLYGON_shp.dbf
TAS_SA4_2011_POLYGON_shp.shp
TAS_SA4_2011_shp.dbf
TAS_IARE_2011_POLYGON_shp.dbf
TAS_IARE_2011_POLYGON_shp.shp
TAS_IARE_2011_shp.dbf
TAS_ILOC_2011_POLYGON_shp.dbf
TAS_ILOC_2011_POLYGON_shp.shp
TAS_ILOC_2011_shp.dbf
TAS_IREG_2011_POLYGON_shp.dbf
TAS_IREG_2011_POLYGON_shp.shp
TAS_IREG_2011_shp.dbf
TAS_REMOTENESS_2011_POLYGON_shp.dbf
TAS_REMOTENESS_2011_POLYGON_shp.shp
TAS_REMOTENESS_2011_shp.dbf
TAS_SEIFA_2011_shp.dbf
TAS_SUA_2011_POLYGON_shp.dbf
TAS_SUA_2011_POLYGON_shp.shp
TAS_SUA_2011_shp.dbf
TAS_SOSR_2011_POLYGON_shp.dbf
TAS_SOSR_2011_POLYGON_shp.shp
TAS_SOSR_2011_shp.dbf
TAS_SOS_2011_POLYGON_shp.dbf
TAS_SOS_2011_POLYGON_shp.shp
TAS_SOS_2011_shp.dbf
TAS_UCL_2011_POLYGON_shp.dbf
TAS_UCL_2011_POLYGON_shp.shp
TAS_UCL_2011_shp.dbf
TAS_IARE_2016_POLYGON_shp.dbf
TAS_IARE_2016_POLYGON_shp.shp
TAS_IARE_2016_shp.dbf
TAS_ILOC_2016_POLYGON_shp.dbf
TAS_ILOC_2016_POLYGON_shp.shp
TAS_ILOC_2016_shp.dbf
TAS_IREG_2016_POLYGON_shp.dbf
TAS_IREG_2016_POLYGON_shp.shp
TAS_IREG_2016_shp.dbf
TAS_GCCSA_2016_POLYGON_shp.dbf
TAS_GCCSA_2016_POLYGON_shp.shp
TAS_GCCSA_2016_shp.dbf
TAS_MB_2016_POLYGON_shp.dbf
TAS_MB_2016_POLYGON_shp.shp
TAS_MB_2016_shp.dbf
TAS_SA1_2016_POLYGON_shp.dbf
TAS_SA1_2016_POLYGON_shp.shp
TAS_SA1_2016_shp.dbf
TAS_SA2_2016_POLYGON_shp.dbf
TAS_SA2_2016_POLYGON_shp.shp
TAS_SA2_2016_shp.dbf
TAS_SA3_2016_POLYGON_shp.dbf
TAS_SA3_2016_POLYGON_shp.shp
TAS_SA3_2016_shp.dbf
TAS_SA4_2016_POLYGON_shp.dbf
TAS_SA4_2016_POLYGON_shp.shp
TAS_SA4_2016_shp.dbf
TAS_REMOTENESS_2016_POLYGON_shp.dbf
TAS_REMOTENESS_2016_POLYGON_shp.shp
TAS_REMOTENESS_2016_shp.dbf
TAS_SEIFA_2016_shp.dbf
TAS_SOSR_2016_POLYGON_shp.dbf
TAS_SOSR_2016_POLYGON_shp.shp
TAS_SOSR_2016_shp.dbf
TAS_SOS_2016_POLYGON_shp.dbf
TAS_SOS_2016_POLYGON_shp.shp
TAS_SOS_2016_shp.dbf
TAS_SUA_2016_POLYGON_shp.dbf
TAS_SUA_2016_POLYGON_shp.shp
TAS_SUA_2016_shp.dbf
TAS_UCL_2016_POLYGON_shp.dbf
TAS_UCL_2016_POLYGON_shp.shp
TAS_UCL_2016_shp.dbf
tas_gccsa_2021.shp
tas_mb_2021.shp
tas_sa1_2021.shp
tas_sa2_2021.shp
tas_sa3_2021.shp
tas_sa4_2021.shp
TAS_COMM_ELECTORAL_POLYGON_shp.dbf
TAS_COMM_ELECTORAL_POLYGON_shp.shp
TAS_COMM_ELECTORAL_shp.dbf
TAS_STATE_ELECTORAL_POLYGON_shp.dbf
TAS_STATE_ELECTORAL_POLYGON_shp.shp
TAS_STATE_ELECTORAL_shp.dbf
tas_lga.shp
tas_localities.shp
TAS_STATE_POLYGON_shp.dbf
TAS_STATE_POLYGON_shp.shp
TAS_STATE_shp.dbf
TAS_LOCALITY_shp.dbf
TAS_LOCALITY_TOWN_shp.dbf
TAS_TOWN_POINT_shp.dbf
TAS_TOWN_POINT_shp.shp
TAS_TOWN_shp.dbf
Authority_Code_MB_CATEGORY_CLASS_AUT_shp.dbf
Authority_Code_REMOTENESS_CATEGORY_AUT_shp.dbf
Authority_Code_MB_CATEGORY_CLASS_AUT_shp.dbf
Authority_Code_REMOTENESS_CATEGORY_AUT_shp.dbf
Authority_Code_STATE_ELECTORAL_CLASS_AUT_shp.dbf
Authority_Code_TOWN_CLASS_AUT_shp.dbf
Cool, we've got the problem nailed down.
Curious that the behaviour is different between operating systems, but not as important as finding a universal fix...
I've just updated the code to filter out *_POINT_shp.dbf
files (line 340). However, in the list above the elif
filter should exclude all *_POLYGON_shp.dbf
files as well; so this may not be a fix on CentOS...