Reconsider DB schema
nilsnolde opened this issue · 4 comments
Things to think about:
- The tags are 1:1 with OSM objects, so they can be merged into one table
-
uuid
is mostly useless it seems, it's not the PK in most tables - categories are extracted and stored for each OSM object, duplicating data millions of times
- allow (simplified) MultiPolygon in DB. Would need some serious performance testing though for the
intersects
query stage. Would be nicer than only querying for centroids, which is often meaningless for big areas like parks or even true MultiPolygon objects.
From the top of my head: the tags are in a separate table by design. It's because the information can be optionally added and its columns can be individually queried. Having everything in one table would remove this functionality.
Ah ok, then you'd have to basically duplicate a lot of POIs if they have a lot of additional tags, right? Yeah, that'd be stupid, true. OK thanks for the hint!
It is like this:
public | ops_planet_pois | table | admin
public | ops_planet_pois_categories | table | admin
public | ops_planet_pois_tags | table | admin
ops_planet_pois
gis=# \d ops_planet_pois
Table "public.ops_planet_pois"
Column | Type | Collation | Nullable | Default
----------+-----------------------+-----------+----------+---------
uuid | bytea | | not null |
osm_id | bigint | | not null |
osm_type | integer | | not null |
geom | geography(Point,4326) | | not null |
Indexes:
"ops_planet_pois_pkey" PRIMARY KEY, btree (uuid)
"idx_ops_planet_pois_geom" gist (geom)
"ix_ops_planet_pois_osm_id" btree (osm_id)
Referenced by:
TABLE "ops_planet_pois_categories" CONSTRAINT "ops_planet_pois_categories_uuid_fkey" FOREIGN KEY (uuid) REFERENCES ops_planet_pois(uuid)
TABLE "ops_planet_pois_tags" CONSTRAINT "ops_planet_pois_tags_uuid_fkey" FOREIGN KEY (uuid) REFERENCES ops_planet_pois(uuid)
ops_planet_pois_categories
gis=# \d ops_planet_pois_categories
Table "public.ops_planet_pois_categories"
Column | Type | Collation | Nullable | Default
----------+---------+-----------+----------+--------------------------------------------------------
id | integer | | not null | nextval('ops_planet_pois_categories_id_seq'::regclass)
uuid | bytea | | not null |
category | integer | | not null |
Indexes:
"ops_planet_pois_categories_pkey" PRIMARY KEY, btree (id)
"ix_ops_planet_pois_categories_category" btree (category)
"ix_ops_planet_pois_categories_uuid" btree (uuid)
Foreign-key constraints:
"ops_planet_pois_categories_uuid_fkey" FOREIGN KEY (uuid) REFERENCES ops_planet_pois(uuid)
ops_planet_pois_tags
gis=# \d ops_planet_pois_tags
Table "public.ops_planet_pois_tags"
Column | Type | Collation | Nullable | Default
--------+---------+-----------+----------+--------------------------------------------------
id | integer | | not null | nextval('ops_planet_pois_tags_id_seq'::regclass)
uuid | bytea | | not null |
osm_id | bigint | | not null |
key | text | | |
value | text | | |
Indexes:
"ops_planet_pois_tags_pkey" PRIMARY KEY, btree (id)
"ix_ops_planet_pois_tags_key" btree (key)
"ix_ops_planet_pois_tags_uuid" btree (uuid)
"ix_ops_planet_pois_tags_value" btree (value)
Foreign-key constraints:
"ops_planet_pois_tags_uuid_fkey" FOREIGN KEY (uuid) REFERENCES ops_planet_pois(uuid)
Example:
gis=# SELECT * FROM ops_planet_pois LIMIT 5;
uuid | osm_id | osm_type | geom
------------------------------------+-----------+----------+----------------------------------------------------
\x25f48d87398c4d2fa3e59af27ea307ff | 110247790 | 1 | 0101000020E6100000C8330D2F2DB84EC006AB8F2C05E92840
\x56b20dd5d3e340d7ba663d12b0655848 | 108493272 | 1 | 0101000020E6100000317D4E8B0BB84EC07052C6095D0F2940
\xb6e64021629a43528d96874b4301e879 | 9829772 | 1 | 0101000020E61000007433A31F8D9D51C05BE48D8296C83340
\x92fe63810b3243c7a9a3c4d05c6f0c67 | 26241064 | 1 | 0101000020E6100000C4F59441DA9854C0ED707495EE203740
\x50f7f61cb96b4644bf97b00723aa0cbc | 26242036 | 1 | 0101000020E610000051488D646A9954C070C328081E033740
(5 rows)
...
gis=# SELECT * FROM ops_planet_pois_tags ORDER BY uuid LIMIT 5;
id | uuid | osm_id | key | value
----------+------------------------------------+------------+------------+-------------------
9445969 | \x000000d29f844abf9fb85b2d6d75290e | 4766271454 | name | Nasi Katok Adik
2371439 | \x0000022702df4af3a763b5aa140ab7ae | 1188977786 | wheelchair | yes
2371438 | \x0000022702df4af3a763b5aa140ab7ae | 1188977786 | name | Bistro Augenblick
12345299 | \x00000267e601418a98fc80bba38c5eeb | 1821133462 | wheelchair | yes
12345298 | \x00000267e601418a98fc80bba38c5eeb | 1821133462 | name | Simonet
(5 rows)
...
gis=# SELECT * FROM ops_planet_pois_categories ORDER BY uuid LIMIT 5;
id | uuid | category
----------+------------------------------------+----------
30877690 | \x0000001817504b0c87c3de5826105582 | 335
29785850 | \x0000008e91234c678cb229495fa55f6b | 565
18003787 | \x000000d29f844abf9fb85b2d6d75290e | 566
26977347 | \x000001c9f2ce4296842239e7e5cb67cf | 292
6606057 | \x0000022702df4af3a763b5aa140ab7ae | 570
(5 rows)
The uuid
is the FK for the other tables. Maybe there exists a more intuitive way but nothing came to our mind back then at least.
categories are extracted and stored for each OSM object, duplicating data millions of times
- each OSM object has one category or more than one, there are no duplicates in there.
e.g.
gis=# SELECT count(*) FROM ops_planet_pois_categories;
count
----------
38365931
(1 row)
gis=# SELECT count(*) FROM ops_planet_pois;
count
----------
38117690
(1 row)