GIScience/openpoiservice

Reconsider DB schema

nilsnolde opened this issue · 4 comments

Things to think about:

  • The tags are 1:1 with OSM objects, so they can be merged into one table
  • uuid is mostly useless it seems, it's not the PK in most tables
  • categories are extracted and stored for each OSM object, duplicating data millions of times
  • allow (simplified) MultiPolygon in DB. Would need some serious performance testing though for the intersects query stage. Would be nicer than only querying for centroids, which is often meaningless for big areas like parks or even true MultiPolygon objects.

From the top of my head: the tags are in a separate table by design. It's because the information can be optionally added and its columns can be individually queried. Having everything in one table would remove this functionality.

Ah ok, then you'd have to basically duplicate a lot of POIs if they have a lot of additional tags, right? Yeah, that'd be stupid, true. OK thanks for the hint!

It is like this:

 public | ops_planet_pois            | table | admin
 public | ops_planet_pois_categories | table | admin
 public | ops_planet_pois_tags       | table | admin

ops_planet_pois

gis=# \d ops_planet_pois
                  Table "public.ops_planet_pois"
  Column  |         Type          | Collation | Nullable | Default
----------+-----------------------+-----------+----------+---------
 uuid     | bytea                 |           | not null |
 osm_id   | bigint                |           | not null |
 osm_type | integer               |           | not null |
 geom     | geography(Point,4326) |           | not null |
Indexes:
    "ops_planet_pois_pkey" PRIMARY KEY, btree (uuid)
    "idx_ops_planet_pois_geom" gist (geom)
    "ix_ops_planet_pois_osm_id" btree (osm_id)
Referenced by:
    TABLE "ops_planet_pois_categories" CONSTRAINT "ops_planet_pois_categories_uuid_fkey" FOREIGN KEY (uuid) REFERENCES ops_planet_pois(uuid)
    TABLE "ops_planet_pois_tags" CONSTRAINT "ops_planet_pois_tags_uuid_fkey" FOREIGN KEY (uuid) REFERENCES ops_planet_pois(uuid)

ops_planet_pois_categories

gis=# \d ops_planet_pois_categories
                             Table "public.ops_planet_pois_categories"
  Column  |  Type   | Collation | Nullable |                        Default
----------+---------+-----------+----------+--------------------------------------------------------
 id       | integer |           | not null | nextval('ops_planet_pois_categories_id_seq'::regclass)
 uuid     | bytea   |           | not null |
 category | integer |           | not null |
Indexes:
    "ops_planet_pois_categories_pkey" PRIMARY KEY, btree (id)
    "ix_ops_planet_pois_categories_category" btree (category)
    "ix_ops_planet_pois_categories_uuid" btree (uuid)
Foreign-key constraints:
    "ops_planet_pois_categories_uuid_fkey" FOREIGN KEY (uuid) REFERENCES ops_planet_pois(uuid)

ops_planet_pois_tags

gis=# \d ops_planet_pois_tags
                            Table "public.ops_planet_pois_tags"
 Column |  Type   | Collation | Nullable |                     Default
--------+---------+-----------+----------+--------------------------------------------------
 id     | integer |           | not null | nextval('ops_planet_pois_tags_id_seq'::regclass)
 uuid   | bytea   |           | not null |
 osm_id | bigint  |           | not null |
 key    | text    |           |          |
 value  | text    |           |          |
Indexes:
    "ops_planet_pois_tags_pkey" PRIMARY KEY, btree (id)
    "ix_ops_planet_pois_tags_key" btree (key)
    "ix_ops_planet_pois_tags_uuid" btree (uuid)
    "ix_ops_planet_pois_tags_value" btree (value)
Foreign-key constraints:
    "ops_planet_pois_tags_uuid_fkey" FOREIGN KEY (uuid) REFERENCES ops_planet_pois(uuid)

Example:

gis=# SELECT * FROM ops_planet_pois LIMIT 5;
                uuid                |  osm_id   | osm_type |                        geom
------------------------------------+-----------+----------+----------------------------------------------------
 \x25f48d87398c4d2fa3e59af27ea307ff | 110247790 |        1 | 0101000020E6100000C8330D2F2DB84EC006AB8F2C05E92840
 \x56b20dd5d3e340d7ba663d12b0655848 | 108493272 |        1 | 0101000020E6100000317D4E8B0BB84EC07052C6095D0F2940
 \xb6e64021629a43528d96874b4301e879 |   9829772 |        1 | 0101000020E61000007433A31F8D9D51C05BE48D8296C83340
 \x92fe63810b3243c7a9a3c4d05c6f0c67 |  26241064 |        1 | 0101000020E6100000C4F59441DA9854C0ED707495EE203740
 \x50f7f61cb96b4644bf97b00723aa0cbc |  26242036 |        1 | 0101000020E610000051488D646A9954C070C328081E033740
(5 rows)

...

gis=# SELECT * FROM ops_planet_pois_tags ORDER BY uuid LIMIT 5;
    id    |                uuid                |   osm_id   |    key     |       value
----------+------------------------------------+------------+------------+-------------------
  9445969 | \x000000d29f844abf9fb85b2d6d75290e | 4766271454 | name       | Nasi Katok Adik
  2371439 | \x0000022702df4af3a763b5aa140ab7ae | 1188977786 | wheelchair | yes
  2371438 | \x0000022702df4af3a763b5aa140ab7ae | 1188977786 | name       | Bistro Augenblick
 12345299 | \x00000267e601418a98fc80bba38c5eeb | 1821133462 | wheelchair | yes
 12345298 | \x00000267e601418a98fc80bba38c5eeb | 1821133462 | name       | Simonet
(5 rows)

...

gis=# SELECT * FROM ops_planet_pois_categories ORDER BY uuid LIMIT 5;
    id    |                uuid                | category
----------+------------------------------------+----------
 30877690 | \x0000001817504b0c87c3de5826105582 |      335
 29785850 | \x0000008e91234c678cb229495fa55f6b |      565
 18003787 | \x000000d29f844abf9fb85b2d6d75290e |      566
 26977347 | \x000001c9f2ce4296842239e7e5cb67cf |      292
  6606057 | \x0000022702df4af3a763b5aa140ab7ae |      570
(5 rows)

The uuid is the FK for the other tables. Maybe there exists a more intuitive way but nothing came to our mind back then at least.

categories are extracted and stored for each OSM object, duplicating data millions of times - each OSM object has one category or more than one, there are no duplicates in there.

e.g.

gis=# SELECT count(*) FROM ops_planet_pois_categories;
  count
----------
 38365931
(1 row)

gis=# SELECT count(*) FROM ops_planet_pois;
  count
----------
 38117690
(1 row)