GIScience/openpoiservice

Investigate usage of alternative ways to process PBFs

nilsnolde opened this issue · 3 comments

Since imposm has been deprecated for a long while and we really don't want to start maintaining it, we need to investigate alternative ways to process PBFs:

  • pyosmium could be an OK candidate. Very limited API, but does accept callbacks for OSM types
  • https://pypi.org/project/esy-osm-pbf/ seems to be a relatively new package, doing what we'd need (couldn't find it on any VCS platform though..)
  • use osmium/osmosis or other command-line utilities, or even Pelias' pbf2json utility. All at the expense of creating more non-Python dependencies..

So, these will have to be evaluated a little in terms of performance, with the first two options being clear favorites, as they'd only need the protobuf lib as a non-Python dependency.

If you are going to change this one day, just make sure you don't fall into the same trap I did back then: the amount of memory used when parsing larger PBF files and holding data in memory for later stages, e.g. https://github.com/GIScience/openpoiservice/blob/master/openpoiservice/server/db_import/parse_osm.py#L270
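
One way to avoid that trap, as a rough sketch only: push parsed records into a bounded buffer and flush them in batches, so memory stays roughly constant regardless of PBF size. `BatchBuffer`, `flush` and `max_size` are made-up names for illustration, not anything that exists in openpoiservice. The `flush` callable stands in for whatever writes a batch to the database.

```python
from typing import Callable, Dict, List


class BatchBuffer:
    """Hypothetical helper: collects parsed records and hands them off in batches."""

    def __init__(self, flush: Callable[[List[Dict]], None], max_size: int = 1000):
        self._flush = flush          # assumed sink, e.g. a bulk DB insert
        self._max_size = max_size    # upper bound on records held in memory
        self._items: List[Dict] = []

    def add(self, record: Dict) -> None:
        """Call this from the parser callback for every record worth keeping."""
        self._items.append(record)
        if len(self._items) >= self._max_size:
            self.close()

    def close(self) -> None:
        """Flush whatever is left; call once more after parsing has finished."""
        if self._items:
            self._flush(self._items)
            self._items = []         # start a fresh list, the sink keeps the old one
```

The parser callback would only ever call `add()`, and the importer would call `close()` once at the end, so at most `max_size` records are held at any time.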

Yep, I'll have a look at how others do that, e.g. the Pelias OSM importer; it should be a fairly similar problem for them.

Some update on this:

I'll use pyosmium. It's way more sophisticated than I thought. With that, we can use good strategies to handle the memory stuff; the strategy could be derived from the size of the PBF and the available RAM:
https://osmcode.org/osmium-concepts/#list-of-map-index-classes
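
A minimal sketch of what "derived from size of PBF and available RAM" could look like. The index names (`flex_mem`, `sparse_file_array`, `dense_file_array`) come from the list linked above. Everything else is an assumption for illustration: the function name `choose_index`, the factor of 2 for estimating index size from file size, the 50% RAM budget and the 10 GiB cutoff are untuned placeholders that would need benchmarking.

```python
GIB = 1024 ** 3

# Assumptions, not measured values: tune these against real extracts.
INDEX_BYTES_PER_PBF_BYTE = 2.0   # crude guess at node index size vs. file size
RAM_BUDGET_FRACTION = 0.5        # share of free RAM the index may occupy
DENSE_CUTOFF_BYTES = 10 * GIB    # above this, treat the file as "large"


def choose_index(pbf_bytes: int, available_ram_bytes: int,
                 cache_path: str = "nodes.idx") -> str:
    """Return an osmium index spec string for the node location store."""
    estimated_index = pbf_bytes * INDEX_BYTES_PER_PBF_BYTE
    if estimated_index <= available_ram_bytes * RAM_BUDGET_FRACTION:
        # Small enough to keep node locations in memory.
        return "flex_mem"
    if pbf_bytes < DENSE_CUTOFF_BYTES:
        # Extract-sized file that does not fit in RAM: sparse, file-backed.
        return f"sparse_file_array,{cache_path}"
    # Continent/planet-sized file: dense, file-backed.
    return f"dense_file_array,{cache_path}"
```

The returned string is meant to be passed as the `idx` argument of pyosmium's `SimpleHandler.apply_file(filename, locations=True, idx=...)`, with `pbf_bytes` coming from something like `os.path.getsize(filename)`.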