Investigate usage of alternative ways to process PBFs
nilsnolde opened this issue · 3 comments
Since imposm has been deprecated for a long while and we really don't wanna start maintaining it, we need to investigate alternative ways to process PBFs:
- pyosmium could be an OK candidate. Very limited API, but does accept callbacks for OSM types
- https://pypi.org/project/esy-osm-pbf/ seems to be a relatively new package, doing what we'd need (couldn't find it on any VCS platform though..)
- use osmium/osmosis or other command-line utilities or even Pelias' pbf2json utility. All at the expense of creating more non-Python dependencies..
So, these will have to be evaluated a little in terms of performance, with the clear favorites being the first two options, as they only need the protobuf lib as a non-Python dep.
If you are going to change this one day, just be careful not to fall into the same trap I did back then: the amount of memory used to parse larger PBF files while holding data in memory for later stages, e.g. https://github.com/GIScience/openpoiservice/blob/master/openpoiservice/server/db_import/parse_osm.py#L270
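One common way around that trap is to flush parsed objects in fixed-size batches instead of collecting everything for a later stage. A minimal sketch of the idea, not taken from openpoiservice: `flush_in_batches`, the `write` callback and the default `batch_size` are all hypothetical names and values, and `write` stands in for whatever persists a batch (e.g. a bulk DB insert).

```python
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")


def flush_in_batches(
    items: Iterable[T],
    write: Callable[[List[T]], None],
    batch_size: int = 10_000,  # assumption: tune against available RAM
) -> int:
    """Stream items to `write` in batches; return the number of items written."""
    batch: List[T] = []
    total = 0
    for item in items:
        batch.append(item)
        if len(batch) >= batch_size:
            write(batch)
            total += len(batch)
            batch = []  # start a fresh list so the written batch can be freed
    if batch:
        # Flush the final, partially filled batch.
        write(batch)
        total += len(batch)
    return total
```

With this shape, peak memory is bounded by roughly one batch rather than by the size of the PBF, as long as `items` is itself a lazy iterable (a generator, not a prebuilt list).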
Yep, I'll have a look at how others do that, e.g. the Pelias OSM importer; it should be a fairly similar problem for them.
Some update on this:
I'll use pyosmium. It's way more sophisticated than I thought. With that, we can use good strategies to handle the memory stuff; the strategy could be derived from the size of the PBF and the available RAM:
https://osmcode.org/osmium-concepts/#list-of-map-index-classes
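A rough sketch of how such a strategy could look. The index type names (`flex_mem`, `sparse_file_array`, `dense_file_array`) come from the osmium list linked above; everything else is an assumption for illustration: the function name `pick_index`, the `ASSUMED_INDEX_FACTOR` multiplier and the 10 GiB cutoff are untested guesses, not measured values.

```python
# Hypothetical multiplier: a guess at how much RAM an in-memory node location
# index needs relative to the PBF file size. Needs to be measured in practice.
ASSUMED_INDEX_FACTOR = 8

# Hypothetical cutoff above which a dense index is preferred over a sparse one.
ASSUMED_DENSE_CUTOFF_BYTES = 10 * 1024**3


def pick_index(pbf_bytes: int, ram_bytes: int) -> str:
    """Return an osmium index type name for a given PBF size and RAM budget."""
    estimated_index_bytes = pbf_bytes * ASSUMED_INDEX_FACTOR
    # Keep the index in memory only if it fits in half of the RAM budget.
    if estimated_index_bytes <= ram_bytes // 2:
        return "flex_mem"
    # Otherwise fall back to a file-backed index: dense for very large
    # extracts, sparse for smaller ones.
    if pbf_bytes >= ASSUMED_DENSE_CUTOFF_BYTES:
        return "dense_file_array"
    return "sparse_file_array"
```

The result would then go into the `idx` argument of pyosmium's `apply_file`, with the file size taken from `os.path.getsize`. Note that the file-backed types also expect a cache file name appended after a comma, e.g. `"sparse_file_array,nodes.cache"`.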