Reading PBF from osmfilter -> osmconvert fails
I'm working with very large PBFs (whole Planet.osm dumps) and want to fetch all objects with certain tags, such as all airports (aeroway=aerodrome), ports (harbour=yes), etc.
I've seen that it's much faster to use a combination of popular OSM tools for preprocessing than to feed the whole file into pyrosm. For instance, extracting all airports from the latest Geofabrik extract for Belgium takes 4 seconds that way versus 4 minutes.
Directly using pyrosm
%%time
from pyrosm import OSM
osm = OSM('belgium-latest.osm.pbf')
osm.get_pois({'aeroway': ['aerodrome']})
# CPU times: user 1min 34s, sys: 1min 12s, total: 2min 46s
# Wall time: 3min 58s
Preprocessing with osmconvert and osmfilter
extract_airports is simply a wrapper function for filtering an .o5m file with osmfilter and then converting back to PBF with osmconvert:
%%time
extract_airports(o5m_file, 'belgium_airports.gpkg')
# Filtering tags...
# CompletedProcess(args='osmfilter belgium-latest.o5m --keep="aeroway=aerodrome" --drop-version -o=belgium-latest_filtered.o5m', returncode=0, stdout=b'', stderr=b'')
# Converting back to PBF...
# CompletedProcess(args=['osmconvert', 'belgium-latest_filtered.o5m', '-o=belgium-latest_filtered.osm.pbf'], returncode=0, stdout=b'', stderr=b'')
# Reading into GeoDataFrame with pyrosm...
# CPU times: user 37.5 ms, sys: 24.2 ms, total: 61.7 ms
# Wall time: 4.47 s
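The wrapper itself is not shown in this issue. A minimal sketch of what such a helper might look like, reconstructed from the two CompletedProcess lines above. The names build_commands and extract_by_tag, their parameters, and the final pyrosm/GeoPackage step are assumptions for illustration, not the actual code behind extract_airports:

```python
import subprocess
from pathlib import Path


def build_commands(o5m_file, keep='aeroway=aerodrome'):
    """Return the osmfilter and osmconvert argument lists plus the output PBF path."""
    src = Path(o5m_file)
    filtered = src.with_name(src.stem + '_filtered.o5m')
    pbf = src.with_name(src.stem + '_filtered.osm.pbf')
    # Same options as in the logged commands; passed as lists, so no shell quoting is needed.
    filter_cmd = ['osmfilter', str(src), f'--keep={keep}', '--drop-version', f'-o={filtered}']
    convert_cmd = ['osmconvert', str(filtered), f'-o={pbf}']
    return filter_cmd, convert_cmd, pbf


def extract_by_tag(o5m_file, gpkg_file, key='aeroway', value='aerodrome'):
    """Hypothetical wrapper: filter with osmfilter, convert with osmconvert, read with pyrosm."""
    from pyrosm import OSM  # imported lazily so build_commands works without pyrosm

    filter_cmd, convert_cmd, pbf = build_commands(o5m_file, f'{key}={value}')
    print('Filtering tags...')
    subprocess.run(filter_cmd, check=True)
    print('Converting back to PBF...')
    subprocess.run(convert_cmd, check=True)
    print('Reading into GeoDataFrame with pyrosm...')
    gdf = OSM(str(pbf)).get_pois({key: [value]})
    if gdf is not None:  # nothing to write if no matching objects were found
        gdf.to_file(gpkg_file, driver='GPKG')
    return gdf
```

Keeping the command construction separate from the subprocess calls makes it easy to print and compare the exact commands for a working tag (aeroway=aerodrome) and a failing one (harbour=yes).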
However, when filtering less common tags like harbour=yes, reading the preprocessed PBF fails. Here belgium-latest_filtered.osm.pbf is the result of the above filtering and conversion:
%%time
from pyrosm import OSM
osm = OSM('belgium-latest_filtered.osm.pbf')
osm.get_pois({'harbour': ['yes']})
# Returns a KeyError for missing `tags`
Full Traceback
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
Input In [8], in ()
----> 1 osm.get_data_by_custom_criteria({'harbour': True})
File /opt/homebrew/lib/python3.9/site-packages/pyrosm/pyrosm.py:689, in OSM.get_data_by_custom_criteria(self, custom_filter, osm_keys_to_keep, filter_type, tags_as_columns, keep_nodes, keep_ways, keep_relations, extra_attributes)
686 if isinstance(self._nodes, list):
687 self._nodes = concatenate_dicts_of_arrays(self._nodes)
--> 689 gdf = get_user_defined_data(
690 self._nodes,
691 self._node_coordinates,
692 self._way_records,
693 self._relations,
694 tags_as_columns,
695 custom_filter,
696 osm_keys_to_keep,
697 filter_type,
698 keep_nodes,
699 keep_ways,
700 keep_relations,
701 self.bounding_box,
702 )
704 # Do not keep node information unless specifically asked for
705 # (they are in a list, and can cause issues when saving the files)
706 if not self.keep_node_info and gdf is not None:
File /opt/homebrew/lib/python3.9/site-packages/pyrosm/user_defined.py:37, in get_user_defined_data(nodes, node_coordinates, way_records, relations, tags_as_columns, custom_filter, osm_keys, filter_type, keep_nodes, keep_ways, keep_relations, bounding_box)
34 relations = None
36 # Call signature for fetching POIs
---> 37 nodes, ways, relation_ways, relations = get_osm_data(
38 node_arrays=nodes,
39 way_records=way_records,
40 relations=relations,
41 tags_as_columns=tags_as_columns,
42 data_filter=custom_filter,
43 filter_type=filter_type,
44 osm_keys=osm_keys,
45 )
47 # If there weren't any data, return empty GeoDataFrame
48 if nodes is None and ways is None and relations is None:
File /opt/homebrew/lib/python3.9/site-packages/pyrosm/data_manager.pyx:177, in pyrosm.data_manager.get_osm_data()
File /opt/homebrew/lib/python3.9/site-packages/pyrosm/data_manager.pyx:178, in pyrosm.data_manager.get_osm_data()
File /opt/homebrew/lib/python3.9/site-packages/pyrosm/data_manager.pyx:171, in pyrosm.data_manager._get_osm_data()
File /opt/homebrew/lib/python3.9/site-packages/pyrosm/data_manager.pyx:151, in pyrosm.data_manager.get_osm_nodes()
File /opt/homebrew/lib/python3.9/site-packages/pyrosm/data_filter.pyx:282, in pyrosm.data_filter.filter_node_indices()
KeyError: 'tags'
It seems the error comes from the get_osm_nodes Cython function (the last frame in the traceback is filter_node_indices, which it calls) because the node data has no tags key.
However, I can't seem to find anything wrong with the PBF file generated from osmconvert, and the contents seem alright:
osmconvert belgium-latest_filtered.o5m --csv="@id @lon @lat amenity shop name" --csv-headline -o=belgium_harbours.csv
head belgium_harbours.csv
@id @lon @lat amenity shop name
22433531 4.3943736 51.2296887
22433539 4.3944307 51.2298171
60261479 4.4085424 51.2287714 Jachthaven Willemdok
96946445 2.9311625 51.2241274
96946447 2.9330668 51.2229879
96946463 2.9384831 51.2191516
96946465 2.9386072 51.2193169
96946467 2.9387759 51.2195599
96946470 2.9388064 51.2196844
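The check above reads the .o5m file rather than the converted PBF, and the requested columns (amenity, shop, name) do not include harbour, so it says little about which tags survive in the file pyrosm actually reads. A rough sketch of how such CSV output could be summarized from Python, assuming osmconvert's default tab separator and the --csv-headline header row; the function name summarize_osm_csv is made up for this example:

```python
import csv
import io


def summarize_osm_csv(text, delimiter='\t'):
    """Count data rows and, for each non-@ column, how many rows carry a value."""
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    tag_columns = [c for c in (reader.fieldnames or []) if not c.startswith('@')]
    total = 0
    filled = {c: 0 for c in tag_columns}
    for row in reader:
        total += 1
        for c in tag_columns:
            if row.get(c):  # empty string or missing field counts as "no value"
                filled[c] += 1
    return total, filled


# Two rows copied from the listing above, rebuilt with tab separators.
sample = (
    "@id\t@lon\t@lat\tamenity\tshop\tname\n"
    "22433531\t4.3943736\t51.2296887\t\t\t\n"
    "60261479\t4.4085424\t51.2287714\t\t\tJachthaven Willemdok\n"
)
print(summarize_osm_csv(sample))
```

Running the same summary on a CSV exported from belgium-latest_filtered.osm.pbf, with harbour added to the --csv column list, would show directly how many of the exported rows still carry that tag after the conversion.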
Any ideas on what might be going wrong? Attached is a problematic PBF, filtered to include only objects with the harbour=yes tag.
I have a similar issue: after converting (or rather, downsizing) PBF files with osmconvert, I am unable to use them in pyrosm.
Conversion:
c:\>osmconvert64-0.8.8p.exe "us-midwest-latest.osm.pbf" -b=41,-85,42,-84 --complete-ways --out-pbf -o=41_85_42_84.osm.pbf
Then I try to use it within pyrosm:
osm = OSM("41_85_42_84.osm.pbf")
drive_net = osm.get_network(network_type="driving+service")
drive_net.plot(figsize=(20,20))
I get the following error; obviously I need to dig into the converted file more.
ValueError Traceback (most recent call last)
Input In [27], in <cell line: 2>()
----> 1 drive_net = osm.get_network(network_type="driving+service")
2 drive_net.plot(figsize=(20,20))
File ~\pyrosm\pyrosm.py:202, in OSM.get_network(self, network_type, extra_attributes, nodes)
199 tags_as_columns += extra_attributes
201 if self._nodes is None or self._way_records is None:
--> 202 self._read_pbf()
204 # Filter network data with given filter
205 edges, node_gdf = get_network_data(
206 self._node_coordinates,
207 self._way_records,
(...)
211 slice_to_segments=nodes,
212 )
File ~\pyrosm\pyrosm.py:121, in OSM._read_pbf(self)
118 self._all_way_tags = way_tags
120 # Prepare node coordinates lookup table
--> 121 self._node_coordinates = create_node_coordinates_lookup(self._nodes)
File ~\pyrosm\geometry.pyx:285, in pyrosm.geometry.create_node_coordinates_lookup()
File ~\pyrosm\geometry.pyx:286, in pyrosm.geometry.create_node_coordinates_lookup()
File ~\pyrosm\geometry.pyx:64, in pyrosm.geometry._create_node_coordinates_lookup()
File <__array_function__ internals>:180, in concatenate(*args, **kwargs)
ValueError: need at least one array to concatenate
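One thing that may be worth checking here (a guess, not a confirmed diagnosis): osmconvert's -b= option expects the box as minimum longitude, minimum latitude, maximum longitude, maximum latitude. Read that way, -b=41,-85,42,-84 describes longitude 41 to 42 and latitude -85 to -84, which is nowhere near the US Midwest and would give an extract without any nodes, consistent with "need at least one array to concatenate". A small sketch of a helper that builds the argument from named values; the name osmconvert_bbox_arg is hypothetical:

```python
def osmconvert_bbox_arg(min_lon, min_lat, max_lon, max_lat):
    """Build an osmconvert -b= argument in lon,lat,lon,lat order.

    The range checks only catch impossible values; a swapped but still
    valid box (such as 41,-85,42,-84) passes, so call this with keywords.
    """
    if not (-180 <= min_lon < max_lon <= 180):
        raise ValueError(f'invalid longitude range: {min_lon}..{max_lon}')
    if not (-90 <= min_lat < max_lat <= 90):
        raise ValueError(f'invalid latitude range: {min_lat}..{max_lat}')
    return f'-b={min_lon},{min_lat},{max_lon},{max_lat}'


# Box for latitude 41..42, longitude -85..-84
print(osmconvert_bbox_arg(min_lon=-85, min_lat=41, max_lon=-84, max_lat=42))
```

If the intended area is latitude 41 to 42 and longitude -85 to -84, the resulting argument is -b=-85,41,-84,42.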
I also tried it with nodes=True:
pyrosm\pyrosm.py:205: UserWarning: Could not find any edges for given area.
edges, node_gdf = get_network_data(
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
Input In [28], in <cell line: 4>()
1 drive_net = osm.get_network(network_type="driving+service", nodes=True)
----> 2 drive_net.plot(figsize=(20,20))
AttributeError: 'tuple' object has no attribute 'plot'
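The AttributeError is a separate matter from the empty extract: with nodes=True, get_network returns a (nodes, edges) pair rather than a single GeoDataFrame, so the result has to be unpacked before plotting. Given the warning that no edges were found, the edges element is presumably None here as well. A minimal sketch of defensive handling; the helper name edges_from_network_result is an assumption for illustration:

```python
def edges_from_network_result(result):
    """Return the edges object from get_network output, or None if there is none.

    Accepts either a single object (nodes=False) or a (nodes, edges) tuple (nodes=True).
    """
    if isinstance(result, tuple):
        _nodes, edges = result
        return edges
    return result


# Intended use with pyrosm (not executed here):
# result = osm.get_network(network_type="driving+service", nodes=True)
# edges = edges_from_network_result(result)
# if edges is not None:
#     edges.plot(figsize=(20, 20))
```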