Images are loaded into HBase for flexibility of split management.
Tiles are written by the TileToHBase command:
java -cp target/original-osmosis-hbase-0.44.2.jar \
  org.openstreetmap.osmosis.hbase.xyz.cmd.TileToHBase \
  -f ~/Classifv11/DisturbYear/Classif33yDisturbYear_CANorth_UL7E10N_LR42E0N-0000000000-0000000000.tif \
  -o 'ca_north'
java -cp target/original-osmosis-hbase-0.44.2.jar:target/dependency/* org.openstreetmap.osmosis.hbase.xyz.cmd.TileToHBase -f ~/Classifv11/DisturbYear/Classif33yDisturbYear_CANorth_UL7E10N_LR42E0N-0000000000-0000000000.tif -o 'ca_north'
Running:
hadoop jar target/osmosis-hbase-0.44.2.jar \
  org.openstreetmap.osmosis.hbase.mr.analysis.ImageRegions \
  ca_north /user/tempehu/classV11/stats7/ 2020
Instead of preprocessing to a sequence file, create the entities directly on HDFS? All ways are kept by default.
Three primary data tables are kept: nodes, ways and relations. Each has the same row and rowkey design.
The rowkey is the primary key of the entity, with the byte order reversed. With monotonically increasing keys, this creates an almost perfect distribution over the range 00 to FF for the first byte in the key (see KeyDistributionTest for an illustration). When querying for known entity ids, e.g. when building a way, a node can be retrieved by simply reversing the known key.
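The key reversal described above can be sketched as follows. This is a minimal illustration, not the project's own code: it assumes entity ids are 8-byte longs encoded big-endian before reversal, and the class and method names (ReversedKey, toRowKey, fromRowKey) are hypothetical.

```java
import java.nio.ByteBuffer;

/** Hypothetical helper: maps an entity id to a byte-reversed row key and back. */
final class ReversedKey {
    private ReversedKey() {
    }

    /**
     * Writes the id with its byte order reversed, so the fastest-changing
     * (lowest) byte of a monotonically increasing id becomes the first key byte.
     * Assumes an 8-byte, big-endian base encoding.
     */
    static byte[] toRowKey(long id) {
        return ByteBuffer.allocate(Long.BYTES).putLong(Long.reverseBytes(id)).array();
    }

    /** Recovers the original id from a row key produced by toRowKey. */
    static long fromRowKey(byte[] rowKey) {
        return Long.reverseBytes(ByteBuffer.wrap(rowKey).getLong());
    }
}
```

With this encoding, consecutive ids 0 to 255 each get a different first byte, which is what spreads sequential writes across the 00 to FF key range.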
A fourth table containing parsed entities is kept with a different strategy.
- Populated using MR
- The problem with invalid ways is avoided. They're not kept.
- Nodes not included.
- Polymorphic geometry types
- It is secondary data after all, many ways are shared
Column family "d":
- contains basic metadata, or common entity data (see org.openstreetmap.osmosis.core.domain.v0_6.CommonEntityData)
Column family "t":
- Contains all tags in their own namespace.
- Each tag key is a column, so standard query tools (e.g. Hive) can be used with no fuss.
Tables should be created with the UniformSplit algorithm:
hbase org.apache.hadoop.hbase.util.RegionSplitter nodes UniformSplit -c 30 -f d:t
Hive mapping:
CREATE TABLE nodes(key int, value string) STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,t:name") TBLPROPERTIES ("hbase.table.name" = "nodes");
CREATE EXTERNAL TABLE ways(key int, value string, geom binary) STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,d:name,d:geom") TBLPROPERTIES ("hbase.table.name" = "ways");
hbase org.apache.hadoop.hbase.util.RegionSplitter nodes UniformSplit -c 30 -f d:t
hadoop jar target/osmosis-hbase-0.1.jar org.openstreetmap.osmosis.hbase.mr.TableLoader ways /user/tempehu/africa-latest.pbf.seq /user/tempehu/hfile-relations
hdfs dfs -chmod -R 777 /user/tempehu/hfile-relations
hbase org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles /user/tempehu/hfile-relations relations
Currently the Osmosis domain classes are used directly to facilitate integration with the rest of the toolchain.
Entities retrieved from HBase are eagerly evaluated. For example, all tags are converted from a binary map to a set of Tags (string pairs). This is wasteful, and the HBase representation is perhaps more useful: there's no reason the backing map in a result can't be kept as-is and tags evaluated lazily. It may make sense to develop a higher-performance version with a lazy wrapper.
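One possible shape for such a lazy wrapper is sketched below. It is an illustration only: LazyTags is a hypothetical class, it assumes tag keys and values are stored as UTF-8 strings, and it assumes the backing map is a qualifier-to-value map of the kind HBase returns from Result.getFamilyMap for the "t" family (ordered by byte content, so lookups by byte[] work).

```java
import java.nio.charset.StandardCharsets;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.NavigableMap;

/** Hypothetical lazy view over the raw tag cells of one row. */
final class LazyTags {
    // Raw qualifier -> value cells; must be ordered by a byte-content comparator.
    private final NavigableMap<byte[], byte[]> raw;
    // Stays null until asMap() is first called.
    private Map<String, String> decoded;

    LazyTags(NavigableMap<byte[], byte[]> raw) {
        this.raw = raw;
    }

    /** Decodes a single tag value on demand; all other cells stay as bytes. */
    String get(String key) {
        byte[] value = raw.get(key.getBytes(StandardCharsets.UTF_8));
        return value == null ? null : new String(value, StandardCharsets.UTF_8);
    }

    /** Decodes every tag on the first call and caches the result. */
    Map<String, String> asMap() {
        if (decoded == null) {
            Map<String, String> tags = new LinkedHashMap<>();
            for (Map.Entry<byte[], byte[]> cell : raw.entrySet()) {
                tags.put(new String(cell.getKey(), StandardCharsets.UTF_8),
                         new String(cell.getValue(), StandardCharsets.UTF_8));
            }
            decoded = Collections.unmodifiableMap(tags);
        }
        return decoded;
    }

    /** True once the full tag map has been materialised. */
    boolean isDecoded() {
        return decoded != null;
    }
}
```

A caller that only needs one tag (say "name") pays for one lookup and one string decode; the full conversion happens only if asMap() is ever requested.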
- Irritatingly we've had to use wrappers for each domain object.
- These can't be constructed in a simple way, and because Entity is a class we can't extend both this and the functionality in the domain objects.
The situation is a little complex, however:
Way polygons are relatively simple - if the start node is the same as the end node we have a simple polygon. Polygon detection: https://github.com/tyrasd/osm-polygon-features/blob/master/polygon-features.json
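The closed-way check can be sketched as below. WayShape and isClosed are hypothetical names, and the minimum of four node references (three distinct nodes plus the repeated closing node) is an assumption added here to rule out degenerate ways. A closed way is not necessarily an area (a roundabout is closed but linear), so a real implementation would probably also apply tag rules such as those in the polygon-features.json file linked above.

```java
import java.util.List;

/** Hypothetical helper for the simple "first node equals last node" test. */
final class WayShape {
    private WayShape() {
    }

    /**
     * Returns true when the way's node id list forms a closed ring:
     * at least four references, with the first and last id equal.
     */
    static boolean isClosed(List<Long> nodeIds) {
        if (nodeIds.size() < 4) {
            return false; // assumption: fewer than 3 distinct nodes cannot enclose an area
        }
        // equals() rather than == so boxed Long values compare by value
        return nodeIds.get(0).equals(nodeIds.get(nodeIds.size() - 1));
    }
}
```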
Multipolygons are messier: http://wiki.openstreetmap.org/wiki/Relation:multipolygon
- http://wiki.openstreetmap.org/wiki/Area/The_Future_of_Areas
- http://wiki.openstreetmap.org/wiki/Multipolygon_Examples
- https://help.openstreetmap.org/questions/8273/how-do-i-extract-the-polygon-of-an-administrative-boundary
- https://wiki.openstreetmap.org/wiki/Overpass_turbo/Polygon_Features
Given the need to add a qualifier to an HBase record (see table joins), it no longer really makes sense to separate tables. It may be better to have a single entity table, which would simplify a lot of things: multi scans would no longer be necessary, for example.
- Single table branch
- Tests for qualifiers in entity records
- Protobuf for basic data records (user etc?)