This is an example of a simple application that uses the VectorPipe library to convert an ORC file containing OpenStreetMap data into VectorTiles. It can be run both locally and on Amazon's EMR service, provided you have the right credentials.
As of September 7, 2017, this demo requires:

- A locally published SNAPSHOT of GeoTrellis 1.2.0
- A locally published SNAPSHOT of VectorPipe 1.0
The easiest way to run this demo is through SBT via `sbt run`. First, you will need to unmark the `spark-hive` dependency as being `provided`. After your change, you should see something like:
```scala
libraryDependencies ++= Seq(
  ...
  "org.apache.spark" %% "spark-hive" % "2.2.0",
  ...
)
```
`sbt run` will pass any extra options it's given directly to the `main` method.
This demo uses the Decline library to handle CLI options, and expects the following:
```
Usage: vp-orc-io --orc <string> --bucket <string> --key <string> --layer <string> [--local]

Convert an OSM ORC file into VectorTiles

Options and flags:
    --help
        Display this help text.
    --orc <string>
        Location of the .orc file to process
    --bucket <string>
        S3 bucket to write VTs to
    --key <string>
        S3 directory (in bucket) to write to
    --layer <string>
        Name of the output Layer
    --local
        Is this to be run locally, not on EMR?
```
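A usage screen like this is typically produced by Decline option declarations along these lines. This is a minimal sketch under assumed names, not the demo's actual source:

```scala
import cats.implicits._
import com.monovore.decline._

object VpOrcIoCli {
  /* One Opts value per CLI option, mirroring the help text above */
  private val orc    = Opts.option[String]("orc",    help = "Location of the .orc file to process")
  private val bucket = Opts.option[String]("bucket", help = "S3 bucket to write VTs to")
  private val key    = Opts.option[String]("key",    help = "S3 directory (in bucket) to write to")
  private val layer  = Opts.option[String]("layer",  help = "Name of the output Layer")
  /* A bare flag becomes a Boolean via .orFalse */
  private val local  = Opts.flag("local", help = "Is this to be run locally, not on EMR?").orFalse

  /* Opts compose applicatively; parsing yields all five values at once */
  val command: Command[(String, String, String, String, Boolean)] =
    Command("vp-orc-io", "Convert an OSM ORC file into VectorTiles") {
      (orc, bucket, key, layer, local).tupled
    }
}
```

Calling `VpOrcIoCli.command.parse(args)` then returns either a help/error message or the parsed tuple.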
For example:
```
run --orc s3://vectortiles/orc/europe/andorra.orc --bucket vectortiles --key orc-catalog --layer andorra --local
```
The `--local` flag is only necessary when running the demo locally and interacting with S3.
It's possible to avoid interaction with S3 completely. Within `IO.scala` you'll find a section:
```scala
/* For writing a compressed Tile Layer */
val writer = S3LayerWriter(S3AttributeStore(bucket, prefix))
// val writer = FileLayerWriter(FileAttributeStore("/path/to/catalog/"))
```
Now you can switch to the `FileLayerWriter`, alter the path you'd like to write to, and run the demo as before.
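Assuming GeoTrellis 1.x import paths, the swapped-in section would look roughly like this (the catalog path is an example; choose your own):

```scala
import geotrellis.spark.io.file.{FileAttributeStore, FileLayerWriter}

/* Write the Tile Layer to the local filesystem instead of S3 */
val writer = FileLayerWriter(FileAttributeStore("/path/to/catalog/"))
```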
Example:
```
run --orc /home/colin/code/azavea/vp-io-test/georgia.orc --bucket vectortiles --key orc-catalog --layer georgia --local
```
You still need to specify `--bucket` and `--key`, but those values won't be used anywhere.
This assumes you have AWS credentials and have the `awscli` set of programs installed. As of September 7, 2017, you must also have installed a custom version of Terraform and its AWS resource provider via these instructions. This will no longer be necessary once version 1.0 of the provider is officially released.
Within the `deploy/` folder, doing the following will create your cluster:
```
terraform apply
```
After 5 minutes or so the process will complete and print out the cluster's ID. You will need this for later. If you lose track of it, `terraform show` will print it again for you.
First:
```
sbt assembly
```
This will create the "uber jar" `target/scala-2.11/vp-io-test-assembly-1.0.0.jar`.
Upload this to S3 in order to make it visible to your EMR cluster.
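The upload can be done with the `awscli` tools; the bucket and key here are examples (they match the `steps.json` below), so substitute your own:

```shell
aws s3 cp target/scala-2.11/vp-io-test-assembly-1.0.0.jar \
  s3://vectortiles/jars/vp-io-test-assembly-1.0.0.jar
```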
Edit `steps.json` to match where you uploaded your assembly and what ORC file you intend to ingest:
```json
[
  {
    "Name": "VectorPipe ORC Demo",
    "Type": "CUSTOM_JAR",
    "Jar": "command-runner.jar",
    "Args": [
      ...
      "s3://vectortiles/jars/vp-io-test-assembly-1.0.0.jar",
      "--orc", "s3://vectortiles/orc/europe/finland.orc",
      "--bucket", "vectortiles",
      "--key", "orc-catalog",
      "--layer", "finland"
    ]
  }
]
```
To submit this to EMR:
```
aws emr add-steps --cluster-id <YOUR-CLUSTER-ID> --steps file://./steps.json --region us-east-1
```
The job can then be monitored as usual through the EMR UI or FoxyProxy.
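If you prefer the command line to the EMR UI, step status can also be polled with the `awscli` (using the same cluster ID as above):

```shell
aws emr list-steps --cluster-id <YOUR-CLUSTER-ID> --region us-east-1
```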