The flexible polyline encoding is a lossy compressed representation of a list of coordinate pairs or coordinate triples.
It achieves that by:
- Reducing the decimal digits of each value.
- Encoding only the offset from the previous point.
- Using variable length for each coordinate delta.
- Using 64 URL-safe characters to display the result.
The encoding is a variant of Encoded Polyline Algorithm Format. The advantage of this encoding over the original are the following:
- Output string is composed by only URL-safe characters, i.e. may be used without URL encoding as query parameters.
- Floating point precision is configurable: This allows to represent coordinates with precision up to microns (5 decimal places allow meter precision only).
- It allows to encode a 3rd dimension with a given precision, which may be a level, altitude, elevation or some other custom value.
An encoded flexible polyline is composed by two main parts: A header and the actual polyline data. The header always starts with a version number that refers to the specifications in use. A change in the version may affect the logic to encode and decode the rest of the header and data. v.1 is the only version currently defined and this is the version assumed in the rest of the document.
[header version][header content][data]
Both header and data make use of variable length integer encoding.
Every input integer is converted in one or more chunks of 6 bits where the highest bit is a control bit while the remaining five store actual data. Each of these chunks gets encoded separately as a printable character using the following character set:
ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_
Where A
represents 0 and _
represents 63.
The variable encoding uses the highest bit (the sixth in this case) as control bit: When it's set to
1
it means another chunk needs to be read. Here is the algorithm to encode an unsigned integer:
- Given the binary representation of the number, split it in chunks of 5 bits.
- Convert every chunk from right to left.
- Every chunk that is followed by another chunk will be in OR with
0x20
(set the sixth bit to1
) and then encoded as a character. - The last chunk (leftmost) is encoded as a character directly since the sixth bit is already set
to
0
.
In the binary representation of signed integers, the least significant bit stores the sign and the
subsequent bits encode the number value resp. abs(n)-1
for n < 0
. That is, a positive integer
p
is stored as 2*p
while a negative integer n
is stored as 2*abs(n)-1
. So for example, the
number 7 is represented as 1110
and -7 as 1101
. After this transformation, the normal unsigned
integer encoding algorithm can be used.
The first unsigned varint of the encoded string refers to the specification version.
It's encoded as an unsigned varint. The bits of the header content have the following structure:
bit [10 7] [6 4] [3 0]
value [3rd dim precision] [3rd dim flag] [precision]
Refers to the precision (decimal digits after the comma) of the latitude and longitude coordinates. It is encoded as an unsigned integer with a range 0 to 15.
This flag specifies whether the third dimension is present and what meaning it has. It's encoded as an unsigned integer with a range 0 to 7.
Possible values are:
- 0 – absent
- 1 – level
- 2 – altitude
- 3 – elevation
- 4 – reserved1
- 5 – reserved2
- 6 – custom1
- 7 – custom2
Refers to the precision (decimal digits after the comma) of the third dimension. Possible values are 0 to 15.
All data values need to be normalized before encoding by transforming them to integers with the given precision. For example if precision is 5, the value 12.3 becomes 1230000.
The data section is composed by a sequence of signed varints grouped in tuples of the same size (2 if the 3rd dimension flag is absent and 3 otherwise).
The first tuple contains the first set of normalized coordinates, while any other subsequent tuple contains the offset between two consecutive values on the same dimension.
Lat0 Lng0 3rd0 (Lat1-Lat0) (Lng1-Lng0) (3rdDim1-3rdDim0) ...
The following coordinates
(50.10228, 8.69821), (50.10201, 8.69567), (50.10063, 8.69150), (50.09878, 8.68752)
are encoded with precision 5 as follows:
B F oz5xJ 67i1B 1B 7P zI ha xL 7Y
- The initial
B
is the header version (v.1) - The second letter
F
is the header, corresponding to the precision 5 and no 3rd dimension set. - The rest of the string are the encoded coordinates.
Feel free to contribute an implementation. You can either use C-bindings, or provide an implementation in your language. Take a look at the implementations in other languages already available. Usually, the encoding/decoding is very straight-forward. The interface should match roughly the following API:
encode(coordinates, precision, third_dimension, third_dimension_precision) -> string;
decode(string) -> coordinates;
get_third_dimension(string) -> third_dimension;
To test your implementation, use the polylines defined in test/original.txt. Depending on the round function available in the language the expected encoded and decoded files need to be used:
- Round to nearest, ties away from zero:
- Round to nearest, ties to even:
Check that encoded the original data results in the encoded form, and that decoding it again results in the decoded form. Format of the unencoded data is:
- 2d:
{(precision2d); [(lat, lon), ..., (lat, lon), ]}
- 3d:
{(precision2d, precision3d, type3d); [(lat, lon, z), ..., (lat, lon, z), ]}
Floating point numbers are printed with 15 digits decimal precision. Be aware that encoding is lossy: Decoding an encoded polyline will not always yield the original, and neither will encoding a decoded polyline result in the same encoded representation.
- Extend provided tests. We should cover all combinations of parameters.
- Add C-bindings for people who just want to wrap them.
Copyright (C) 2019 HERE Europe B.V.
See the LICENSE file in the root of this project for license details.