Add cstrip() and cwrap functions

Question

Add cstrip() and cwrap functions

Closed this issue 9 months ago · 11 comments

It would be great if CoordRefSystems could provide functions cstrip() and cwrap() for converting between coordinate types (e.g. x::LatLon) and coordinate vectors (e.g. SVector(x.lat, xlon)). Such functions are very useful for interfacing with non-CoordRefSystems-aware software (e.g. plotting libraries) and implementing coordinate arithmetic. For example:

function rhumb_line(x::CRS, y::CRS)
    C = Mercator
    return t -> cwrap(C, t * cstrip(C, x) + (1-t) * cstrip(C, y))
end

function mean(coords::AbstractVector{<:CRS})
    C = Cartesian{WGS84latest}
    m = cwrap(C, mean(cstrip(C, x) for x in coords))
    # Don't trust means that are too far underground
    # (i.e. the `coords` are not local enough)
    @assert abs(convert(LatLonAlt, m).alt) < 10
    return m
end

translated_latlon(x::CRS, d::SVector{3}) = cstrip(LatLon, cwrap(ENU(x), d))

Answer 1 · 2024-03-26T09:54:00.000Z

I believe that our current design with Base.getproperty is enough for the use cases you have in mind:

julia> using CoordRefSystems

julia> ll = LatLon(0, 0)
GeodeticLatLon{WGS84Latest} coordinates
├─ lat: 0.0°
└─ lon: 0.0°

julia> ll.lat
0.0°

julia> ll.lon
0.0°

julia> m = Mercator(0, 0)
Mercator{WGS84Latest} coordinates
├─ x: 0.0 m
└─ y: 0.0 m

julia> m.x
0.0 m

julia> m.y
0.0 m

We could also think about implementing Base.getindex for all CRS if an algorithm doesn't need to know the name of the coordinates. In general, most algorithms end up using the names x and y for Projected and lat and lon for Geographic. It is good to get an error if someone tries to pass the incorrect CRS.

What do you think? Would the present design already solve your original issue?

Answer 2 · 2024-03-26T11:54:21.000Z

Would the present design already solve your original issue?

Not really. Let's take the above rhumb_line() as an example. With cstrip() and cwrap(), this is effectively a one-liner.

function rhumb_line(x::CRS, y::CRS)
    C = Mercator
    return t -> cwrap(C, t * cstrip(C, x) + (1-t) * cstrip(C, y))
end

Using the properties, this gets much more cumbersome.

function rhumb_line(x::CRS, y::CRS)
    x_merc = convert(Mercator, x)
    y_merc = convert(Mercator, y)
    return t -> Mercator(
        t * x_merc.x + (1-t) * y_merc.x, 
        t * x_merc.y + (1-t) * y_merc.y, 
    )
end

Still doable, of course, but in our work computations like the above arise frequently and then the properties-based approach just doesn't scale. So I would argue there is a real gap in the API here, but I'm open to exploring alternative solutions.

Answer 3 · 2024-03-26T12:07:29.000Z

I would argue that the second code snippet is much more readable and explicit. It also gives the compiler more chance to avoid allocations. It should be as fast or faster than handling opaque tuples, specially if you preprocess (1-t) once and use the result in both expressions explicitly.

Also notice that you don't need to convert to Mercator if you dispatch on Mercator or any Projected type:

rhumb_line(a::Projected, b::Projected) = t -> Mercator(t * a.x + (1-t) * b.x, t * a.y + (1-t) * b.y)

Effectively a one-liner, with optimal performance.

Answer 4 · 2024-03-26T12:18:25.000Z

If we add a Base.getindex for all CRS, the code doesn't require cstrip and cwrap, and will also be performant:

rhumb_line(a::Projected, b::Projected) = t -> Mercator(t .* a .+ (1-t) .* b)

But that is a decision that we need to consider carefully, as we don't want to promote unsafe treatment of coords as if they were mere tuples. This is what happens everywhere else in other languages, and is a source of obscure bugs.

Answer 5 · 2024-03-26T13:02:08.000Z

But that is a decision that we need to consider carefully, as we don't want to promote unsafe treatment of coords as if they were mere tuples. This is what happens everywhere else in other languages, and is a source of obscure bugs.

I fully agree, and in fact your implementation of rhumb_line() is a typical example for why implementing arithmetic directly on the coordinate types is dangerous: for t*a + (1-t)*b to be a rhumb line you have to switch to Mercator coordinates before doing the interpolation, so if I call say rhumb_line(LatLon(...), WebMercator(...)) then your code silently does something completely different from what I intended. By contrast, if arithmetic is implemented in terms of cstrip() / cwrap() then you are forced to explicitly specify in which coordinates the arithmetic is to be carried out, and therefore any mistake in choosing a coordinate space would at least be easily spottable.

Answer 6 · 2024-03-26T13:20:17.000Z

so if I call say rhumb_line(LatLon(...), WebMercator(...))

That is not possible with the code I shared above because there is no method dispatching Geographic with Projected coordinates. You are misunderstanding the concepts here.

If you really need to convert the inputs to Mercator, then that is a separate issue. After you have your two inputs in the expected Projected subtype, you can easily access x and y as I did above. It seems that you want to add semantics to cstrip and cwrap to make sure that the "x" coordinate is always returned whatever that means, even in the LatLon case. That is not a good design. There is no such thing as the "x" coordinate of the LatLon coordinates.

Answer 7 · 2024-03-27T01:32:54.000Z

so if I call say rhumb_line(LatLon(...), WebMercator(...))

That is not possible with the code I shared above because there is no method dispatching Geographic with Projected coordinates.

I don't see how distinguishing between Geographic and Projected matters here¹. rhumb_line() should work for any CRS that you pass into it, since any CRS defines a unique point on earth and so you can define the rhumb line between the two. (There are probably some subtleties regarding CRS with altitude information and/or different datums, but these subtleties are mostly orthogonal to the main point here, so let's not get into that.) My argument works out exactly the same when applied to say rhumb_line(Robinson(...), Orthographic(...)), and in fact it works out even when applied to rhumb_line(Robinson(...), Robinson(...)) since the result is a rhumb line if and only if the interpolation is done in Mercator coordinates.

It seems that you want to add semantics to cstrip and cwrap to make sure that the "x" coordinate is always returned whatever that means, even in the LatLon case. [...] There is no such thing as the "x" coordinate of the LatLon coordinates.

I suspect I may not have expressed myself clearly (and I'm terribly sorry if that's the case). I want cstrip() to be this (or some minor variation of this):

function cstrip(C::Type{<:Projected}, a::CRS)
    b = convert(C, a)
    return SVector(b.x, b.y)
end

Yes, this removes the CRS meta-information associated with the (x,y) tuple, but so does a.x and also a[1], so in that respect there is no difference between the two approaches. However, there are important differences between cstrip() and getproperty() in other respects:

cstrip(C, a) forces you to be explicit about the coordinate type in which you unpack a, while a.x doesn't. So you could write a.x in a context where a is known to be a Mercator coordinate, then later on you change the context to allow arbitrary Projected and suddenly your code breaks. This type of mistake is not possible with cstrip().

cstrip(C, a) gives you the coordinates in a form that's much more convenient for many purposes. Imagine you have an (unstructured) set of coordinates a, b, c, and you want to plot them all on a WebMercator map with some custom styling applied to each. cstrip() allows you to do this:

scatter!(axis, cstrip(WebMercator, a); label = "a")
scatter!(axis, cstrip(WebMercator, b); color = :green)
scatter!(axis, cstrip(WebMercator, c); marker = "x")

By contrast, with getproperty() you have to do this:

awm = convert(WebMercator, a)
scatter!(axis, (awm.x, awm.y); label = "a")
bwm = convert(WebMercator, b)
scatter!(axis, (bwm.x, bwm.y); color = :green)
cwm = convert(WebMercator, c)
scatter!(axis, (cwm.x, cwm.y); marker = "x")

In fact, I'm not even sure what is the difference between Geographic and Projected coordinate systems. They are both just tuples of numbers describing points on earth, so as far as I can tell they are just "different by convention"? However, unless you think I'm seriously missing an important point here and would be so kind as to lift my ignorance, I suggest we postpone this discussion for another day. ↩

Answer 8 · 2024-03-27T01:44:14.000Z

It also gives the compiler more chance to avoid allocations. It should be as fast or faster than handling opaque tuples, specially if you preprocess (1-t) once and use the result in both expressions explicitly.

AFAICT there should be no difference in performance between the two approaches. The tuples in question here are all statically sized and have fully inferable types, so the compiler should be able to optimise the structured code in exactly the same way as the manually unrolled code. Of course, as usual when it comes to performance optimisations the devil is in the detail, but this detail can swing both ways and therefore I don't think performance favours either approach.

Answer 9 · 2024-03-27T09:31:30.000Z

cstrip(C, a) forces you to be explicit about the coordinate type in which you unpack a, while a.x doesn't. So you could write a.x in a context where a is known to be a Mercator coordinate, then later on you change the context to allow arbitrary Projected and suddenly your code breaks. This type of mistake is not possible with cstrip().

This argument is not very strong since the whole idea of the cstrip helper is toconvert and then use the fields as a raw tuple. The more explicit approach is

xy = let
  m = convert(Mercator, c)
  (m.x, m.y)
end

We retain our viewpoint that being explicit is better than adding hidden layers of conversion. Utility functions such as cstrip and cwrap can live in other downstream packages if necessary. There is nothing inhibiting the definition of these functions in user code.

cstrip(C, a) gives you the coordinates in a form that's much more convenient for many purposes. Imagine you have an (unstructured) set of coordinates a, b, c, and you want to plot them all on a WebMercator map with some custom styling applied to each. cstrip() allows you to do this:
scatter!(axis, cstrip(WebMercator, a); label = "a")
scatter!(axis, cstrip(WebMercator, b); color = :green)
scatter!(axis, cstrip(WebMercator, c); marker = "x")

This layer of abstraction is too low. What we have in mind is a Makie recipe that consumes Meshes.Point with given coordinates, and plots them with the nice options it already provides. End-users should never access these low-level details of coordinates, querying fields, converting to different CRS point-by-point. The big picture will be clearer in the following months after we refactor Meshes.jl accordingly. You will then be able to assess and reproduce the approach elsewhere in a high-level interface if you wish.

In fact, I'm not even sure what is the difference between Geographic and Projected coordinate systems. They are both just tuples of numbers describing points on earth, so as far as I can tell they are just "different by convention"? However, unless you think I'm seriously missing an important point here and would be so kind as to lift my ignorance, I suggest we postpone this discussion for another day. ↩

For one thing, Geographic coordinates wrap around the Earth, they are known to cover the entire globe, and are periodic. Projected coordinates on the other hand have various limitations as you know. Some of them have a limited domain (see indomain), and all of them are a flat approximation of the globe that does not wrap around. These two abstract types allow advanced users to dispatch different algorithms. If an algorithm doesn't use these properties, it can simply dispatch on the general CRS supertype.

Answer 10 · 2024-03-27T09:51:27.000Z

Utility functions such as cstrip and cwrap can live in other downstream packages if necessary. There is nothing inhibiting the definition of these functions in user code.

That's fair. I initially didn't realise that all Projected coordinates exhibit their coordinates as x and y fields, and without this convention it would have been quite tedious to maintain my own cstrip() function. But with this convention, that's perfectly doable.

Answer 11 · 2024-03-28T13:25:38.000Z

Nice. I think we converged on this one. We can come back to it later after we have a better idea of how these CRS will be used in practice.

Footnotes