ropensci/osmdata

[FEATURE] Add "PRO" features: custom queries

Mashin6 opened this issue · 5 comments

First of all, awesome job parsing OSM data and making Overpass user-friendly!

Although most use cases are just simple filter queries, I think it would be useful to have an option to write custom queries in Overpass QL and pass them to functions like osmdata_sf. Right now I have to do it the hacky way, by modifying $prefix and $features in the overpass_query object.

Second, it would be useful to add a function osmdata_df() that parses the Overpass output into a regular data.frame object. That way, queries like the one below, which return CSV, would be possible. This would then allow a left_join onto data from an osmdata_sf() query, giving a super simple way to, e.g., plot count statistics per area object.

[out:csv(name,counts;false;",")][timeout:3600];
{{geocodeArea:Southeastern Connecticut COG}}->.a;

rel[admin_level=8](area.a);
map_to_area;

foreach -> .town(
  nwr[natural=tree](area.town)->.objects;
  make stat
    name = town.set(t["name"]),
    counts = objects.count(nwr);
  out;
  );
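As a rough illustration of the parsing step a function like osmdata_df() would need, here is a minimal base R sketch. It assumes the response arrives as headerless CSV text in the column order declared by [out:csv(name,counts;false;",")]. The response text and the towns table are made-up placeholders, not real Overpass results, and merge() stands in for dplyr::left_join.

```r
# Placeholder text standing in for an Overpass [out:csv] response.
# The town names and counts are invented for illustration only.
csv_text <- "Town A,120\nTown B,45\nTown C,0"

# Parse the headerless CSV, naming the columns as in the out:csv setting.
counts <- read.csv(
  text = csv_text,
  header = FALSE,
  col.names = c("name", "counts"),
  stringsAsFactors = FALSE
)

# Hypothetical attribute table standing in for polygons from osmdata_sf().
towns <- data.frame(
  name = c("Town A", "Town B", "Town C", "Town D"),
  admin_level = "8",
  stringsAsFactors = FALSE
)

# Base R equivalent of dplyr::left_join(towns, counts, by = "name"):
# every town is kept, and towns without a count get NA.
joined <- merge(towns, counts, by = "name", all.x = TRUE)
print(joined)
```

Towns missing from the CSV keep their row with an NA count, which is the behaviour a left join onto the area objects would need.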

Thanks @Mashin6. For your first question, there is the opq_string() function. As described in the main vignette, query strings can be passed directly to the various osmdata_... functions.
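A minimal sketch of what passing a query string directly could look like, assuming a hand-written Overpass QL query with XML output. The bounding box coordinates are arbitrary illustrative values, and the final call is left commented out because it needs the osmdata package and network access.

```r
# Hand-written Overpass QL query, assembled as a single string.
# The bounding box (south, west, north, east) is an arbitrary example.
query <- paste(
  "[out:xml][timeout:25];",
  "(",
  '  nwr["natural"="tree"](41.30,-72.10,41.40,-72.00);',
  ");",
  "(._;>;);",
  "out body;",
  sep = "\n"
)

cat(query, "\n")

# Assumption based on the comment above: the osmdata_... functions accept
# a query string in place of an overpass_query object.
# trees <- osmdata::osmdata_sf(query)
```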

Your second question is interesting, but may be hard to generalise. I'll think about it and report back - feel free to add any thoughts in the meantime about how you think this might be generalised.

Oh cool. I didn't realize osmdata_sf() can take things other than just a query object.

To be honest, this is quite an advanced use of Overpass, and there are many ways foreach and for() can be used. The simplest implementation would be an osmdata_df() function that parses the output of [out:csv] type queries.
But if you think that "casual" users would benefit from this, then I was thinking that adding a few more functions would generalize my example above.

getbb("Southeastern Connecticut COG", featuretype = "boundary") %>% 
    opq() %>%
    add_csv_fields(list("name" = "set(t[\"name\"])", "counts" = "count(nwr)"), header = FALSE) %>%
    add_osm_topfeature(key="admin_level", value="8") %>%
    add_osm_feature(key="natural", value="tree") %>%
    osmdata_df()

add_csv_fields would take care of the header and the mapping of methods applied to data in the make part.
add_osm_topfeature would define the features for which the stats are generated.
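To make the header part of that idea concrete, here is a small base R sketch of the string such a function would have to produce. csv_header() is a hypothetical helper written for illustration, not part of osmdata, and its argument names are assumptions.

```r
# Hypothetical helper (not an osmdata function): builds the Overpass
# settings line for CSV output from a vector of column names.
csv_header <- function(fields, header = FALSE, sep = ",", timeout = 25) {
  sprintf(
    "[out:csv(%s;%s;\"%s\")][timeout:%d];",
    paste(fields, collapse = ","),   # column list, e.g. name,counts
    tolower(as.character(header)),   # Overpass expects true/false
    sep,                             # field separator
    as.integer(timeout)
  )
}

settings <- csv_header(c("name", "counts"), header = FALSE, timeout = 3600)
cat(settings, "\n")
```

With these arguments the helper reproduces the settings line from the first query in this thread.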

If you wanted to go further and generalize to multiple tags, this gets more complicated.

[out:csv(name,riverbank,river)][timeout:250];
{{geocodeArea:Sweden}}->.a;

rel[admin_level=4](area.a);
map_to_area;

foreach -> .town(
  wr[waterway=riverbank](area.town)->.objects1;
  wr[water=river](area.town)->.objects2;

  make stat
    name = town.set(t["name:en"]),
    riverbank = objects1.count(wr),
    river = objects2.count(wr);
  out;
);

Perhaps the order of tags in add_osm_features(features = c(..)) could define the tag <-> column mapping.
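One way to picture that mapping is the base R sketch below, which assumes the tags arrive as a named character vector whose names become the CSV columns and whose order fixes the objects1, objects2, ... set names. build_foreach() is a hypothetical helper for illustration only; the element type wr and the name:en key are copied from the Sweden example above.

```r
# Hypothetical helper (not an osmdata function): turns an ordered, named
# vector of tags into the foreach block of an Overpass statistics query.
build_foreach <- function(tags, type = "wr", name_key = "name:en") {
  sets <- paste0("objects", seq_along(tags))  # one result set per tag
  filters <- sprintf("  %s[%s](area.town)->.%s;", type, tags, sets)
  stats <- sprintf("    %s = %s.count(%s)", names(tags), sets, type)
  paste(
    c(
      "foreach -> .town(",
      filters,
      "  make stat",
      sprintf("    name = town.set(t[\"%s\"]),", name_key),
      paste0(paste(stats, collapse = ",\n"), ";"),
      "  out;",
      ");"
    ),
    collapse = "\n"
  )
}

block <- build_foreach(c(riverbank = "waterway=riverbank", river = "water=river"))
cat(block, "\n")
```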

Off topic: Just out of curiosity, why did you decide to use add_osm_features(features = c(..)) for the union of object types and add_osm_features() %>% add_osm_features() for adding additional tag filters? To me this seems counterintuitive.

Thanks @Mashin6 - I actually think we'll be able to get somewhere with this csv idea. It really opens up a heap of analytic possibilities and reduces burden on the overpass server at the same time. I'm thinking along the lines of an alternative to add_osm_feature that has a group_by argument along the lines of your add_osm_topfeature demo. I'll hopefully find time to start playing around within the next couple of weeks.

As for your off-topic question: the honest reason is that the original implementation had only piping == AND operations, and there was no way to OR-combine. That got addressed by #237, which kind of explains why it was designed that way: the OR queries involve submitting all key-value pairs to overpass as shown there, and so they are all passed via a single c() operator. That said, this is nothing other than retrospective historical justification for potentially poor design decisions. Happy to discuss the design of that further if you've got any particular suggestions, for which please feel free to continue in a new issue.

Awesome! Thanks for looking into that.


I see. The reason I find it confusing is that add_osm_feature implies including more features in the Overpass result, whereas currently every subsequent use adds an additional filter, which actually results in retrieving fewer features. I personally would find it more intuitive if piping were the OR operator and the AND operator were expressed via, e.g., add_osm_feature(filters = c(...)).

This would also allow running more complex queries, such as retrieving broadleaved trees, benches, and parks with "public" in their name:

opq() %>%
add_osm_feature(filters = c("natural = tree", "leaf_type = broadleaved")) %>%
add_osm_feature(filters = c("leisure = park", "name ~ public")) %>%
add_osm_feature(key = "amenity", value = "bench")

This would translate to the Overpass query:

(nwr[natural=tree][leaf_type=broadleaved];
nwr[leisure=park][name~public];
nwr[amenity=bench];);
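The translation itself is mechanical, as the following base R sketch suggests. It assumes each piped call contributes one group of filters, that filters within a group are AND-combined by chaining brackets, and that groups are OR-combined inside a union. to_union() is a hypothetical helper for illustration, not an osmdata function.

```r
# Each element is one group of AND-combined filters; the groups themselves
# are OR-combined in the resulting union.
filter_groups <- list(
  c("natural=tree", "leaf_type=broadleaved"),
  c("leisure=park", "name~public"),
  c("amenity=bench")
)

# Hypothetical helper (not an osmdata function).
to_union <- function(groups, type = "nwr") {
  statements <- vapply(
    groups,
    function(g) paste0(type, paste0("[", g, "]", collapse = ""), ";"),
    character(1)
  )
  paste0("(", paste(statements, collapse = "\n"), ");")
}

union_query <- to_union(filter_groups)
cat(union_query, "\n")
```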

I very much agree that the current design is counter-intuitive, but flipping it like that would be quite the breaking change! Maybe something for a 1.0 version?
But I realise I am only adding to the off-topic question here, sorry.