r-spatial/discuss

MongoDB to sf

Closed this issue · 10 comments

I'm planning on submitting a proposal to RConsortium to create a MongoDB-to-sf library.

What's r-spatial's appetite for this, is it already being done, and is it a good idea?


Example

MongoDB can store geospatial data. I have a collection (saved locally) that contains over 72,000 LINESTRINGs (roads in Victoria, Australia).

Getting this data into R as an sf object takes over 30 seconds on my machine:

library(mongolite)
library(sf)

m <- mongo(db = "roads", collection = "roads")

system.time({
  m_roads <- m$find()
})
##   user  system elapsed 
## 29.288   0.845  32.112 

system.time({
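  # Keep only the GeoJSON geometry column returned by mongolite
  geom <- m_roads$geometry
  geom$id <- seq_len(nrow(geom))
  
  # Each element of `coordinates` is a matrix of points; build one LINESTRING per road
  sfc <- lapply(geom$coordinates, sf::st_linestring)
  sfc <- sf::st_sfc(sfc)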
})

##   user  system elapsed 
##  4.553   0.036   5.027 
  
str(sfc)
## sfc_LINESTRING of length 72943; first list element:  XY [1:2, 1:2] 145 145 -38 -38

I have created a prototype package that returns the same sfc object in about 1.5 seconds:

library(mongoGeo)

system.time({
  con <- mg_connect(db = "roads", collection = "roads")
  sfc <- mg_find_sfc(con)
})

##  user  system elapsed 
## 1.215   0.119   1.546

str(sfc)
## sfc_LINESTRING of length 72943; first list element:  XY [1:2, 1:2] 145 145 -38 -38

edzer commented

In principle yes, but it will be easier to comment if you share your draft proposal; you'll find drafts of the (successful) proposals for sf and stars in their respective repositories.

I'll share it in a couple of days when it's written; I've only just had the idea :)

It seems a little too specific (MongoDB only?) for RConsortium. Would it be extensible to other dbs as well?

The logic that does the conversion would be extensible because it's all about parsing GeoJSON.

However, communication with a NoSQL database relies on the specific drivers for that database and on its underlying data representation. For example, MongoDB stores its data as BSON, and so provides a C/C++ API to communicate with it.

DynamoDB would have a different set of drivers, and FireGeo (or whatever they're going to name it) will have a different set again.
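
To illustrate the database-independent part, here is a minimal sketch of turning GeoJSON geometries into an sfc with jsonlite and sf. The two documents and the helper name `geojson_to_linestring` are made up for illustration; this is not the mongoGeo implementation, and it only handles LineString geometries.

```r
library(jsonlite)
library(sf)

# Hypothetical GeoJSON geometries, as any document store might return them
docs <- c(
  '{"type":"LineString","coordinates":[[145.0,-38.0],[145.1,-38.1]]}',
  '{"type":"LineString","coordinates":[[144.9,-37.8],[145.0,-37.9],[145.2,-37.7]]}'
)

# Hypothetical helper: parse one GeoJSON LineString into an sfg object
geojson_to_linestring <- function(js) {
  g <- jsonlite::fromJSON(js)            # coordinates simplify to a numeric matrix
  stopifnot(identical(g$type, "LineString"))
  sf::st_linestring(g$coordinates)       # one row per point, columns are x and y
}

# GeoJSON coordinates are WGS 84 longitude/latitude, hence EPSG:4326
sfc <- sf::st_sfc(lapply(docs, geojson_to_linestring), crs = 4326)
```

Only the step that fetches the documents would change from one database to another; the parsing step stays the same.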

edzer commented

Doesn't MongoDB provide a WKB interface?

I've not heard of such an interface.

I had also forgotten about the GeoMongo package. Maybe it would be worth hooking into that in some way instead?

@mlampros - Do you have any thoughts on converting the output from GeoMongo directly to sf objects?

@SymbolixAU would you mind sharing a minimal reproducible example so that I can compare my approach (GeoMongo) with yours (mongoGeo)?

I've decided not to submit a proposal for this particular project, but I think it's worth pursuing.

I'll share the mongo work soon, just fixing a few things first.

I took Tim's "too specific" comment on board and thought about ways to make this more generic.

Ultimately it comes down to parsing GeoJSON, so I wanted to see if I could write a "fast" parser to add to what can already be done in sf, e.g. here, and in geojsonio.

So I've been playing with geojsonsf. The Readme gives some examples and benchmarks.
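
As a rough sketch of what that looks like in use, assuming geojsonsf and sf are installed (the FeatureCollection below is made up for illustration, not taken from the roads data):

```r
library(geojsonsf)
library(sf)

# Hypothetical GeoJSON FeatureCollection with two roads
js <- '{"type":"FeatureCollection","features":[
  {"type":"Feature","properties":{"id":1},
   "geometry":{"type":"LineString","coordinates":[[145.0,-38.0],[145.1,-38.1]]}},
  {"type":"Feature","properties":{"id":2},
   "geometry":{"type":"LineString","coordinates":[[144.9,-37.8],[145.0,-37.9]]}}
]}'

# Geometry column only
sfc <- geojsonsf::geojson_sfc(js)

# Geometries plus the feature properties as columns
roads <- geojsonsf::geojson_sf(js)
```

`geojson_sfc()` is the closer match to the `mg_find_sfc()` output above, while `geojson_sf()` also keeps the properties.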

I'm now going to see if I can make this even more generic, or at least handle the BSON objects returned by MongoDB.

Closing; feel free to re-open if necessary.