MongoDB to sf
I'm planning on submitting a proposal to the R Consortium to create a MongoDB-to-sf library.
What's r-spatial's appetite for this? Is it already being done, and is it a good idea?
Example
MongoDB can store geospatial data. I have a collection (saved locally) that contains over 72,000 LINESTRINGs (roads in Victoria, Australia).
Getting this data into R as an sf object takes over 30 seconds on my machine:
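```r
library(mongolite)
library(sf)

# Connect to the locally stored "roads" collection
m <- mongo(db = "roads", collection = "roads")

# Step 1: fetch every document into an R data.frame (the slow part)
system.time({
  m_roads <- m$find()
})
## user system elapsed
## 29.288 0.845 32.112

# Step 2: convert the GeoJSON coordinates into an sfc_LINESTRING
system.time({
  geom <- m_roads$geometry
  geom$id <- seq_len(nrow(geom))  # row index (not used below)
  sfc <- lapply(geom$coordinates, sf::st_linestring)  # one LINESTRING per road
  sfc <- sf::st_sfc(sfc)
})
## user system elapsed
## 4.553 0.036 5.027

str(sfc)
## sfc_LINESTRING of length 72943; first list element: XY [1:2, 1:2] 145 145 -38 -38
```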
I have created a prototype package, mongoGeo, which returns the same sfc object in roughly 1.5 seconds:
```r
library(mongoGeo)  # prototype package

system.time({
  con <- mg_connect(db = "roads", collection = "roads")
  sfc <- mg_find_sfc(con)  # fetch the collection and return an sfc directly
})
## user system elapsed
## 1.215 0.119 1.546

str(sfc)
## sfc_LINESTRING of length 72943; first list element: XY [1:2, 1:2] 145 145 -38 -38
```
In principle, yes, but it will be easier to comment if you share your draft proposal; you'll find drafts of the (successful) proposals for sf and stars in their respective repositories.
I'll share it in a couple of days when it's written; I've only just had the idea :)
It seems a little too specific (MongoDB only?) for the R Consortium. Would it be extensible to other databases as well?
The logic that does the conversion would be extensible, because it's all about parsing GeoJSON.
However, communicating with NoSQL databases relies on each database's specific drivers and underlying data representation; e.g., MongoDB stores its data as BSON, and so provides a C/C++ API to communicate with it.
DynamoDB would have a different set of drivers, and FireGeo (or whatever they're going to name it) will have a different set again.
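To make that split concrete: only the fetching is driver-specific, while turning GeoJSON geometry strings into an sfc can be shared. A minimal sketch, assuming each driver can hand back geometries as GeoJSON text. `geojson_to_sfc()` is a hypothetical helper, the sample roads are made up, and the parsing uses sf's own GDAL-backed `st_read()` rather than any dedicated parser:

```r
library(sf)

# Hypothetical helper (not part of sf or mongolite): convert a character
# vector of GeoJSON geometry strings into one sfc. This step is
# database-agnostic; fetching the strings is left to each database driver.
geojson_to_sfc <- function(geojson) {
  # Parse each string with sf's GDAL-backed reader
  parts <- lapply(geojson, function(g) {
    sf::st_geometry(sf::st_read(g, quiet = TRUE))
  })
  # Concatenate the length-one sfc objects into a single sfc
  do.call(c, parts)
}

# Made-up GeoJSON, standing in for what a MongoDB, DynamoDB, ... driver returns
roads <- c(
  '{"type":"LineString","coordinates":[[145.0,-38.0],[145.1,-38.1]]}',
  '{"type":"LineString","coordinates":[[145.2,-38.2],[145.3,-38.3],[145.4,-38.4]]}'
)
sfc <- geojson_to_sfc(roads)
```

Calling GDAL once per string keeps the sketch short, but it is likely to be slow for tens of thousands of geometries, which is where a dedicated parser would pay off.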
Doesn't MongoDB provide a WKB interface?
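For context on why WKB would be attractive: sf can decode WKB directly, with no JSON parsing step. A minimal round trip using sf's `st_as_binary()` and `st_as_sfc()` on a made-up geometry (this does not show that MongoDB actually offers WKB):

```r
library(sf)

# A made-up road geometry, standing in for data held in a database
line <- sf::st_sfc(sf::st_linestring(rbind(c(145.0, -38.0), c(145.1, -38.1))))

# Encode as WKB: a list of raw vectors with class "WKB"
wkb <- sf::st_as_binary(line)

# Decode the WKB back into an sfc; no JSON parsing is involved
decoded <- sf::st_as_sfc(wkb)
```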
@SymbolixAU, would you mind sharing a minimal reproducible example so that I can compare my approach (GeoMongo) with yours (mongoGeo)?
I've decided not to submit a proposal for this particular project, but I think it's worth pursuing.
I'll share the mongo work soon, just fixing a few things first.
I took Tim's "too specific" comment on board and thought about ways to make this more generic.
Ultimately it comes down to parsing GeoJSON, so I wanted to see if I could write a "fast" parser to add to what can already be done in sf (e.g. here) and in geojsonio.
So I've been playing with geojsonsf. The README gives some examples and benchmarks.
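For reference, geojsonsf provides `geojson_sfc()`, which converts GeoJSON text to an sfc (and `geojson_sf()` for an sf data.frame). A minimal usage sketch with made-up road geometries; see the package README for the benchmarks:

```r
library(geojsonsf)

# Made-up GeoJSON geometries for two short road segments
roads <- c(
  '{"type":"LineString","coordinates":[[145.0,-38.0],[145.1,-38.1]]}',
  '{"type":"LineString","coordinates":[[145.2,-38.2],[145.3,-38.3]]}'
)

# geojson_sfc() accepts a GeoJSON string or a vector of strings
sfc <- geojsonsf::geojson_sfc(roads)
```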
I'm now going to see if I can make this even more generic, or at least handle the BSON objects returned by MongoDB.
Closing; feel free to re-open if necessary.