interleaved geometry
dcooley opened this issue · 13 comments
coordinates stored in a single vector
[x,y,z,x,y,z,x,y,z,...]
And associated meta-data describing start indices, stride length
References & discussion
TODO
- interleave geometry coordinates
- follow design of flat lists / geo-arrow
- polygons need inner-ring start_indices
Prototype
library(sfheaders)
sf <- mapdeck::roads
it <- sfheaders:::rcpp_interleave_sf( sf )
str( it )
List of 6
$ coordinates : num [1:115514] 145 -37.8 145 -37.8 145 ...
$ start_indices : int [1:18286] 0 20 46 62 70 74 81 91 115 125 ...
$ n_coordinates : int [1:18286] 20 26 16 8 4 7 10 24 10 10 ...
$ total_coordinates: num 57757
$ stride : int 2
$ data :'data.frame': 18286 obs. of 15 variables:
..$ EZI_RDNAME: chr [1:18286] "MAIN YARRA TRAIL" "YARRA BOULEVARD" "UNNAMED" "YARRA-CITYLINK OUT RAMP ON" ...
..$ FQID : num [1:18286] 5516 1347 5516 3316 3470 ...
..$ FROM_UFI : num [1:18286] 16141399 2319828 16141401 39157640 2319921 ...
..$ FTYPE_CODE: chr [1:18286] "road" "road" "road" "road" ...
..$ LEFT_LOC : chr [1:18286] "BURNLEY" "BURNLEY" "BURNLEY" "BURNLEY" ...
..$ PFI : num [1:18286] 11815534 5684911 11749283 5684402 5684635 ...
..$ RD_NAME1 : chr [1:18286] "CAPITAL CITY" NA NA NA ...
..$ RD_NAME2 : chr [1:18286] NA NA NA NA ...
..$ RD_TYPE1 : chr [1:18286] "TRAIL" NA NA NA ...
..$ RD_TYPE2 : chr [1:18286] NA NA NA NA ...
..$ RIGHT_LOC : chr [1:18286] "BURNLEY" "BURNLEY" "BURNLEY" "BURNLEY" ...
..$ ROAD_NAME : chr [1:18286] "MAIN YARRA" "YARRA" "UNNAMED" "YARRA-CITYLINK OUT" ...
..$ ROAD_TYPE : chr [1:18286] "TRAIL" "BOULEVARD" NA "RAMP" ...
..$ TO_UFI : num [1:18286] 39357869 39364350 2319308 39157656 45130685 ...
..$ UFI : num [1:18286] 39352041 39364294 16140773 39157330 45113563 ...
some benchmarks
library(microbenchmark)
## Need to get this speed down.
microbenchmark(
df_T = { sfheaders::sf_to_df( mapdeck::roads, fill = T ) },
df_F = { sfheaders::sf_to_df( mapdeck::roads, fill = F ) },
it = { sfheaders:::rcpp_interleave_sf( mapdeck::roads ) },
times = 10
)
# Unit: milliseconds
# expr min lq mean median uq max neval
# df_T 29.93938 30.87460 32.42612 32.43507 34.31048 34.82401 10
# df_F 23.60596 24.32471 28.00786 28.39502 29.48871 34.00938 10
# it 28.92496 30.99812 34.33903 35.01555 36.09556 40.84551 10
## interleaving benchmarks
library(Rcpp)
cppFunction(
code = '
Rcpp::NumericVector interleave( Rcpp::NumericMatrix& mat ) {
R_xlen_t n_col = mat.ncol();
R_xlen_t n_row = mat.nrow();
Rcpp::NumericVector res( n_col * n_row );
R_xlen_t i, j;
R_xlen_t position_counter = 0;
for( i = 0; i < n_row; ++i ) {
for( j = 0; j < n_col; ++j ) {
res[ position_counter ] = mat( i, j );
position_counter = position_counter + 1;
}
}
return res;
}
'
)
cppFunction(
code = '
Rcpp::NumericVector interleave_t( Rcpp::NumericMatrix& mat ) {
Rcpp::NumericMatrix m2 = Rcpp::transpose( mat );
m2.attr("dim") = R_NilValue;
return m2;
}
'
)
cppFunction(
code = '
Rcpp::NumericVector interleave_cols( Rcpp::NumericMatrix& mat ) {
Rcpp::NumericVector col1 = mat( Rcpp::_, 0 );
Rcpp::NumericVector col2 = mat( Rcpp::_, 1 );
R_xlen_t n_row = mat.nrow();
Rcpp::NumericVector res( n_row + n_row );
R_xlen_t i;
R_xlen_t counter = 0;
for( i = 0; i < n_row; ++i ) {
res[ counter ] = col1[ i ];
res[ counter + 1 ] = col2[ i ];
counter = counter + 1;
}
return res;
}
'
)
## https://github.com/RcppCore/Rcpp/blob/master/inst/include/Rcpp/vector/Matrix.h#L405
cppFunction(
code = '
Rcpp::NumericVector interleave_impl( Rcpp::NumericMatrix& mat ) {
R_xlen_t nrow = mat.nrow(), ncol = mat.ncol();
R_xlen_t len = nrow * ncol;
R_xlen_t len2 = len - 1;
Rcpp::NumericVector res( len );
R_xlen_t i, j;
for( i = 0, j = 0; i < len; ++i, j += nrow ) {
if (j > len2) j -= len2;
res[i] = mat[j];
}
return res;
}
'
)
m <- matrix(rnorm(3e1), ncol = 3)
as.numeric( t(m) )
interleave_t( m )
interleave_impl( m )
library(microbenchmark)
m <- matrix(rnorm(3e7), ncol = 3)
microbenchmark(
loop = { interleave( m ) },
tran = { interleave_t( m ) },
cols = { interleave_cols( m ) },
impl = { interleave_impl( m ) },
times = 25
)
# Unit: milliseconds
# expr min lq mean median uq max neval
# loop 55.18919 64.50406 70.85069 67.40234 71.77256 130.16606 25
# tran 51.28294 52.17571 60.82680 62.26534 68.09855 69.24250 25
# cols 60.38587 61.41036 64.24787 62.21885 67.20412 70.19668 25
# impl 48.87238 50.17013 58.46313 59.21523 65.89245 68.35909 25
It's been years since I've used R. If you have suggestions/comments about creating sf
data structures from different types of Arrow structures, we'd love to hear.
Cheers @kylebarron. From reading all the discussions I think the conversion from Arrow to sf
will be pretty straight forward, so I'm trying to align these "interleaving" functions with what is being discussed.
And for what it's worth, this comment
One of the core concepts of Arrow is that it is a columnar layout, so which means all values in a single column (here: all geometries of one column) are stored in one contiguous array.
Is what I was thinking about when I asked this question
is your proposed size a single value, or an array? That is, is it possible (or does it make sense?) to encode two linestrings with different dimensions
(And like you I'm coming at this from a Deck.gl performance perspective)
Yes, I think I was a little confused. I didn't understand that each key of the struct was stored as a separate contiguous array. So the abstraction is that each struct is an independent record, but internally each value of the struct is an Array (possibly an Array of Arrays of Arrays ...). So the size is essentially an array internally that varies across rows, but not within a single record.
Anyways, I think there's still some good discussion to be had, so I wouldn't spend too much effort implementing any one proposal yet.
Anyways, I think there's still some good discussion to be had, so I wouldn't spend too much effort implementing any one proposal yet.
yeah absolutely. I'm not "productionising" anything yet, just setting up some processes ready to hit whatever structure is needed.
How could you encode structure in this interleave form? (I guess it's too early to say for sfheaders yeah)
There's no mapping between $data
and $start_indices
and I'm drawn to this comment: geoarrow/geoarrow#4 the 'into parts / into rings / into values' stuff
The discussion there leaves me wondering what to do about triangles, you really don't want expanded interleaved versions of those, and it seems missing when you can have a packed 'values' array and ' indexing values' (it's missing from the SF standard like a crime from long dead ancestors) so I wonder if you've encountered the topic elsewhere?
How could you encode structure in this interleave form
what do you mean by "structure" exactly?
There's no mapping between $data and $start_indices
The index of the [start_indices]
array is the row-index of [data]
oh, sorry - quite right - I mean for multi-part types, is it just because it's LINESTRING above? - there's only run-length no parts- hierarchy (that's all I mean by structure)
oh I see. Yeah I've not fully thought about it yet. Was going to come back to this once geometries
and sfheaders
are updated. There's a solution out there somewhere...
Cool, it's interesting - so refreshing to see such discussion, and I was pretty dismissive at first, I was exploring these things long ago and it seems the important stuff is not where I expected
The crux is in how we store indexed coordinates! I'm sure of it, we have to have triangles and segments in dense form and this seems so close to that now
we have to have triangles
I should mention, my v0.0.1 plan of {interleave}
is to create (only) POINTs, LINEs and Triangles (via ear-cutting) from any geometry (hence {geometries}
), as these will feed directly into {mapdeck}
.
Oh and of course, that means run length is enough - no need for part/hole nonsense when you only have foreground pieces. That's worth dwelling on ... seems like good convergence!