[RFC] Lacks of worker methods
Clebal opened this issue · 12 comments
Precedents
Currently, there are methods in the worker for:
Load data:
- VIEWPORT_FEATURES: load viewport features from tiles layer
- LOAD_GEOJSON_FEATURES: load raw geojson data
- VIEWPORT_FEATURES_GEOJSON: filter the loaded geojson data
Consume data (grouped):
- VIEWPORT_FEATURES_FORMULA: group viewport filtered data for formula widget
- VIEWPORT_FEATURES_HISTOGRAM: same but for histogram
- VIEWPORT_FEATURES_CATEGORY: same but for category
- VIEWPORT_FEATURES_SCATTERPLOT: same but for scatterplot
Problems
There are some problems with that approach:
- There's no way to get raw data without groping for formula/histogram/etc.
- There's no way to get data without viewport filtering. It doesn't make sense with tileset but it can be very useful when managing geojson datasets in advanced use cases.
- VIEWPORT_FEATURES method do not specify what is the purpose, the same for VIEWPORT_FETURES_GEOJSON
Why do I need that?
- Currently, we only have a short amount of widgets. That's why I would consider adding a method for fetching raw data filtered by source's filters and viewport to create a custom model where I group the data by myself or even I can create a list widget.
- From my experience in professional services projects, sometimes we don't want to show data filtered by viewport. It's something very easy to implement with geojson layers.
- It isn't mandatory because it won't be used by user surely, but I think that it would be good to have a better description of what the method does.
Solutions
- Create worker method
GET_RAW_FEATURES
that receive source filters (also the prop proposed in task 2 to be able to filter by viewport). Also, create a model that execute this method. -
- Rename
VIEWPORT_FEATURES_FORMULA
asGET_FORMULA
(the same for other consume-data methods) - In the worker functions, add a prop to enable/disable viewport filtering. Enabled by default.
- If...
- ... viewport filtering prop is true, use
currentViewportFeatures
variable for grouping. - ... viewport filtering prop is false, use
currentGeoJSON
if defined, otherwise usecurrentViewportFeatures
variable for grouping.
- ... viewport filtering prop is true, use
- [Plus] Allow enable/disable viewport filtering in models.
- [Plus] Allow enable/disable viewport filtering in widgets.
- Rename
- Rename
VIEWPORT_FEATURES
asLOAD_TILES_VIEWPORT_FEATURES
andVIEWPORT_FEATURES_GEOJSON
asCALCULATE_GEOJSON_VIEWPORT_FEATURES
.
I'd try to avoid refactors/renames and new properties for now. I think it is more feasible to add the new worker method and the new function to get the features (solution 1).
We could have a getFeatures
function that accepts an object with three props:
- 'dataSource`. Data source id.
applyFilters
. Default:true
. Set tofalse
if you want to get all the features without applying column filters.onlyViewporfFeatures
: Default:true
. Set tofalse
if you want to get all the features. We need to throw an error if this property is false and we are working with a vector tiles source.
The function calls the new worker method to retrieve the [filtered] features and returns an array of features.
This function is similar to other calculations we make for widgets, but I don't know if the widgets package is the more appropriate for this function.
From this RFC, 3 PRs can be opened. I would tackle all in this RFC.
@borja-munoz About solution 1:
I agree with getFeatures
method exported from FeaturesModel
. But I don't agree about the props. I think we should keep the same structure used in other models:
dataSource
filters
: source filters.applyFilters
as boolean isn't useful because we don't have a way to get filters inside the model.onlyViewportFeatures
: I agree with the behavior that you explained, but I don't like the name. I would use:applyViewport
,useViewport
or justviewport
. I would use a shorter way because if in a feature, we want to make viewport filtering optional in our widget, a shorter option would be better.
About your doubts about where to position the function. I would treat this as a new model and the widget package I think is the indicated one. This function is useful for building new widgets.
We cannot have a GET_RAW_FEATURES
method in a worker because it would have a very high-performance cost. To get data from worker to the main thread you need to clone the data and you cannot be cloning data every time that you load new tiles. Working with workers is thinking to get aggregated data no the RAW data.
We're just adding the possibility, the library isn't going to use this in any widget, but it can makes sense in some use cases.
Very high performance cost? Well, it depends on how large the dataset is. That statement isn't true always. This is something that the user must measure and if web worker is a lack for the performance of the project, the user can try another way.
Sources:
https://surma.dev/things/is-postmessage-slow/
https://www.andygup.net/advanced-web-worker-performance/
Update:
Btw, GET_RAW_FEATURES
doesn't only get the data from tiles, it filters by viewport and applies source filters.
I'm going to try to resume all the use cases and the current state:
Working with geojson
- Get all raw features without fitlers -> you have the geojson so you have it
- Get all raw features filtered by viewport and other filters -> currently we would need to extract this information from worker. It could have a performance impact depending of the numbers of features
- Get filtered aggregated values for widgets -> is currently what the workers return to us
- Get filtered aggregated values for widgets but without spatial filter (viewport filter) -> I think that you could calculate it yourself importing functions from react core
Working with tiles
- Get all raw features without filters -> Has no sense for tilesets
- Get all raw features filtered by viewport and other filters -> currently we would need to extract this information from worker. We use tilesets for really big dateset so we are sure that we are going to have performance problems.
- Get filtered aggregated values for widgets -> is currently what the workers return to us
- Get filtered aggregated values for widgets but without spatial filter (viewport filter) -> Has no sense for tilesets
So if I am not missing any use case, we would really only need a method to return all the raw features (for geojson, filtered by vewport), right?
Yes @padawannn, those are the use cases. About the last use case of geojson, I would add the possibility to aggregate values without viewport filtering in worker. It's very easy to implement in C4R and you avoid a headache for the user.
Thanks a lot @Clebal!
I'm ok adding these two features (in separate PRs if possible):
- Return raw data (potentially for table widget), but I'd like to propose to include an order and limit to return the raw data.
- Aggregation without filters (potentially for add global widgets calculation for geojson).
The main idea why worker package was created is to provide better performance for widgets calculation, but this module shouldn't be publicly available for usage (the package is public, but not enhanced at our doc). Thus, I'm ok to extend this module as it's for our internal usage.
@Clebal, maybe I'm wrong, but I think what you're doing here is the first step of two larger features that we should start considering for the long term.
Feature 1: new Table widget.
Feature 2: global calculation for widgets (specifically you need it for geojson). I understand it'd be nice to have a source and been able to create global and viewport widgets. Is that the case? It makes totally sense for me since if you already have the data in the worker...
Feature 2 is harder than feature 1, and not sure how to do that yet, but can you confirm is this what you really want? (appart from flexibility)
Yes @alasarr, those are the two features:
-
Be able to fetch raw data filtered by the indicated filters and optionally viewport filter -> this can be used in a potential TableWidget. I don't talk here about a TableWidget because we need this feature in Site Selection ASAP and I don't want to delay it building a new TableWidget when we already have one in SS that works for our purpose but maybe it doesn't fit in C4R.
-
Make viewport filter optional in aggregation methods -> how you said, my intention here is to be able to create global and viewport widgets. This isn't something that we need right now in Site Selection, but I know that it's useful in other PS projects and it isn't difficult to implement.
Yes, I like your approach, first the easy, then the long term. Just wanted to make sure I was unterstanding your issue
I've been thinking in an approach to tackle fetching raw data and I my conclusion is pagination can work well in our use cases. I've read (see source article) that transmitting worker data in batches is a good solution to avoid out of memory and bottleneck problems. My reason for using pagination in the raw data fetching worker method are:
- The pagination can be useful in the future TableWidget, where all the data won't be needed at once.
- With pagination, advanced users can be able to fetch all the data in multiple requests, avoiding extra issues from using large datasets.
- Users can adapt the amount of data fetched from the worker using pagination depending on how their data is.
Source:
I like the approach 👍 , but I'd first complete the features without pagination and after that include the pagination. We need to avoid a very large PR, so let's do it in phases:
- Return raw data (potentially for table widget), but I'd like to propose to include an order and limit to return the raw data.
- Aggregation without filters (potentially for add global widgets calculation for geojson).
- Pagination.
It looks we've an agreement on this, so let's close this and move to implementation