[RFC] Lacks of worker methods

Question

[RFC] Lacks of worker methods

Clebal opened this issue 3 years ago · 12 comments

Clebal commented 3 years ago

Precedents

Currently, there are methods in the worker for:

Load data:

VIEWPORT_FEATURES: load viewport features from tiles layer
LOAD_GEOJSON_FEATURES: load raw geojson data
VIEWPORT_FEATURES_GEOJSON: filter the loaded geojson data

Consume data (grouped):

VIEWPORT_FEATURES_FORMULA: group viewport filtered data for formula widget
VIEWPORT_FEATURES_HISTOGRAM: same but for histogram
VIEWPORT_FEATURES_CATEGORY: same but for category
VIEWPORT_FEATURES_SCATTERPLOT: same but for scatterplot

Problems

There are some problems with that approach:

There's no way to get raw data without groping for formula/histogram/etc.
There's no way to get data without viewport filtering. It doesn't make sense with tileset but it can be very useful when managing geojson datasets in advanced use cases.
VIEWPORT_FEATURES method do not specify what is the purpose, the same for VIEWPORT_FETURES_GEOJSON

Why do I need that?

Currently, we only have a short amount of widgets. That's why I would consider adding a method for fetching raw data filtered by source's filters and viewport to create a custom model where I group the data by myself or even I can create a list widget.
From my experience in professional services projects, sometimes we don't want to show data filtered by viewport. It's something very easy to implement with geojson layers.
It isn't mandatory because it won't be used by user surely, but I think that it would be good to have a better description of what the method does.

Solutions

Create worker method GET_RAW_FEATURES that receive source filters (also the prop proposed in task 2 to be able to filter by viewport). Also, create a model that execute this method.
1. Rename VIEWPORT_FEATURES_FORMULA as GET_FORMULA (the same for other consume-data methods)
2. In the worker functions, add a prop to enable/disable viewport filtering. Enabled by default.
3. If...
  1. ... viewport filtering prop is true, use currentViewportFeatures variable for grouping.
  2. ... viewport filtering prop is false, use currentGeoJSON if defined, otherwise use currentViewportFeatures variable for grouping.
4. [Plus] Allow enable/disable viewport filtering in models.
5. [Plus] Allow enable/disable viewport filtering in widgets.
Rename VIEWPORT_FEATURES as LOAD_TILES_VIEWPORT_FEATURES and VIEWPORT_FEATURES_GEOJSON as CALCULATE_GEOJSON_VIEWPORT_FEATURES.

Answer 1 · 2021-10-14T10:44:43.000Z

I'd try to avoid refactors/renames and new properties for now. I think it is more feasible to add the new worker method and the new function to get the features (solution 1).

We could have a getFeatures function that accepts an object with three props:

'dataSource`. Data source id.
applyFilters. Default: true. Set to false if you want to get all the features without applying column filters.
onlyViewporfFeatures: Default: true. Set to false if you want to get all the features. We need to throw an error if this property is false and we are working with a vector tiles source.

The function calls the new worker method to retrieve the [filtered] features and returns an array of features.

This function is similar to other calculations we make for widgets, but I don't know if the widgets package is the more appropriate for this function.

Answer 2 · 2021-10-14T11:03:29.000Z

From this RFC, 3 PRs can be opened. I would tackle all in this RFC.

@borja-munoz About solution 1:

I agree with getFeatures method exported from FeaturesModel. But I don't agree about the props. I think we should keep the same structure used in other models:

dataSource
filters: source filters. applyFilters as boolean isn't useful because we don't have a way to get filters inside the model.
onlyViewportFeatures: I agree with the behavior that you explained, but I don't like the name. I would use: applyViewport, useViewport or just viewport. I would use a shorter way because if in a feature, we want to make viewport filtering optional in our widget, a shorter option would be better.

About your doubts about where to position the function. I would treat this as a new model and the widget package I think is the indicated one. This function is useful for building new widgets.

Answer 3 · 2021-10-15T07:28:03.000Z

We cannot have a GET_RAW_FEATURES method in a worker because it would have a very high-performance cost. To get data from worker to the main thread you need to clone the data and you cannot be cloning data every time that you load new tiles. Working with workers is thinking to get aggregated data no the RAW data.

Answer 4 · 2021-10-15T07:55:06.000Z

We're just adding the possibility, the library isn't going to use this in any widget, but it can makes sense in some use cases.

Very high performance cost? Well, it depends on how large the dataset is. That statement isn't true always. This is something that the user must measure and if web worker is a lack for the performance of the project, the user can try another way.

Sources:
https://surma.dev/things/is-postmessage-slow/
https://www.andygup.net/advanced-web-worker-performance/

Update:

Btw, GET_RAW_FEATURES doesn't only get the data from tiles, it filters by viewport and applies source filters.

Answer 5 · 2021-10-18T07:05:26.000Z

I'm going to try to resume all the use cases and the current state:

Working with geojson

Get all raw features without fitlers -> you have the geojson so you have it
Get all raw features filtered by viewport and other filters -> currently we would need to extract this information from worker. It could have a performance impact depending of the numbers of features
Get filtered aggregated values for widgets -> is currently what the workers return to us
Get filtered aggregated values for widgets but without spatial filter (viewport filter) -> I think that you could calculate it yourself importing functions from react core

Working with tiles

Get all raw features without filters -> Has no sense for tilesets
Get all raw features filtered by viewport and other filters -> currently we would need to extract this information from worker. We use tilesets for really big dateset so we are sure that we are going to have performance problems.
Get filtered aggregated values for widgets -> is currently what the workers return to us
Get filtered aggregated values for widgets but without spatial filter (viewport filter) -> Has no sense for tilesets

So if I am not missing any use case, we would really only need a method to return all the raw features (for geojson, filtered by vewport), right?

Answer 6 · 2021-10-18T08:30:05.000Z

Yes @padawannn, those are the use cases. About the last use case of geojson, I would add the possibility to aggregate values without viewport filtering in worker. It's very easy to implement in C4R and you avoid a headache for the user.

Answer 7 · 2021-10-18T14:22:00.000Z

Thanks a lot @Clebal!

I'm ok adding these two features (in separate PRs if possible):

Return raw data (potentially for table widget), but I'd like to propose to include an order and limit to return the raw data.
Aggregation without filters (potentially for add global widgets calculation for geojson).

The main idea why worker package was created is to provide better performance for widgets calculation, but this module shouldn't be publicly available for usage (the package is public, but not enhanced at our doc). Thus, I'm ok to extend this module as it's for our internal usage.

@Clebal, maybe I'm wrong, but I think what you're doing here is the first step of two larger features that we should start considering for the long term.

Feature 1: new Table widget.
Feature 2: global calculation for widgets (specifically you need it for geojson). I understand it'd be nice to have a source and been able to create global and viewport widgets. Is that the case? It makes totally sense for me since if you already have the data in the worker...

Feature 2 is harder than feature 1, and not sure how to do that yet, but can you confirm is this what you really want? (appart from flexibility)

Answer 8 · 2021-10-18T14:44:40.000Z

Yes @alasarr, those are the two features:

Be able to fetch raw data filtered by the indicated filters and optionally viewport filter -> this can be used in a potential TableWidget. I don't talk here about a TableWidget because we need this feature in Site Selection ASAP and I don't want to delay it building a new TableWidget when we already have one in SS that works for our purpose but maybe it doesn't fit in C4R.
Make viewport filter optional in aggregation methods -> how you said, my intention here is to be able to create global and viewport widgets. This isn't something that we need right now in Site Selection, but I know that it's useful in other PS projects and it isn't difficult to implement.

Answer 9 · 2021-10-18T14:53:50.000Z

Yes, I like your approach, first the easy, then the long term. Just wanted to make sure I was unterstanding your issue

Answer 10 · 2021-11-02T07:21:54.000Z

I've been thinking in an approach to tackle fetching raw data and I my conclusion is pagination can work well in our use cases. I've read (see source article) that transmitting worker data in batches is a good solution to avoid out of memory and bottleneck problems. My reason for using pagination in the raw data fetching worker method are:

The pagination can be useful in the future TableWidget, where all the data won't be needed at once.
With pagination, advanced users can be able to fetch all the data in multiple requests, avoiding extra issues from using large datasets.
Users can adapt the amount of data fetched from the worker using pagination depending on how their data is.

Source:

Communicating Large Objects with Web Workers in javascript

Answer 11 · 2021-11-02T07:29:45.000Z

I like the approach 👍 , but I'd first complete the features without pagination and after that include the pagination. We need to avoid a very large PR, so let's do it in phases:

Return raw data (potentially for table widget), but I'd like to propose to include an order and limit to return the raw data.
Aggregation without filters (potentially for add global widgets calculation for geojson).
Pagination.

Answer 12 · 2021-11-02T08:04:01.000Z

It looks we've an agreement on this, so let's close this and move to implementation