GispoCoding/eis_toolkit

Distance computation criteria

nmaarnio opened this issue · 8 comments

If I have understood some recent discussions correctly, it is common to compute distances to features matching some criteria, similar to what distance_to_anomaly does. Do you think we should modify distance_computation to enable selection criteria, or should users subset their dataset using the criteria before using the tool @nialov (and others)?

So the features would have attributes that would be used to filter them? I would personally delegate these kinds of simple operations to the users. It is just a matter of selecting by attribute in the GIS software and exporting the selection to a new dataset.

Building in different methods for modifying the distance computation would be more useful than that, which, as suggested, is more of a user problem.

Our aim in the QGIS plugin is to offer convenient and tailored tools for users, which in many cases is achieved by parameterization and/or grouping a handful of small operations into one. This is how Distance to anomaly was created. I've gotten the impression that the following workflow is a common one for preparing a proxy / preprocessing data for modeling:

  1. Select features based on criteria (for example, select all point features that have Fe concentration value (a column) over 100)
  2. Calculate distance to these features up to a certain maximum distance (for example, 1 km; all locations further than this are nodata in the resulting layer).
  3. Normalize the result layer to the range [0, 1]
  4. Invert the values, so that pixels very close to the matched point features get value 1 and pixels at the maximum distance from step 2 get value 0 (beyond that, nodata).
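For concreteness, the four steps above can be sketched in plain NumPy. This is a minimal illustration with made-up grid coordinates and Fe values; the real toolkit operates on vector geometries and rasters, and the column name, threshold, cell size, and maximum distance here are assumptions for the example.

```python
import numpy as np

# Hypothetical point features on a small 10x10 grid: (row, col, Fe ppm).
points = [(2, 3, 150.0), (7, 7, 80.0), (5, 1, 230.0)]
shape = (10, 10)
cell_size = 100.0     # metres per pixel (assumed)
max_distance = 300.0  # step 2's cutoff, in metres (assumed)

# Step 1: select features matching the criterion (Fe > 100).
selected = [(r, c) for r, c, fe in points if fe > 100.0]

# Step 2: distance from every cell to the nearest selected feature,
# capped at max_distance.
rows, cols = np.indices(shape)
distances = np.full(shape, np.inf)
for r, c in selected:
    distances = np.minimum(distances, np.hypot(rows - r, cols - c) * cell_size)
distances = np.minimum(distances, max_distance)

# Steps 3 and 4: normalize to [0, 1] and invert, so cells at a selected
# feature get 1.0 and cells at or beyond max_distance get 0.0.
proxy = 1.0 - distances / max_distance
```

In a real raster the cells beyond max_distance would become nodata (or, per the later comment in this thread, keep the maximum distance value) rather than simply 0.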

This workflow does not necessarily need to be a single tool in the toolkit, of course. And since I am not a geologist and do not conduct any MPM myself, I don't know how widely all these steps are applied instead of simply executing distance_computation as it exists now in the toolkit, so I am not the right person to make the call on grouping up / parameterizing processes.

If we don't want to modify the toolkit function, we can run these processes in sequence under the hood in the EIS Wizard based on user selections, or make the user run each step separately – either way, multiple intermediary rasters are produced. I am not sure what implications this would have for performance or disk space in an MPM project.

We just had a meeting where we discussed the distance computation tool, among other things. It was decided that at least the scaling steps (3 and 4) will be handled separately. However, the proxy preparation view in the plugin will include feature selection based on criteria and an optional setting for maximum distance, i.e. steps 1 and 2 (all values beyond the maximum will be set to the maximum distance value, not nodata as I said in the previous comment). As I see it, we have a handful of options:

  1. Include these parameters in the distance_computation function in the toolkit
  2. Include these parameters in the distance_computation processing algorithm and CLI function, but not the toolkit function. This would mean that the CLI function responsible for distance computation would perform the feature selection and maximum value setting as additional operations, if the user wishes.
  3. Include these parameters only in the proxy preparation window in the EIS Wizard. Based on parameter settings, several processes are performed in sequence and intermediary results are saved only in memory.
  4. Include the three processes only in the proxy preparation window in the EIS Wizard as separate processes. The user needs to execute each step at a time and intermediary layers are always saved and loaded to the project.

I would say defaulting to option 4 is safe and might feel like a modular approach, but in a project where many proxies are prepared and intermediary results are not useful, the user might end up with tons of files. I've already heard a lot of worries and complaints about difficult and troublesome file management in many GIS/MPM software packages. This could be regarded as a "user problem", but I think the decision is worth considering, since a large part of the raw data in an MPM project is put through the distance computation workflow.

Since you implemented the distance_computation tool @nialov, you can at least be the judge of whether option 1 is selected, but any further comments on this matter are welcome (@msmiyels, do you have a say on this?). A choice should be made within a couple of weeks so that the beta release of the plugin will support distance computation.

As I see it, this just extends the functionality of distance_computation, which is of course fine by me! My previous comment about letting users do this kind of work relates to the trade-off between end-user convenience and the time we can allocate to creating enough convenience that it exceeds the base QGIS functionality in usefulness. If creating it is a priority and this kind of workflow is very common, then I have no objections! Option 1 sounds fine to me as I understand it.

Would you have time to add a max_distance parameter to the tool then, @nialov? I think that would be a good addition. Feature selection based on criteria would introduce multiple new parameters, so maybe it shouldn't be added to this tool. Perhaps it could be its own small utility function that CLI functions can use, if the QGIS algorithm allows subsetting on the fly.

Sadly, the earliest I can get it done is around the middle of May. If someone else has time/interest, I have no problem with them implementing it.

Ok, good to know. @Mtk112 and/or I will then proceed to implement this.