npucino/sandpyper

[JOSS REVIEW] Notebook #2 incorrect label_k values?

Closed this issue · 5 comments

Comments are for openjournals/joss-reviews#3666 (comment).

Following 2 - Profiles extraction, unsupervised sand labelling and cleaning.ipynb, I tried plotting the given label_k values for leo_20180606 using the profile dataframe I saved, but they don't look quite right (see plot below). Can you confirm that the values given in water_dict, no_sand_dict etc. in the notebook are correct? Or is the purpose of label_corrections.gpkg to fix this? I'm also wondering whether there is some random state that results in different label numbers when you rerun the KMeans clustering.

In the St. Leonards survey of the 13th July 2018, the label_k 1,3 and 4 are sand, while the label_k 2 and 6 are water.
In Marengo, the 20th September 2018, no label_k represents sand while 1,2,3 and 4 are water.

Here below are reported the label dictionaries of the demo data.

image

Hi @chrisleaman, I figured this out.
Basically, this cell causes the issue:

# Based on our observations on a dataset comprising 87 surveys, 10 clusters (k=10) is generally a good tradeoff.

opt_k={'leo_20180606': 10,
 'leo_20180713': 10,
 'leo_20180920': 10,
 'leo_20190211': 10,
 'leo_20190328': 10,
 'leo_20190731': 10,
 'mar_20180601': 10,
 'mar_20180621': 10,
 'mar_20180727': 10,
 'mar_20180925': 10,
 'mar_20181113': 10,
 'mar_20181211': 10,
 'mar_20190205': 10,
 'mar_20190313': 10,
 'mar_20190516': 10}

I will render the above cell as Markdown rather than code so that other people don't fall into this trap.

When following the notebook, running
opt_k=get_opt_k(sil_df, sigma=0)
computes a sub-optimal number of clusters to use for KMeans.
These are the values that should be used.

I added the above cell only to show how one could customise the number of clusters used in each survey. I also made the point that, in our experience, 10 clusters is a good number, which is why all surveys are set to 10.
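
For anyone following along, here is a minimal sketch of that kind of customisation which overrides only selected surveys instead of replacing the whole dictionary. It assumes that get_opt_k returns a plain dict mapping survey names to k, which this thread does not confirm. `computed_k`, `overrides` and `apply_overrides` are hypothetical names, and only the value 11 for leo_20180606 comes from this thread.

```python
# Hypothetical stand-in for the dict returned by get_opt_k(sil_df, sigma=0).
# Only the value for 'leo_20180606' (11) is taken from this thread;
# the other values are invented for illustration.
computed_k = {"leo_20180606": 11, "leo_20180713": 8, "mar_20180601": 9}

# Surveys whose k you want to set by hand (illustrative choice).
overrides = {"leo_20180713": 10}


def apply_overrides(computed, overrides):
    """Return a copy of `computed` with the hand-picked values applied."""
    unknown = set(overrides) - set(computed)
    if unknown:
        # Catch typos in survey names instead of silently adding new keys.
        raise KeyError(f"Unknown survey names: {sorted(unknown)}")
    return {**computed, **overrides}


opt_k = apply_overrides(computed_k, overrides)
print(opt_k)
```

Keeping the computed values and overriding only what you need avoids accidentally resetting every survey to the same k.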

I replicated your image by running that cell, therefore using k = 10 in all surveys, and by using rule-based symbology in QGIS for the St. Leonards survey of 20180606 (note that with rule-based symbology, if a rule is not satisfied it doesn't break the classification; the points are simply not rendered). Using the get_opt_k function, the sub-optimal k for that survey is 11.

In fact, I think your image doesn't actually have label_k = 10: with k = 10 you should have only 10 labels starting from 0, so the maximum label_k should be 9. Can you double-check, please?
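
To make the zero-based numbering concrete, here is a small sketch. It assumes the cluster labels are consecutive integers starting at 0, which is how scikit-learn's KMeans numbers them. The helper names and the observed values are hypothetical.

```python
def expected_labels(k):
    """Labels produced by a clustering with k clusters, numbered from 0."""
    return list(range(k))


def unexpected_labels(observed, k):
    """Return the observed labels that cannot come from a k-cluster run."""
    return sorted(set(observed) - set(range(k)))


print(max(expected_labels(10)))                 # 9
print(unexpected_labels([0, 3, 9, 10], k=10))   # [10]
```

If a label equal to k shows up in the data, the clustering was most likely run with a larger k than the one the dictionaries were written for.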

If you don't run that cell and use k=11 for that survey, this is what you get:
image

Then you will be able to actually run P.cleanit() (fixing issue #8) and obtain this:

image

Can you please confirm that this was the issue by simply skipping that misleading code cell?
Thanks!

Hi @npucino - yep, omitting that cell allows me to run P.cleanit() fine. However, when I go to check the results of the .cleanit() function, the labels don't seem to be updated:

Before/after P.cleanit() images

Before cleanit
image

After cleanit
image

I can tell P.cleanit() is actually doing something though, you can see it removes some landside points outside the shoreline mask in the 'after' image. I'm running P.profiles.to_csv('profiles-cleaned.csv') after cleaning and importing that file into QGIS to check the results - is that the best way to do it or are the cleaning results stored somewhere else?


Some debugging output

P.cleanit() seems to run correctly, although I get a bunch of future warnings which shouldn't affect the results:

`P.cleanit()` output
Reclassifying dataset with the provided dictionaries.
Label corrections provided in CRS: epsg:32754
Fine tuning in leo.
C:\Users\Chris\Anaconda3\envs\sandpyper\lib\site-packages\geopandas\geodataframe.py:422: RuntimeWarning: Sequential read of iterator was interrupted. Resetting iterator. This can negatively impact the performance.
  for feature in features_lst:
100%
6/6 [00:00<00:00, 18.43it/s]
Fine-tuning label_k 3 to no_sand in leo-20180606, found 127 pts.
Fine-tuning label_k 8 to no_sand in leo-20180606, found 73 pts.
Fine-tuning label_k 5 to no_sand in leo-20180713, found 56 pts.
Fine-tuning label_k 6 to no_sand in leo-20180713, found 7 pts.
Fine-tuning label_k 1 to no_sand in leo-20180920, found 89 pts.
Fine-tuning label_k 2 to sand in leo-20180920, found 9 pts.
Fine-tuning label_k 2 to veg in leo-20180920, found 31 pts.
Fine-tuning label_k 6 to sand in leo-20180920, found 17 pts.
Fine-tuning label_k 2 to veg in leo-20190211, found 2 pts.
Fine-tuning label_k 6 to veg in leo-20190211, found 29 pts.
Fine-tuning label_k 6 to no_sand in leo-20190211, found 2 pts.
Fine-tuning label_k 6 to veg in leo-20190211, found 6 pts.
Fine-tuning label_k 6 to sand in leo-20190211, found 28 pts.
Fine-tuning label_k 7 to no_sand in leo-20190211, found 41 pts.
Fine-tuning label_k 0 to sand in leo-20190328, found 6 pts.
Fine-tuning label_k 0 to no_sand in leo-20190328, found 52 pts.
Fine-tuning label_k 1 to sand in leo-20190328, found 21 pts.
Fine-tuning label_k 1 to no_sand in leo-20190328, found 2 pts.
Fine-tuning label_k 2 to sand in leo-20190328, found 16 pts.
Fine-tuning label_k 4 to sand in leo-20190328, found 25 pts.
Fine-tuning label_k 5 to veg in leo-20190328, found 3 pts.
Fine-tuning label_k 1 to sand in leo-20190731, found 5 pts.
Fine-tuning label_k 3 to no_sand in leo-20190731, found 11 pts.
Fine-tuning label_k 7 to no_sand in leo-20190731, found 72 pts.
Fine-tuning label_k 6 to sand in leo-20190731, found 25 pts.
Fine-tuning label_k 6 to veg in leo-20190731, found 42 pts.
Fine tuning in mar.
100%
6/6 [00:00<00:00, 25.41it/s]
Fine-tuning label_k 3 to water in mar-20180601, found 163 pts.
Fine-tuning label_k 7 to no_sand in mar-20180601, found 15 pts.
Fine-tuning label_k 7 to sand in mar-20180601, found 53 pts.
Fine-tuning label_k 1 to sand in mar-20180621, found 48 pts.
Fine-tuning label_k 3 to sand in mar-20180621, found 42 pts.
Fine-tuning label_k 2 to water in mar-20180727, found 98 pts.
Fine-tuning label_k 4 to sand in mar-20180727, found 29 pts.
Fine-tuning label_k 2 to no_sand in mar-20181211, found 45 pts.
Fine-tuning label_k 1 to water in mar-20181211, found 149 pts.
Fine-tuning label_k 2 to water in mar-20190205, found 304 pts.
Fine-tuning label_k 3 to no_sand in mar-20190205, found 23 pts.
Fine-tuning label_k 2 to water in mar-20190313, found 100 pts.
Fine-tuning label_k 5 to no_sand in mar-20190313, found 77 pts.
label_corrections_path: C:\Users\Chris\Desktop\sandpyper\examples\test_data\clean\label_corrections.gpkg
watermask  provided in CRS: epsg:32754
Applying watermasks cleaning.
Watermasking in mar.
C:\Users\Chris\Anaconda3\envs\sandpyper\lib\site-packages\geopandas\geodataframe.py:422: RuntimeWarning: Sequential read of iterator was interrupted. Resetting iterator. This can negatively impact the performance.
  for feature in features_lst:
100%
9/9 [00:00<00:00, 9.28it/s]
C:\Users\Chris\Anaconda3\envs\sandpyper\lib\site-packages\geopandas\base.py:39: UserWarning: The indices of the two GeoSeries are different.
  warn("The indices of the two GeoSeries are different.")
C:\Users\Chris\Anaconda3\envs\sandpyper\lib\site-packages\geopandas\geodataframe.py:828: UserWarning: Boolean Series key will be reindexed to match DataFrame index.
  result = super(GeoDataFrame, self).__getitem__(key)
C:\Users\Chris\Anaconda3\envs\sandpyper\lib\site-packages\pyproj\crs\crs.py:131: FutureWarning: '+init=<authority>:<code>' syntax is deprecated. '<authority>:<code>' is the preferred initialization method. When making the change, be mindful of axis order changes: https://pyproj4.github.io/pyproj/stable/gotchas.html#axis-order-changes-in-proj-6
  in_crs_string = _prepare_from_proj_string(in_crs_string)
Setting to water 750 pts overlapping provided watermasks.
C:\Users\Chris\Anaconda3\envs\sandpyper\lib\site-packages\geopandas\base.py:39: UserWarning: The indices of the two GeoSeries are different.
  warn("The indices of the two GeoSeries are different.")
C:\Users\Chris\Anaconda3\envs\sandpyper\lib\site-packages\geopandas\geodataframe.py:828: UserWarning: Boolean Series key will be reindexed to match DataFrame index.
  result = super(GeoDataFrame, self).__getitem__(key)
C:\Users\Chris\Anaconda3\envs\sandpyper\lib\site-packages\pyproj\crs\crs.py:131: FutureWarning: '+init=<authority>:<code>' syntax is deprecated. '<authority>:<code>' is the preferred initialization method. When making the change, be mindful of axis order changes: https://pyproj4.github.io/pyproj/stable/gotchas.html#axis-order-changes-in-proj-6
  in_crs_string = _prepare_from_proj_string(in_crs_string)
Setting to water 532 pts overlapping provided watermasks.
C:\Users\Chris\Anaconda3\envs\sandpyper\lib\site-packages\geopandas\base.py:39: UserWarning: The indices of the two GeoSeries are different.
  warn("The indices of the two GeoSeries are different.")
C:\Users\Chris\Anaconda3\envs\sandpyper\lib\site-packages\geopandas\geodataframe.py:828: UserWarning: Boolean Series key will be reindexed to match DataFrame index.
  result = super(GeoDataFrame, self).__getitem__(key)
C:\Users\Chris\Anaconda3\envs\sandpyper\lib\site-packages\pyproj\crs\crs.py:131: FutureWarning: '+init=<authority>:<code>' syntax is deprecated. '<authority>:<code>' is the preferred initialization method. When making the change, be mindful of axis order changes: https://pyproj4.github.io/pyproj/stable/gotchas.html#axis-order-changes-in-proj-6
  in_crs_string = _prepare_from_proj_string(in_crs_string)
Setting to water 710 pts overlapping provided watermasks.
Setting to water 532 pts overlapping provided watermasks.
C:\Users\Chris\Anaconda3\envs\sandpyper\lib\site-packages\geopandas\base.py:39: UserWarning: The indices of the two GeoSeries are different.
  warn("The indices of the two GeoSeries are different.")
C:\Users\Chris\Anaconda3\envs\sandpyper\lib\site-packages\geopandas\geodataframe.py:828: UserWarning: Boolean Series key will be reindexed to match DataFrame index.
  result = super(GeoDataFrame, self).__getitem__(key)
C:\Users\Chris\Anaconda3\envs\sandpyper\lib\site-packages\pyproj\crs\crs.py:131: FutureWarning: '+init=<authority>:<code>' syntax is deprecated. '<authority>:<code>' is the preferred initialization method. When making the change, be mindful of axis order changes: https://pyproj4.github.io/pyproj/stable/gotchas.html#axis-order-changes-in-proj-6
  in_crs_string = _prepare_from_proj_string(in_crs_string)
C:\Users\Chris\Anaconda3\envs\sandpyper\lib\site-packages\geopandas\base.py:39: UserWarning: The indices of the two GeoSeries are different.
  warn("The indices of the two GeoSeries are different.")
C:\Users\Chris\Anaconda3\envs\sandpyper\lib\site-packages\geopandas\geodataframe.py:828: UserWarning: Boolean Series key will be reindexed to match DataFrame index.
  result = super(GeoDataFrame, self).__getitem__(key)
C:\Users\Chris\Anaconda3\envs\sandpyper\lib\site-packages\pyproj\crs\crs.py:131: FutureWarning: '+init=<authority>:<code>' syntax is deprecated. '<authority>:<code>' is the preferred initialization method. When making the change, be mindful of axis order changes: https://pyproj4.github.io/pyproj/stable/gotchas.html#axis-order-changes-in-proj-6
  in_crs_string = _prepare_from_proj_string(in_crs_string)
C:\Users\Chris\Anaconda3\envs\sandpyper\lib\site-packages\geopandas\base.py:39: UserWarning: The indices of the two GeoSeries are different.
  warn("The indices of the two GeoSeries are different.")
C:\Users\Chris\Anaconda3\envs\sandpyper\lib\site-packages\geopandas\geodataframe.py:828: UserWarning: Boolean Series key will be reindexed to match DataFrame index.
  result = super(GeoDataFrame, self).__getitem__(key)
C:\Users\Chris\Anaconda3\envs\sandpyper\lib\site-packages\pyproj\crs\crs.py:131: FutureWarning: '+init=<authority>:<code>' syntax is deprecated. '<authority>:<code>' is the preferred initialization method. When making the change, be mindful of axis order changes: https://pyproj4.github.io/pyproj/stable/gotchas.html#axis-order-changes-in-proj-6
  in_crs_string = _prepare_from_proj_string(in_crs_string)
Setting to water 599 pts overlapping provided watermasks.
Setting to water 557 pts overlapping provided watermasks.
C:\Users\Chris\Anaconda3\envs\sandpyper\lib\site-packages\geopandas\base.py:39: UserWarning: The indices of the two GeoSeries are different.
  warn("The indices of the two GeoSeries are different.")
C:\Users\Chris\Anaconda3\envs\sandpyper\lib\site-packages\geopandas\geodataframe.py:828: UserWarning: Boolean Series key will be reindexed to match DataFrame index.
  result = super(GeoDataFrame, self).__getitem__(key)
C:\Users\Chris\Anaconda3\envs\sandpyper\lib\site-packages\pyproj\crs\crs.py:131: FutureWarning: '+init=<authority>:<code>' syntax is deprecated. '<authority>:<code>' is the preferred initialization method. When making the change, be mindful of axis order changes: https://pyproj4.github.io/pyproj/stable/gotchas.html#axis-order-changes-in-proj-6
  in_crs_string = _prepare_from_proj_string(in_crs_string)
C:\Users\Chris\Anaconda3\envs\sandpyper\lib\site-packages\geopandas\base.py:39: UserWarning: The indices of the two GeoSeries are different.
  warn("The indices of the two GeoSeries are different.")
C:\Users\Chris\Anaconda3\envs\sandpyper\lib\site-packages\geopandas\geodataframe.py:828: UserWarning: Boolean Series key will be reindexed to match DataFrame index.
  result = super(GeoDataFrame, self).__getitem__(key)
C:\Users\Chris\Anaconda3\envs\sandpyper\lib\site-packages\pyproj\crs\crs.py:131: FutureWarning: '+init=<authority>:<code>' syntax is deprecated. '<authority>:<code>' is the preferred initialization method. When making the change, be mindful of axis order changes: https://pyproj4.github.io/pyproj/stable/gotchas.html#axis-order-changes-in-proj-6
  in_crs_string = _prepare_from_proj_string(in_crs_string)
Setting to water 339 pts overlapping provided watermasks.
Setting to water 133 pts overlapping provided watermasks.
C:\Users\Chris\Anaconda3\envs\sandpyper\lib\site-packages\geopandas\base.py:39: UserWarning: The indices of the two GeoSeries are different.
  warn("The indices of the two GeoSeries are different.")
C:\Users\Chris\Anaconda3\envs\sandpyper\lib\site-packages\geopandas\geodataframe.py:828: UserWarning: Boolean Series key will be reindexed to match DataFrame index.
  result = super(GeoDataFrame, self).__getitem__(key)
C:\Users\Chris\Anaconda3\envs\sandpyper\lib\site-packages\pyproj\crs\crs.py:131: FutureWarning: '+init=<authority>:<code>' syntax is deprecated. '<authority>:<code>' is the preferred initialization method. When making the change, be mindful of axis order changes: https://pyproj4.github.io/pyproj/stable/gotchas.html#axis-order-changes-in-proj-6
  in_crs_string = _prepare_from_proj_string(in_crs_string)
Setting to water 441 pts overlapping provided watermasks.
Watermasking in leo.
100%
6/6 [00:01<00:00, 4.93it/s]
C:\Users\Chris\Anaconda3\envs\sandpyper\lib\site-packages\geopandas\base.py:39: UserWarning: The indices of the two GeoSeries are different.
  warn("The indices of the two GeoSeries are different.")
C:\Users\Chris\Anaconda3\envs\sandpyper\lib\site-packages\pyproj\crs\crs.py:131: FutureWarning: '+init=<authority>:<code>' syntax is deprecated. '<authority>:<code>' is the preferred initialization method. When making the change, be mindful of axis order changes: https://pyproj4.github.io/pyproj/stable/gotchas.html#axis-order-changes-in-proj-6
  in_crs_string = _prepare_from_proj_string(in_crs_string)
Setting to water 652 pts overlapping provided watermasks.
Setting to water 555 pts overlapping provided watermasks.
C:\Users\Chris\Anaconda3\envs\sandpyper\lib\site-packages\geopandas\base.py:39: UserWarning: The indices of the two GeoSeries are different.
  warn("The indices of the two GeoSeries are different.")
C:\Users\Chris\Anaconda3\envs\sandpyper\lib\site-packages\geopandas\geodataframe.py:828: UserWarning: Boolean Series key will be reindexed to match DataFrame index.
  result = super(GeoDataFrame, self).__getitem__(key)
C:\Users\Chris\Anaconda3\envs\sandpyper\lib\site-packages\pyproj\crs\crs.py:131: FutureWarning: '+init=<authority>:<code>' syntax is deprecated. '<authority>:<code>' is the preferred initialization method. When making the change, be mindful of axis order changes: https://pyproj4.github.io/pyproj/stable/gotchas.html#axis-order-changes-in-proj-6
  in_crs_string = _prepare_from_proj_string(in_crs_string)
C:\Users\Chris\Anaconda3\envs\sandpyper\lib\site-packages\geopandas\base.py:39: UserWarning: The indices of the two GeoSeries are different.
  warn("The indices of the two GeoSeries are different.")
C:\Users\Chris\Anaconda3\envs\sandpyper\lib\site-packages\geopandas\geodataframe.py:828: UserWarning: Boolean Series key will be reindexed to match DataFrame index.
  result = super(GeoDataFrame, self).__getitem__(key)
C:\Users\Chris\Anaconda3\envs\sandpyper\lib\site-packages\pyproj\crs\crs.py:131: FutureWarning: '+init=<authority>:<code>' syntax is deprecated. '<authority>:<code>' is the preferred initialization method. When making the change, be mindful of axis order changes: https://pyproj4.github.io/pyproj/stable/gotchas.html#axis-order-changes-in-proj-6
  in_crs_string = _prepare_from_proj_string(in_crs_string)
C:\Users\Chris\Anaconda3\envs\sandpyper\lib\site-packages\geopandas\base.py:39: UserWarning: The indices of the two GeoSeries are different.
  warn("The indices of the two GeoSeries are different.")
C:\Users\Chris\Anaconda3\envs\sandpyper\lib\site-packages\geopandas\geodataframe.py:828: UserWarning: Boolean Series key will be reindexed to match DataFrame index.
  result = super(GeoDataFrame, self).__getitem__(key)
C:\Users\Chris\Anaconda3\envs\sandpyper\lib\site-packages\pyproj\crs\crs.py:131: FutureWarning: '+init=<authority>:<code>' syntax is deprecated. '<authority>:<code>' is the preferred initialization method. When making the change, be mindful of axis order changes: https://pyproj4.github.io/pyproj/stable/gotchas.html#axis-order-changes-in-proj-6
  in_crs_string = _prepare_from_proj_string(in_crs_string)
Setting to water 638 pts overlapping provided watermasks.
Setting to water 627 pts overlapping provided watermasks.
C:\Users\Chris\Anaconda3\envs\sandpyper\lib\site-packages\geopandas\base.py:39: UserWarning: The indices of the two GeoSeries are different.
  warn("The indices of the two GeoSeries are different.")
C:\Users\Chris\Anaconda3\envs\sandpyper\lib\site-packages\geopandas\geodataframe.py:828: UserWarning: Boolean Series key will be reindexed to match DataFrame index.
  result = super(GeoDataFrame, self).__getitem__(key)
C:\Users\Chris\Anaconda3\envs\sandpyper\lib\site-packages\pyproj\crs\crs.py:131: FutureWarning: '+init=<authority>:<code>' syntax is deprecated. '<authority>:<code>' is the preferred initialization method. When making the change, be mindful of axis order changes: https://pyproj4.github.io/pyproj/stable/gotchas.html#axis-order-changes-in-proj-6
  in_crs_string = _prepare_from_proj_string(in_crs_string)
Setting to water 583 pts overlapping provided watermasks.
C:\Users\Chris\Anaconda3\envs\sandpyper\lib\site-packages\geopandas\base.py:39: UserWarning: The indices of the two GeoSeries are different.
  warn("The indices of the two GeoSeries are different.")
C:\Users\Chris\Anaconda3\envs\sandpyper\lib\site-packages\geopandas\geodataframe.py:828: UserWarning: Boolean Series key will be reindexed to match DataFrame index.
  result = super(GeoDataFrame, self).__getitem__(key)
C:\Users\Chris\Anaconda3\envs\sandpyper\lib\site-packages\pyproj\crs\crs.py:131: FutureWarning: '+init=<authority>:<code>' syntax is deprecated. '<authority>:<code>' is the preferred initialization method. When making the change, be mindful of axis order changes: https://pyproj4.github.io/pyproj/stable/gotchas.html#axis-order-changes-in-proj-6
  in_crs_string = _prepare_from_proj_string(in_crs_string)
Setting to water 643 pts overlapping provided watermasks.
shoremask  provided in CRS: epsg:32754
Applying shoremasks cleaning.
Shoremasking in mar.
C:\Users\Chris\Anaconda3\envs\sandpyper\lib\site-packages\geopandas\geodataframe.py:422: RuntimeWarning: Sequential read of iterator was interrupted. Resetting iterator. This can negatively impact the performance.
  for feature in features_lst:
C:\Users\Chris\Anaconda3\envs\sandpyper\lib\site-packages\geopandas\base.py:39: UserWarning: The indices of the two GeoSeries are different.
  warn("The indices of the two GeoSeries are different.")
C:\Users\Chris\Anaconda3\envs\sandpyper\lib\site-packages\geopandas\geodataframe.py:828: UserWarning: Boolean Series key will be reindexed to match DataFrame index.
  result = super(GeoDataFrame, self).__getitem__(key)
C:\Users\Chris\Anaconda3\envs\sandpyper\lib\site-packages\pyproj\crs\crs.py:131: FutureWarning: '+init=<authority>:<code>' syntax is deprecated. '<authority>:<code>' is the preferred initialization method. When making the change, be mindful of axis order changes: https://pyproj4.github.io/pyproj/stable/gotchas.html#axis-order-changes-in-proj-6
  in_crs_string = _prepare_from_proj_string(in_crs_string)
Removing 3621 pts falling outside provided shore polygones.
Shoremasking in leo.
C:\Users\Chris\Anaconda3\envs\sandpyper\lib\site-packages\geopandas\base.py:39: UserWarning: The indices of the two GeoSeries are different.
  warn("The indices of the two GeoSeries are different.")
C:\Users\Chris\Anaconda3\envs\sandpyper\lib\site-packages\pyproj\crs\crs.py:131: FutureWarning: '+init=<authority>:<code>' syntax is deprecated. '<authority>:<code>' is the preferred initialization method. When making the change, be mindful of axis order changes: https://pyproj4.github.io/pyproj/stable/gotchas.html#axis-order-changes-in-proj-6
  in_crs_string = _prepare_from_proj_string(in_crs_string)
Removing 2358 pts falling outside provided shore polygones.
['polygon finetuning', 'watermasking', 'shoremasking'] completed.

These are my profile .csv files before and after running P.cleanit():

Hi @chrisleaman, I checked your profiles-cleaned file and I see the confusion now!

The label_k field is not updated; rather, P.cleanit() adds a column named pt_class.
That column holds the final, cleaned point classes: in this case water, sand, veg and no_sand.
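
A quick way to check this without QGIS is to count the values in the new column. This is only a sketch using the standard library: the rows below are made up, and only the column names label_k and pt_class come from this thread.

```python
import csv
import io
from collections import Counter

# Hypothetical rows standing in for the exported profiles CSV; only the
# column names label_k and pt_class are taken from this thread.
sample_csv = io.StringIO(
    "label_k,pt_class\n"
    "3,no_sand\n"
    "1,sand\n"
    "1,sand\n"
    "2,water\n"
)

rows = list(csv.DictReader(sample_csv))

# How many points ended up in each final class.
class_counts = Counter(row["pt_class"] for row in rows)

# Which (label_k, pt_class) combinations occur, to see how the raw
# cluster labels relate to the final classes.
pairs = Counter((row["label_k"], row["pt_class"]) for row in rows)

print(class_counts)
print(pairs)
```

With the real file, replace `sample_csv` with an open file handle for the CSV you exported.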

Here is the pt_class column of your profiles-cleaned.csv file, displayed:

image

The change you see there is exactly what you noted: the result of the shoremask, which is simply a clipping mask that discards all observations outside the area of interest, in this case landward.
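
As a rough illustration of what a clipping mask does, here is a pure-Python sketch that keeps only the points inside a polygon. It is not sandpyper's implementation (the log above shows the library working through geopandas); it uses a simple ray-casting test, and the polygon and point coordinates are invented.

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test; `polygon` is a list of (x, y) vertices."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge straddle the horizontal line through y?
        if (y1 > y) != (y2 > y):
            # x-coordinate where the edge crosses that line.
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


# Hypothetical shore polygon and survey points.
shore = [(0, 0), (10, 0), (10, 5), (0, 5)]
points = [(2, 2), (12, 3), (5, 6), (9, 1)]

# Clipping: keep only the points that fall inside the polygon.
kept = [p for p in points if point_in_polygon(p[0], p[1], shore)]
print(kept)  # [(2, 2), (9, 1)]
```

Points outside the polygon are dropped rather than relabelled, which matches the "Removing ... pts falling outside" lines in the log.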

Thanks for raising this issue; this is critical information that I need to specify in the documentation and in the notebooks!
The future warnings are annoying but do not affect the results.
In the future I will update everything so that no warnings are issued.
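
In the meantime, one possible way to keep the output readable is to silence FutureWarning only around the noisy call, using Python's standard warnings module. This is a sketch: `noisy_step` is a hypothetical stand-in for a library call, and the recording is there only to show which warnings still get through.

```python
import warnings


def noisy_step():
    # Hypothetical stand-in for a library call that emits a FutureWarning.
    warnings.warn("'+init=<authority>:<code>' syntax is deprecated.", FutureWarning)
    return "done"


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")                  # record everything...
    warnings.simplefilter("ignore", FutureWarning)   # ...except FutureWarning
    result = noisy_step()
    warnings.warn("still visible", UserWarning)

# Only the UserWarning was recorded; the FutureWarning was dropped.
print(result, [w.category.__name__ for w in caught])
```

In a notebook, the same pattern without `record=True` could wrap the cleaning call, so the filter is restored as soon as the `with` block ends and other warnings stay visible.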

Note: I actually created the classification dictionaries by looking at full-resolution imagery (2.5 cm), but in order to speed up testing and to load the test data onto GitHub, I needed to downsample the imagery to 1 m. That is why the classification might not seem perfect in this test workflow. Moreover, swash is included in the water class, as Structure from Motion in the swash is really bad and not reliable for sand volumetric computations.
Cheers!

Thanks @npucino, everything makes sense now - I didn't realise that extra column had been added! I think an extra sentence or two in the notebook just stating the additional pt_class column is created would be very helpful.

Re: performance issues, if you haven't already, I'd recommend looking into using https://github.com/pyutils/line_profiler to see exactly where your code is slow. It identifies which lines in your code are taking the most time, so you can focus on increasing performance just on those lines. Sometimes the results are surprising and you can get some easy wins using this!

Feel free to close this issue when you're ready 😊

Thanks for the suggestion @chrisleaman, forked now!
This afternoon I will take care of the rest of the issues as well.

Cheers!