[JOSS REVIEW] Example notebook minor suggestions
Closed this issue · 3 comments
Hi @npucino,
Great work in creating these example notebooks, it was pretty easy for me to work through all three notebooks and understand how everything in the package works.
My comments below are mainly minor suggestions about providing some additional explaination for anyone using these notebooks. Comments are for openjournals/joss-reviews#3666 (comment). I've created two other issues for some trickier problems (#7, #8).
1 - Introduction and data preparation.ipynb
-
Can you define the term "LoD" before you use it here? I had to scroll down a bit before I got the definition.
-
When talking about the demo data, I think it should be more clear that these are the inputs that sandpyper needs (and not the output that sandpyper generates). It would also be helpful to briefly describe how each input is typically created, i.e. orthos and dems are stitched together using photogrammetry software (Pix4D), while transect and mask files are manually created in a GIS software (QGis, ARCGIS). You've provided examples later in the notebook, it'd be good just add a summary sentence or two here as well.
-
We need to change the path of a lot of path varibles, depending where we've saved the test_data. I suggest defining the parent path first and constructing the other paths based on this. Something like:
from pathlib import Path
test_data_folder = 'C:\Users\Chris\Desktop\sandpyper\examples\test_data'
ortho_path=Path(test_data_folder,"\orthos_1m\leo_20180920_ortho_resampled_1m.tif")
watermasks_path=Path(test_data_folder,"\clean\watermasks.gpkg")
shoremasks_path=Path(test_data_folder,"\clean\shoremasks.gpkg")
label_corr_path=Path(test_data_folder,"\clean\label_corrections.gpkg")
transect_path=Path(test_data_folder,"\transects\leo_transects.gpkg")
transect_lod_path=Path(test_data_folder,"\lod_transects\leo_lod_transects.gpkg")
sandpyper/examples/1 - Introduction and data preparation.ipynb
Lines 1164 to 1169 in ce542c6
-
Can you double check the leo features in the label corrections file (
examples/test_data/clean/label_corrections.gpkg
) are in the correct location? Looks like some CRS mistake when I try load it in QGIS. The mar features in the correct location though.
-
I think an additional sentence explaining why points are being classified here would be good. Also update the reference to example notebook 2.
2 - Profiles extraction, unsupervised sand labelling and cleaning.ipynb
-
Again, easier to use the
pathlib.Path
approach here to define the parent folder rather than changing each variable.
-
Add the command to export the dataframe to csv:
P.profiles.to_csv('profiles.csv')
3 - Profile dynamics.ipynb
(I couldn't link to line numbers in this file - perhaps the notebook is too big for github?)
-
Section: "Plot: LoD normality check": The normality check plots show that the distributions fail the normality checks. Can you talk about what (if any) implications this has?
-
Section: "Plot: multi-scale MECS and volumetrics": The plots for
D.plot_transect_mecs(location='mar',tr_id=10)
andD.plot_transect_mecs(location='mar', lod=D.lod_df, tr_id=10)
seem fairly similar (though I see some variation in the shaded area). Could you add a couple of sentences discussing the difference between using LoD and not?
Thanks @chrisleaman for this review! Very useful and constructive suggestions that I will implement very soon.
For now I want to just reply this:
Can you double check the leo features in the label corrections file (examples/test_data/clean/label_corrections.gpkg) are in the correct location? Looks like some CRS mistake when I try load it in QGIS. The mar features in the correct location though.
Unfortunately, storing all locations (multiple locations and multiple CRSs) into one single Geopandas.Geodataframe allow only 1 CRS to use. Thus, I digitised all correction polygons with EPSG 32754, which is the correct one for Marengo but the wrong one for St. Leonards (EPSG 32755). Unfortunately, we would need either:
- one geodataframe per location with the actual location CRS
- a geodataframe which stores multiple CRSs, say, some rows with EPSG 32754 and others EPSG 32755, which is not possible to my knowledge.
The CRS dictionary takes care of the imagery coordinates and all overlay operations that need matching CRSs, but the single CRS geodataframe issue is a pitfall of current Geopandas.
Unless I am not aware of a multi CRS geodataframe variant!
I must make clear this somewhere, very important, thanks for reporting this!
Here there are the suggestions implemented, currently in the conda_only branch, where I am concentrating all JOSS modifications.
Sorry for the rudimental images snapshots, I don't know how to reference notebook lines as links and couldn't find a solution in 5 minutes of googling.
1.1 Earlier definition of LoD
I added its definition straight directly in the introduction.
1.2 Better definition of input data
I added a better explaination of each input, making clear that these are inputs and not what Sandpyper produces.
1.3 & 2.1 Implementation of pathlib Path to manage paths
I used the strategy you suggested and it is way better.
1.4 CRS misplacement
Explained here: #9 (comment)
1.5 additional explanation of point classification importance and updated link to notebook 2
I wrote an extra paragraph where I introduce the challenges and the why I coded this way (for now).
And updated the link to NB-2.
2.2 save to .csv command
3.1 Normality check importance and consequences explanation
I better explained why it is important to test the normality of the error distributions and the consequences on the decision of most suitable error metric, with reference to one of the most important literature that best explain this concept for this type of data.
3.2 Plot transect mecs with and without LoD better explanation
I added a few words on the importance of using LoD despite the difference being small.
Cheers
Hi @npucino, great work with these changes! I really like the additional guidance and discussion you've provided, especially in notebook 3. It'll be especially helpful for those that aren't familiar with these LoD concepts (like me!).
This issue was addressed with the following commits: