ACHMartin/seastar_project

Housekeeping the seastar_project

Opened this issue · 14 comments

I propose to list below all the elements we collectively think need to be removed (or kept) in the seastar_project tree.

1/ Notebooks
2/ config/config.ini/seastarx_config.ini
3/ Hidden files

Concerning 1/Notebooks,

I do think there is value in keeping some notebooks: a/ some as a nice starting point for new people, and b/ some to get inspiration from what has already been done within the project and to share it between us. We tend to prefer to test our code with notebooks instead of "pytest", so there is probably value in keeping them somewhere as long as they are not integrated as proper unit tests.

A potential structure for the notebooks could be:
notebooks/OSCAR
notebooks/tests
notebooks/gmfs

2/ config/config.ini/seastarx_config.ini

I see that the README recommends modifying the "seastarx_config.ini" file. @elemerle did you use it?
Is the "config.ini" still of use?
I see "config" is an empty file, can you confirm we can delete it?

Could we keep a single config file? Should we change the structure of it?

3/ Hidden files

I have some hidden files appearing in PyCharm (.ipynb_checkpoints, .gitattributes, .gitignore, .readthedocs.yaml). I don't know if it is something I/we should configure with Git/PyCharm.
Do you get the same files?

On a side note, there is the "pyproject.toml", does anybody know where it comes from?

I agree that the notebooks are useful, but maybe you can do some cleaning among the files, because there are v1, v1.2 and v2 versions. Maybe one file would be enough.

For the OSCAR folder, we can have these 3 folders:

  • processing
  • tests
  • validation

Knowing that over time the processing notebooks will become Python files.

About the *.ini files: I don't use them. I guess you can delete them.

I also found those hidden files in my git repository: .gitattributes, .gitignore, .readthedocs.yaml. I don't know why we have them but I can ask Javier if they are important. As long as they are hidden I guess we can just keep them.

"I agree that the notebooks are useful, but maybe you can do some cleaning among the files, because there are v1, v1.2 and v2 versions. Maybe one file would be enough."

Agreed

"For the OSCAR folder, we can have these 3 folders: processing, tests, validation. Knowing that over time the processing notebooks will become Python files."

Should notebooks for OSCAR be in notebooks/OSCAR/tests, in notebooks/tests, or moved to a proper pytest? I would tend to prefer the latter options.

Change validation to analysis? All the work on the star pattern is not strictly validation, but validation might be part of the analysis.

"About the *.ini files: I don't use them. I guess you can delete them."

How do you deal with the path to the data? That was the purpose of the config file, but I think only David used it, because I haven't looked at OSCAR data since then, and you probably do something else. In theory, I do think it might be a good approach.

"I also found those hidden files in my git repository: .gitattributes, .gitignore, .readthedocs.yaml. I don't know why we have them but I can ask Javier if they are important. As long as they are hidden I guess we can just keep them."

They are not hidden on my Windows... but I am happy to keep them. I would be happy to get Javier's view.

"Change validation to analysis? All the work on the star pattern is not strictly validation, but validation might be part of the analysis."

Okay to change it to analysis

"They are not hidden on my Windows... but I am happy to keep them. I would be happy to get Javier's view."

Even if you run ls -a?
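For what it's worth, on Windows a leading dot does not hide a file (hiding is a file attribute there), which would explain why these show up. A small cross-platform sketch for listing the dot-prefixed entries of a folder, roughly what ls -a reveals on Linux/macOS; the function name list_hidden is just made up for illustration:

```python
from pathlib import Path


def list_hidden(root="."):
    """Return the sorted names of dot-prefixed entries directly under root."""
    return sorted(p.name for p in Path(root).iterdir() if p.name.startswith("."))


if __name__ == "__main__":
    # Print the dot-prefixed files and folders of the current directory.
    for name in list_hidden("."):
        print(name)
```

Note that .ipynb_checkpoints is created locally by Jupyter rather than coming from the repository, so it is the one entry in that list that is usually ignored through .gitignore rather than kept.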

For the tests concerning OSCAR, okay to create a pytest, but I don't know how it works and how to run the tests. Btw, do we have tests for OSCAR?

"For the tests concerning OSCAR, okay to create a pytest, but I don't know how it works and how to run the tests. Btw, do we have tests for OSCAR?"

We don't, but it is an open issue #210
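On how pytest works, a minimal sketch: pytest collects files named test_*.py, runs every function whose name starts with test_, and reports a failure when an assert does not hold. The function wind_speed below is a hypothetical stand-in and not something from the seastar package; a real test would import the OSCAR function being checked instead.

```python
import math


def wind_speed(u, v):
    """Hypothetical stand-in for a project function: speed from (u, v) components."""
    return math.hypot(u, v)


def test_wind_speed_known_triangle():
    # A 3-4-5 triangle gives a value that is easy to check by hand.
    assert math.isclose(wind_speed(3.0, 4.0), 5.0)


def test_wind_speed_is_symmetric():
    # Swapping the components must not change the speed.
    assert math.isclose(wind_speed(1.5, -2.0), wind_speed(-2.0, 1.5))
```

Saved as, say, test_example.py, this runs with "pytest test_example.py". A check that currently lives in a notebook cell can usually be moved over by wrapping it in a test_ function and turning the visual comparison into an assert.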

"How do you deal with the path to the data? That was the purpose of the config file, but I think only David used it, because I haven't looked at OSCAR data since then, and you probably do something else. In theory, I do think it might be a good approach."

Yes, I believe I'm the only one that uses this! I created it originally to deal with all of the machine-hopping I was doing; however, I could definitely live without it.
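To make the idea concrete for anyone who has not used it, here is a rough sketch of a per-machine data path read from a single .ini file with the standard-library configparser. The layout (one section per hostname, a data_dir key, a DEFAULT fallback) and the function name data_path are assumptions for illustration; this is not the actual content of seastarx_config.ini.

```python
import configparser
import socket
from pathlib import Path

# Assumed (hypothetical) file layout:
#
#   [DEFAULT]
#   data_dir = ~/data/oscar
#
#   [my-laptop]
#   data_dir = D:/data/oscar


def data_path(config_file, machine=None):
    """Return the data directory for this machine, falling back to [DEFAULT]."""
    parser = configparser.ConfigParser()
    if not parser.read(config_file):
        # read() returns the list of files it managed to parse.
        raise FileNotFoundError(config_file)
    machine = machine or socket.gethostname()
    # Use the machine's own section if present, else the DEFAULT section.
    section = machine if machine in parser else parser.default_section
    return Path(parser[section]["data_dir"]).expanduser()
```

With this, a notebook would only call data_path("seastarx_config.ini") and never hard-code a path, which is the machine-hopping use case. Whether that is worth keeping versus each person setting a path at the top of their notebook is the open question here.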

"Concerning 1/Notebooks,

I do think there is value in keeping some notebooks: a/ some as a nice starting point for new people, and b/ some to get inspiration from what has already been done within the project and to share it between us. We tend to prefer to test our code with notebooks instead of 'pytest', so there is probably value in keeping them somewhere as long as they are not integrated as proper unit tests."

As for what notebooks to keep - I can start by removing any of my old ones that are now obsolete. I do all of my processing (outside of the full wind-current retrieval) and post-processing visualisation etc with notebooks so a lot of them are just full of different 'eras' of the processing as we developed it. I agree that there is some utility in some demonstration scripts - perhaps this should pair with some demonstration L2 data as well?

I'm picking this up now, what I could do with is a list of notebooks that we want to keep on main and/or ideas for notebooks that we would want included.

I have a bunch that are specific to things like the SEASTARex paper that don't need to be pushed to main and can stay on my branch, however they contain things that are useful to others with a spot of tidying up. So far the list seems to be:

  • L1b processing
  • L2c processing
  • L2 pre-processing
  • Plotting of results
  • Star pattern analysis (maybe?)

We have a lot of notebooks that contain older versions of things - my plan would be to remove these entirely and consolidate things down into simpler, useable notebooks - with some example data if they use data (like plotting). So we would have something like:

\notebooks

  • \analysis
    • \OSCAR (?)
  • \tests
  • \processing

I think a root \notebooks folder is the best way to partition them all off. I think the only reason they're on this main page is because of legacy problems with notebooks accessing the .seastar package - something that is easily sorted out in the cleaned notebooks
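One way the import problem could be sorted out once notebooks live in sub-folders, as a sketch only: walk up from the notebook's folder until the repository root is found, then put that root on sys.path. Using pyproject.toml as the marker file and the name find_repo_root are assumptions for illustration.

```python
from pathlib import Path


def find_repo_root(start, marker="pyproject.toml"):
    """Walk up from start and return the first folder that contains marker."""
    start = Path(start).resolve()
    for candidate in [start, *start.parents]:
        if (candidate / marker).exists():
            return candidate
    raise FileNotFoundError(f"no {marker} found above {start}")


# Typical use in the first cell of a notebook stored under notebooks/...:
#
#   import sys
#   sys.path.insert(0, str(find_repo_root(Path.cwd())))
#   import seastar
```

If the pyproject.toml is set up for it, installing the package once in editable mode with "pip install -e ." from the repository root would make this cell unnecessary.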

For the notebooks tree, we have a gmfs folder, so it might be simpler to follow the same hierarchy as the sub-package. So perhaps:

notebooks/
  gmfs/
  oscar/
  tests/ <- a bit where we put everything else

I don't know what you see in processing as a notebook. I guess it can all be within "oscar"

"For the notebooks tree, we have a gmfs folder, so it might be simpler to follow the same hierarchy as the sub-package. So perhaps notebooks/gmfs/, notebooks/oscar/, notebooks/tests/ <- a bit where we put everything else

I don't know what you see in processing as a notebook. I guess it can all be within 'oscar'"

Sounds good. The processing notebooks are things like the L1A-L1B processing, which historically I did all in notebooks, but @elemerle has been consolidating them into scripts so perhaps we don't need to worry about them.

I'm coming back to the notebooks clean up issue after a break.

My problem is that I have a lot of notebooks i would like to keep, and am obviously happy to have those on my local drive etc. But the question is - what do we want to actually keep, especially if the code is imminently about to be packaged up?

Personally I think it's a lot of work to adapt all of the processing notebooks I have to be generic enough for release, so I would rather remove them from main entirely. Same goes for my plotting / analysis notebooks. We could do with an example notebook to process an example dataset perhaps, but I feel like this is better served by a single notebook - if people think this is useful then I can make one up.

Are we also wanting some processing scripts? I.e. L1a to L1b etc?

Agreed, you can remove your notebooks from "main".
We could add an example notebook/script later. No hurry. I don't think it prevents us from releasing a version and sending it to ESA. I am really in favour of using date-based version numbers, like v2023.10.0 for the October release, instead of a version v1. I don't know if it is possible.

"Agreed, you can remove your notebooks from 'main'. We could add an example notebook/script later. No hurry. I don't think it prevents us from releasing a version and sending it to ESA. I am really in favour of using date-based version numbers, like v2023.10.0 for the October release, instead of a version v1. I don't know if it is possible."

I'm sure it's possible - I'll look into it.
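As a starting point, a small sketch of date-based ("calendar") version strings in the v<year>.<month>.<patch> form suggested above. The helper names calver and parse_calver are made up for illustration, and the patch counter is an assumption about what the last field would mean.

```python
from datetime import date


def calver(release_date, patch=0):
    """Build a tag such as 'v2023.10.0' from a release date and a patch counter."""
    return f"v{release_date.year}.{release_date.month}.{patch}"


def parse_calver(tag):
    """Turn 'v2023.10.0' into (2023, 10, 0) so that releases sort correctly."""
    year, month, patch = tag.lstrip("v").split(".")
    return int(year), int(month), int(patch)


if __name__ == "__main__":
    print(calver(date(2023, 10, 15)))
```

Git tags are free-form strings, so a tag of this shape should be fine on the repository side. Comparing the parsed tuples rather than the raw strings keeps the ordering right when the month goes from one digit to two.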