Introduction to using Python for working with Network Common Data Form (NetCDF) files.
If you are just getting acquainted with Python, I recommend the Transform 2022 tutorial: Getting Started with Python.
I recommend using Miniforge to install Python on your machine. It is a lightweight distribution that comes with the conda and mamba package managers, and it is fast! Many people are also comfortable and familiar with Conda, which will work fine for this too.
If you are looking for a tutorial on Git, the Transform 2021 tutorial: Version Control for Scientists is a good place to start, specifically Part 1: Git and GitHub.
If you would like to create your own copy of this repository that you can make changes to, and push those changes to GitHub, you can first fork this repository. This is optional.
You will then need to clone this repository (or your fork). Open a terminal and run
git clone https://github.com/lheagy/prodigy-intro.git
(If you are using your fork, replace lheagy with your GitHub username.)
You will then need to navigate to the prodigy-intro directory
cd prodigy-intro
and create the prodigy-intro environment. This will install the specific packages that we need for this tutorial. For more information on environments, see the Conda Managing Environments Docs. (If you installed miniforge, replace conda with mamba in the following lines.)
conda env create -f environment.yml
To activate this environment, you then need to run
conda activate prodigy-intro
Environment files are a useful tool for communicating what packages someone needs to run your code and reproduce your results (#ReproducibleScience!). If you want to be very particular about specific package versions (useful when you will publish code alongside a paper), you can check out conda lock files.
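As a small complement to environment and lock files, it can help to print the versions of key packages at the top of a notebook so readers can see what you actually ran with. The sketch below uses only the Python standard library. The helper name installed_versions and the package list are assumptions for illustration; they are not taken from this repository's environment.yml.

```python
from importlib import metadata


def installed_versions(packages):
    """Return a dict mapping each package name to its installed version.

    Packages that are not installed in the current environment map to None.
    """
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


# Hypothetical package list; swap in whatever your environment file names.
for name, version in installed_versions(["xarray", "numpy", "jupyterlab"]).items():
    print(f"{name}: {version or 'not installed'}")
```

This only reports what is installed; it does not pin anything, so it is not a substitute for a lock file.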
Jupyter is an interactive computing environment. It provides an interface where you can combine narrative text with code to capture and describe your workflow. If you are new to Jupyter or looking for a refresher, Project Pythia has some nice documentation.
You can launch Jupyter by running
jupyter lab
We will use Xarray to open up our NetCDF files. See the notebooks for an example.
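For a quick feel of what opening a NetCDF file with Xarray looks like, here is a minimal sketch that does not depend on the tutorial data. It writes a tiny synthetic dataset to a temporary file and reads it back. The variable name temperature, the coordinate values, and the file name example.nc are all made up for illustration, and the sketch assumes a NetCDF backend (such as netCDF4, h5netcdf, or scipy) is installed alongside xarray.

```python
import tempfile
from pathlib import Path

import numpy as np
import xarray as xr

# Build a tiny synthetic dataset (made-up values, for illustration only).
ds = xr.Dataset(
    data_vars={"temperature": (("time", "lat"), np.arange(6.0).reshape(2, 3))},
    coords={"time": [0, 1], "lat": [10.0, 20.0, 30.0]},
)

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "example.nc"
    ds.to_netcdf(path)  # write the dataset to a NetCDF file

    # Open the file, then load the values into memory so they remain
    # available after the file is closed and the temporary folder is removed.
    with xr.open_dataset(path) as opened:
        loaded = opened.load()

print(loaded)

# Reduce along a named dimension: average over "time" at each latitude.
mean_temperature = loaded["temperature"].mean(dim="time")
print(mean_temperature.values)
```

With your own data, the core of this is the single call xr.open_dataset(path); the rest only exists to make the example self-contained.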
In 2022, I helped give a tutorial on Best practices for sharing your research code. The slides, notebooks and a recording of the tutorial are available.
Some brief points to consider:
Data: in general, you don't want to put large data files in GitHub. Instead, if they are public data, provide instructions for how to download those data. If you have data (or data-derived products) that you would like to share, you can use platforms such as Zenodo or FigShare to archive your data with a DOI.
License: A repository without a license is not open-source. In fact, the author retains copyright by default, meaning that no one else has permission to use it. To communicate if, and how, others can use your code, you can select a license. A good resource for helping select a license is choosealicense.com. Further, the Open Source Initiative maintains a definition of open-source and a catalog of licenses that satisfy that definition.
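On the data point above, one way to provide download instructions is to ship a small helper that fetches the file on first use instead of committing it to the repository. The following is a rough sketch using only the standard library. The function name download_if_missing, the URL, and the file path are hypothetical placeholders, not part of this repository.

```python
from pathlib import Path
from urllib.request import urlretrieve


def download_if_missing(url, dest):
    """Download `url` to `dest` unless `dest` already exists; return the path."""
    dest = Path(dest)
    if not dest.exists():
        # Create the target folder (e.g. data/) if it is not there yet.
        dest.parent.mkdir(parents=True, exist_ok=True)
        urlretrieve(url, dest)
    return dest


# Hypothetical URL and file name, shown only to illustrate the call:
# download_if_missing("https://example.org/data/example.nc", "data/example.nc")
```

Skipping the download when the file already exists keeps repeated notebook runs fast. If you use this pattern, adding the data folder to .gitignore keeps the downloaded files out of version control.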