PRODIGY Intro Tutorial

Introduction to using Python for working with Network Common Data Form (NetCDF) files.

Getting Python on your machine

If you are just getting acquainted with Python, I recommend the Transform 2022 tutorial: Getting Started with Python.

I recommend using Miniforge to install Python on your machine. It is a lightweight installer that comes with the conda and mamba package managers, and mamba is fast! If you already have another conda installation that you are comfortable with, that will work fine for this too.

Using Git and GitHub

If you are looking for a tutorial, the Transform 2021 tutorial: Version Control for Scientists is a good place to look; specifically Part 1: Git and GitHub.

If you would like to create your own copy of this repository that you can make changes to, and push those changes to GitHub, you can first fork this repository. This is optional.

You will then need to clone this repository (or your fork). Open a terminal and run

git clone https://github.com/lheagy/prodigy-intro.git

(If you are using your fork, replace lheagy with your GitHub username.)

Setting up the environment

You will then need to navigate to the prodigy-intro directory

cd prodigy-intro

and create the prodigy-intro environment. This will install the specific packages that we need for this tutorial. For more information on environments, see the Conda Managing Environments Docs. (If you installed Miniforge, you can replace conda with mamba in the following lines.)

conda env create -f environment.yml

To activate this environment, you then need to run

conda activate prodigy-intro

Environment files are a useful tool for communicating what packages someone needs to run your code and reproduce your results (#ReproducibleScience!). If you want to pin exact package versions (useful when you publish code alongside a paper), you can check out conda-lock files.
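
As a quick check that the active environment contains what you expect, a short script along these lines can report installed versions. This is a minimal sketch using only the standard library. The package names in the usage section (xarray, netCDF4, jupyterlab) are assumptions about what environment.yml lists, not taken from the file itself, and the helper name report_versions is made up for this example.

```python
from importlib.metadata import PackageNotFoundError, version


def report_versions(packages):
    """Return a dict mapping each package name to its installed version.

    Packages that are not installed in the active environment map to None.
    """
    versions = {}
    for name in packages:
        try:
            versions[name] = version(name)
        except PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    # Assumed package list for illustration; edit it to match environment.yml.
    assumed_packages = ["xarray", "netCDF4", "jupyterlab"]
    for name, installed in report_versions(assumed_packages).items():
        print(f"{name}: {installed or 'not installed'}")
```

Recording these versions next to your results is a lightweight complement to the environment file, since it captures what was actually installed when the code ran.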

Jupyter

Jupyter is an interactive computing environment. It provides an interface where you can combine narrative text with code to capture and describe your workflow. If you are new to Jupyter or looking for a refresher, Project Pythia has some nice documentation.

You can launch Jupyter by running

jupyter lab

Xarray & NetCDF

We will use Xarray to open our NetCDF files. See the notebooks for an example.
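
As a rough illustration of what opening a NetCDF file with Xarray looks like, the sketch below writes a small synthetic dataset to a temporary file and reads it back. The variable name temperature, the coordinate values, and the file name example.nc are all made up for this example; the notebooks work with real data. It assumes xarray, numpy, and a NetCDF backend (netCDF4, h5netcdf, or scipy) are installed in the environment.

```python
import tempfile
from pathlib import Path

import numpy as np
import xarray as xr

# Build a small synthetic dataset (hypothetical values, for illustration only).
ds_out = xr.Dataset(
    data_vars={
        "temperature": (("time", "lat"), np.arange(6.0).reshape(2, 3)),
    },
    coords={"time": [0, 1], "lat": [10.0, 20.0, 30.0]},
    attrs={"title": "synthetic example"},
)

with tempfile.TemporaryDirectory() as tmpdir:
    path = Path(tmpdir) / "example.nc"
    ds_out.to_netcdf(path)

    # open_dataset reads lazily, so load the values into memory
    # before the temporary file is removed.
    with xr.open_dataset(path) as ds:
        ds_in = ds.load()

# Select by coordinate label and reduce over a named dimension.
at_lat_20 = ds_in["temperature"].sel(lat=20.0)
time_mean = ds_in["temperature"].mean(dim="time")

print(ds_in)
```

Working with named dimensions and coordinate labels (sel, mean(dim=...)) rather than integer positions is the main convenience Xarray adds on top of the raw arrays stored in the file.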

Open source practices

In 2022, I helped give a tutorial on Best practices for sharing your research code. The slides, notebooks and a recording of the tutorial are available.

A few brief points to consider:

Data: In general, you don't want to put large data files on GitHub. Instead, if they are public data, provide instructions for how to download them. If you have data (or data-derived products) that you would like to share, you can use platforms such as Zenodo or FigShare to archive your data with a DOI.
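
One way to make download instructions concrete is a small helper that fetches a file only if it is not already present and, optionally, checks it against a published checksum. This is a sketch using only the standard library. The function names fetch_data and sha256sum and their arguments are hypothetical, and you would supply the real URL and SHA-256 value for your own data source.

```python
import hashlib
import urllib.request
from pathlib import Path


def sha256sum(path, chunk_size=1 << 20):
    """Compute the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def fetch_data(url, destination, expected_sha256=None):
    """Download url to destination unless the file already exists.

    If expected_sha256 is given, raise ValueError when the file on disk
    does not match it.
    """
    destination = Path(destination)
    if not destination.exists():
        # Create the target directory on first download.
        destination.parent.mkdir(parents=True, exist_ok=True)
        urllib.request.urlretrieve(url, destination)
    if expected_sha256 is not None and sha256sum(destination) != expected_sha256:
        raise ValueError(f"Checksum mismatch for {destination}")
    return destination
```

Keeping the download step in code, with the data directory excluded from version control, lets others reproduce your inputs without the repository carrying the files themselves.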

License: A repository without a license is not open source. By default, the author holds the copyright, meaning that no one else has permission to use the code. To communicate if, and how, others can use your code, you can select a license. A good resource for helping select a license is choosealicense.com. Further, the Open Source Initiative maintains a definition of open source and a catalog of licenses that satisfy that definition.
