Decoding Nonlinear Signals in Multidimensional Precipitation Observations, maintained by Fraser King
This is the public code repository for our research article currently submitted to Nature Communications.
Changes in the phase of precipitation reaching the surface have far-reaching implications for agricultural productivity, freshwater availability, outdoor recreation economies, and ecosystem sustainability. In this study, we aimed to improve precipitation data analysis for weather prediction by examining over 1.5 million minute-scale particle measurements from seven sites over ten years. Using nonlinear dimensionality reduction techniques, we reduced the data complexity by 75% and identified nine unique precipitation groups. The nonlinear technique provided clearer separation than traditional linear methods, with fewer ambiguous cases and better categorization of important hydrometeor properties like precipitation phase and intensity. These findings enhance our understanding of global precipitation patterns by revealing hidden features in large, complex datasets.
This repository contains the processing and analysis scripts used in the article, figure plotting code, and an example interactive notebook for experimenting with some of the precipitation data yourself using similar techniques. The goal of this repository is to provide open access to others for reproducing our results or adapting them for future work.
To play with the data yourself, please see our interactive tool. You can see an example of what the UMAP+HDBSCAN precipitation clusters look like in the animated image below.
A Comprehensive Northern Hemisphere Particle Microphysics Dataset from the Precipitation Imaging Package
The data for this project is hosted online on UM's DeepBlue repository.
We have collected PIP microphysical data from a variety of measurement locations across the northern hemisphere. Data originally in a proprietary ASCII format has been converted to the more universally recognized NetCDF-4 format for ease of sharing and compatibility within the academic community. The conversion process, undertaken using a combination of bash and Python, ensures broader compatibility with various data analysis tools and platforms. A quality assurance (QA) procedure has been applied to ensure the integrity of the data. After QA, the data is written to daily NetCDF-4 files that follow the Climate and Forecast (CF) conventions (version 1.10) and are compressed with level 2 deflation to reduce file size. Additional details on the data curation process can be found in our journal article.
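As a rough illustration of the compression and metadata settings described above, the sketch below builds a per-variable deflate encoding and writes a dataset to NetCDF-4. It is not the conversion code used for the published dataset. The function names are hypothetical, and it assumes the data is already held in an `xarray.Dataset` and that the `netCDF4` backend is installed.

```python
def deflate_encoding(var_names, level=2):
    """Return an xarray-style encoding dict enabling zlib deflation per variable."""
    return {name: {"zlib": True, "complevel": level} for name in var_names}


def write_daily_file(ds, path, level=2):
    """Write an xarray.Dataset to a compressed NetCDF-4 file (hypothetical helper).

    `ds` is assumed to be an xarray.Dataset holding one day of data.
    """
    # Record the CF version in the global attributes if it is not already set.
    ds.attrs.setdefault("Conventions", "CF-1.10")
    ds.to_netcdf(
        path,
        format="NETCDF4",
        encoding=deflate_encoding(ds.data_vars, level),
    )


# Example usage (requires xarray and netCDF4; the variable name is made up):
#   import xarray as xr
#   ds = xr.Dataset({"example_var": (("time",), [0.1, 0.2, 0.3])})
#   write_daily_file(ds, "example_day.nc")
```

A low deflate level such as 2 generally trades a slightly larger file for faster reads and writes than the higher levels.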
We have also built a custom API, pipdb, for interacting with the PIP data. For information on how to use the API, please see our readthedocs documentation.
To see how we previously used PCA to identify modes of snowfall variability, please see our associated GitHub repository.
git clone https://github.com/frasertheking/umap.git
cd umap
conda env create -f env.yml
conda activate umap
We also provide an interactive Google Colab environment to experiment with (and for reproducing our results), with a subsample of our full dataset. To view the notebook please click the following button:
Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change. Note that, as a living project, code is not as clean as it could (should) be, and unit tests need to be produced in future iterations to maintain stability.
- Fraser King, University of Michigan, kingfr@umich.edu
- Claire Pettersen, University of Michigan
- Brenda Dolan, Colorado State University
- Julia Shates, NASA Jet Propulsion Laboratory
- Derek Posselt, NASA Jet Propulsion Laboratory
This project was primarily funded by a NASA New (Early Career) Investigator Program (NIP) grant at the University of Michigan. The Natural Sciences and Engineering Research Council of Canada (NSERC) also provided funding via a Postdoctoral Fellowship (PDF) award.