ERA5 Data Extraction and Processing

This repository provides a Python script for extracting, processing, and analyzing ERA5 hourly reanalysis data. It demonstrates how to load GRIB files, filter specific meteorological variables, perform unit conversions, merge datasets, visualize time series, and save the processed data in the cloud-optimized Zarr format.

Features

- GRIB File Loading: Load ERA5 data from GRIB files using xarray with the cfgrib engine.
- Variable Filtering: Select specific variables (e.g., 2-meter temperature, total precipitation) from multi-message GRIB files.
- Data Cleaning: Handle potential coordinate conflicts and drop unnecessary variables.
- Unit Conversions: Convert raw data units (e.g., Kelvin to Celsius, meters of precipitation to millimeters).
- Dataset Merging: Combine different meteorological variables into a single xarray.Dataset.
- Time Series Visualization: Generate plots to visualize trends and patterns of the extracted data over time.
- Zarr Export: Save processed data to the Zarr format for chunked, compressed storage, suited to large datasets and cloud-based analysis.

Installation

To get started, clone this repository and set up your Python environment.

Clone the repository:

    git clone https://github.com/sayantanonfire/ERA5_data_extraction_and_post_processing.git
    cd ERA5_data_extraction_and_post_processing

Create a virtual environment (recommended):

    python -m venv venv
    source venv/bin/activate  # On Windows: venv\Scripts\activate

Install the required libraries:

    pip install xarray cfgrib matplotlib folium numcodecs zarr dask

dask is included because the chunked Zarr export in step 4 calls `.chunk()`, which requires it.

cfgrib dependency: cfgrib requires the ECMWF ecCodes library. Installation can be complex depending on your OS. For more detailed instructions, refer to the cfgrib documentation. If you use Anaconda/Miniconda, `conda install -c conda-forge cfgrib` may be easier, since it also installs ecCodes.

Usage

This script is designed to be run in a Jupyter Notebook or as a Python script.
Ensure you have your ERA5 GRIB file (e.g., 22e7a5177aa05425bfc8e4399e9a4fc.grib) in the same directory, or provide the correct path.

1. Load the GRIB File

The first step loads your ERA5 GRIB file using xarray and the cfgrib engine. Error handling is included to catch common issues.

    import xarray as xr
    import cfgrib  # imported so a missing cfgrib/ecCodes install fails early

    file_path = "22e7a5177aa05425bfc8e4399e9a4fc.grib"

    try:
        ds = xr.open_dataset(file_path, engine='cfgrib')
        print("GRIB File successfully loaded ✅")
    except Exception as e:
        print(f"Failed to load GRIB file: {e}")
        raise

    # Print basic info and variables
    print(ds)
    print("Variables in dataset:", list(ds.data_vars))

2. Isolate and Process Specific Variables

GRIB files can contain multiple messages, sometimes leading to coordinate conflicts (e.g., for time). To handle this, we load specific variables (2t for 2-meter temperature and tp for total precipitation) separately and then merge them. This section also performs the unit conversions.

    # Load temperature
    ds_t2m = xr.open_dataset(
        file_path,
        engine="cfgrib",
        backend_kwargs={"filter_by_keys": {"shortName": "2t"}}
    )

    # Load precipitation
    ds_tp = xr.open_dataset(
        file_path,
        engine="cfgrib",
        backend_kwargs={"filter_by_keys": {"shortName": "tp"}}
    )

    # Drop 'valid_time' if present to avoid merge conflicts
    if 'valid_time' in ds_t2m.variables:
        ds_t2m = ds_t2m.drop_vars('valid_time')
    if 'valid_time' in ds_tp.variables:
        ds_tp = ds_tp.drop_vars('valid_time')

    # Rename variables for clarity
    ds_t2m = ds_t2m.rename({'t2m': 'temperature_2m'})
    ds_tp = ds_tp.rename({'tp': 'total_precipitation'})

    # Unit conversions: Kelvin to Celsius, meters to millimeters
    ds_t2m['temperature_2m'] = ds_t2m['temperature_2m'] - 273.15
    ds_t2m['temperature_2m'].attrs['units'] = '°C'

    if ds_tp['total_precipitation'].attrs.get('units', '') == 'm':
        ds_tp['total_precipitation'] *= 1000
        ds_tp['total_precipitation'].attrs['units'] = 'mm'

    # Merge the processed datasets
    ds_combined = xr.merge([ds_t2m, ds_tp], compat='override')

    # Collapse the 'step' dimension for total precipitation
    # (sums over forecast steps, giving one value per 'time' entry)
    precip_total = ds_combined['total_precipitation'].sum(dim='step', skipna=True)
    ds_combined['precip_collapsed'] = precip_total
    ds_combined['precip_collapsed'].attrs['long_name'] = "Total Precipitation (Collapsed over step)"
    ds_combined['precip_collapsed'].attrs['units'] = ds_combined['total_precipitation'].attrs.get('units', 'mm')

    print(ds_combined)

3. Visualize Data (Time Series Plotting)

Generate plots to visualize the time series of temperature and precipitation for the extracted location.

    import matplotlib.pyplot as plt

    def plot_time_series(var_name):
        data = ds_combined[var_name].squeeze()  # remove lat/lon dims for single point
        plt.figure(figsize=(15, 4))
        plt.plot(data['time'], data,
                 label=ds_combined[var_name].attrs.get('long_name', var_name),
                 color='tab:blue')
        plt.xlabel("Time")
        plt.ylabel(f"{ds_combined[var_name].attrs.get('units', '')}")
        plt.title(f"Time Series of {var_name}")
        plt.grid(True)
        plt.legend()
        plt.tight_layout()
        plt.show()

    plot_time_series("temperature_2m")
    plot_time_series("precip_collapsed")

4. Save to Zarr Format

Saving your processed data to Zarr is recommended for large datasets. Zarr is a chunked, cloud-optimized format that allows efficient reading of data subsets without loading the entire dataset into memory.
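To make the chunking idea concrete, the sketch below uses plain Python (no Zarr required) to compute which time chunks a slice of the time axis would touch for a given chunk length. The helper name `chunks_for_slice` and the 8,760-hour series length are illustrative assumptions, not part of this repository, and a real read also depends on compression and on how the other dimensions are chunked.

```python
def chunks_for_slice(start, stop, chunk_len):
    """Return the indices of the chunks covering positions [start, stop)."""
    if not 0 <= start < stop:
        raise ValueError("expected 0 <= start < stop")
    first = start // chunk_len        # chunk holding the first requested position
    last = (stop - 1) // chunk_len    # chunk holding the last requested position
    return list(range(first, last + 1))


n_hours = 8760      # assumption: one non-leap year of hourly data
chunk_len = 1000    # same length as the {"time": 1000} chunking used in this step
n_chunks = -(-n_hours // chunk_len)  # ceiling division

# One week of hourly data that sits inside a single chunk
one_week = chunks_for_slice(2000, 2000 + 7 * 24, chunk_len)
# One week that happens to cross a chunk boundary
straddling_week = chunks_for_slice(900, 900 + 7 * 24, chunk_len)

print(f"total chunks along time: {n_chunks}")
print(f"chunks read for hours 2000-2167: {one_week}")
print(f"chunks read for hours 900-1067: {straddling_week}")
```

Under these assumptions, a one-week read touches one or two of the nine time chunks rather than the whole series, which is the property described above. The export code for this step follows.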
    import os
    import shutil

    import zarr
    from numcodecs import Blosc

    zarr_path = "2_Temp_Precip_dynamic_data_2.zarr"

    # Select variables to save
    ds_zarr = ds_combined[["temperature_2m", "precip_collapsed"]]

    # Define chunking strategy
    chunks = {"time": 1000}

    # Define compression using Blosc
    compressor = Blosc(cname='zstd', clevel=3, shuffle=Blosc.SHUFFLE)
    encoding = {
        var: {"compressor": compressor}
        for var in ds_zarr.data_vars
    }

    # Remove existing Zarr store if it exists
    if os.path.exists(zarr_path):
        shutil.rmtree(zarr_path)

    # Save to Zarr with specified chunking and compression
    ds_zarr.chunk(chunks).to_zarr(
        zarr_path,
        mode='w',
        consolidated=True,
        encoding=encoding,
        zarr_format=2
    )

    print(f"✅ Zarr dataset successfully saved to: {zarr_path}")

5. (Optional) Plot Location on a Map

If your dataset contains latitude and longitude, you can plot the point on an interactive map using folium.

    import folium

    # Extract the single point location from your dataset
    lat = float(ds.latitude.values)
    lon = float(ds.longitude.values)

    print(f"Plotting marker at Latitude: {lat}, Longitude: {lon}")

    # Create a folium map centered at the point
    m = folium.Map(location=[lat, lon], zoom_start=10)

    # Add a marker
    folium.Marker(
        location=[lat, lon],
        popup=f"ERA5 Point\nLat: {lat}, Lon: {lon}",
        tooltip="ERA5 Time Series Location 📍",
        icon=folium.Icon(color="blue", icon="info-sign")
    ).add_to(m)

    # Display the map (will render inline in Jupyter)
    m

The Importance and Utility of ERA5 Data

ERA5 is the fifth generation ECMWF reanalysis for the global climate and weather, providing a comprehensive and consistent record of atmospheric, land, and oceanic climate variables since 1940. It is a cornerstone for scientific research and practical applications due to its:

- High Resolution: Offers hourly data at a fine spatial resolution (e.g., 0.25 degrees), enabling detailed analysis of atmospheric and surface processes.
- Extensive Temporal Coverage: With data available back to 1940, it is invaluable for long-term climate trend analysis and historical event reconstruction.
- Comprehensive Variable Set: Includes a vast array of atmospheric, land, and oceanic parameters, supporting multi-disciplinary studies.
- Data Assimilation: As a reanalysis product, ERA5 combines observations with a numerical weather prediction model using advanced data assimilation techniques, resulting in a globally complete and consistent dataset.
- Accessibility: Publicly available through platforms like the Copernicus Climate Change Service (C3S), making it widely accessible to the research community.

Diverse Domains of Utilization

ERA5 data is a crucial resource across a multitude of fields:

- Climate Change Research: Essential for trend analysis of climate indicators, validation of climate models, and attribution studies to understand the causes of observed climate changes.
- Hydrology and Water Resources Management: Drives hydrological models for river flow simulation, drought monitoring, and reservoir management optimization.
- Agriculture and Food Security: Used for crop yield prediction, irrigation scheduling, and understanding the impact of weather on pest and disease outbreaks.
- Renewable Energy: Critical for wind and solar energy assessment, including site selection and forecasting power generation.
- Disaster Risk Reduction: Provides inputs for flood forecasting, storm surge modeling, and wildfire risk assessment.
- Atmospheric Sciences and Meteorology: Supports research into atmospheric phenomena, extreme weather events, and numerical weather prediction model improvements.
- Oceanography: Offers atmospheric forcing for ocean circulation models and helps in understanding sea ice dynamics.
- Public Health: Enables heatwave impact assessment and disease vector modeling by linking weather conditions to health outcomes.

By leveraging ERA5 data, researchers and practitioners can gain deep insights into Earth's climate system and develop solutions for critical environmental and societal challenges.

Contact

For any questions or collaborations, feel free to reach out:

Name: Sayantan Mandal
Email: sayantanonfire@gmail.com

Repository language: Jupyter Notebook. License: MIT.