/pds4_utils

Utilities for working with NASA Planetary Data System v4 (PDS4) data files

Primary LanguagePythonApache License 2.0Apache-2.0

pds4_utils

Utilities for working with NASA Planetary Data System v4 (PDS4) data files

Dependencies

The following dependencies must be met:

  • python 3
  • pandas
  • pyyaml
  • lxml
  • PDS4 tools

Installation

First, clone this repository. If you are using conda, the dependencies can be installed in a new environment using the provided environment file:

conda env create -f environment.yml

The newly created environment can be activated with:

conda activate pds4utils

Otherwise, please make sure the dependencies are installed with your system package manager, or a tool like pip. Use of a conda environment or virtualenv is recommended!

The package can then be installed with:

python setup.py install

Contents

The module contains a few simple functions and a class. A brief overview is given here:

read_table

  • reads 2D tables from PDS4 products
  • one level of group fields are supported
  • returns a pandas dataframe
    • group field data are returned as an array in each pandas cell
    • if table_name is not given, the first table is returned
    • the DataFrame is indexed by the first time field, if any
      • this can be set using the index_col parameter

read_tables

  • reads multiple tables using read_table
  • useful for building a large dataframe from many similar data products
  • set add_filename=True to add the product name to each row, to track which product the data came from

index_products(directory='.', pattern='*.xml')

  • searches for PDS4 labels recursively in directory matching pattern
  • returns a pandas DataFrame with one row per product
  • returned data include:
    • LID + VID
    • bundle, collection and product identifier
    • start and stop time, if present

Database

  • this class builds one or more DataFrames containing custom meta-data from a set of PDS4 products
  • a YAML formatted configuration file is required to determine which attributes to read
    • the Xpath to each attribute must be known
    • see example.yml for more information
    • if no config file is specified when instantiating the class, a default is looked for
      • pds_dbase.yml in the user's home directory, or pointed to by APPDATA or XDG_CONFIG_HOME
  • each entry in the configuration file produces one database table (one Pandas dataframe)
    • to see which tables have been loaded, use list_tables()
    • to return a table, use get_table(table)
    • to save or restore a database using save_dbase() or load_dbase()

Example

The Jupyter notebook included with this repository shows an example of pds4_utils in use. To view the notebook, click here.