Utilities for working with NASA Planetary Data System v4 (PDS4) data files
The following dependencies must be met:
- python 3
- pandas
- pyyaml
- lxml
- PDS4 tools
First, clone this repository. If you are using conda, the dependencies can be installed in a new environment using the provided environment file:
conda env create -f environment.yml
The newly created environment can be activated with:
conda activate pds4utils
Otherwise, please make sure the dependencies are installed with your system package manager, or a tool like pip
. Use of a conda environment or virtualenv is recommended!
The package can then be installed with:
python setup.py install
The module contains a few simple functions and a class. A brief overview is given here:
read_table
- reads 2D tables from PDS4 products
- one level of group fields are supported
- returns a pandas dataframe
- group field data are returned as an array in each pandas cell
- if
table_name
is not given, the first table is returned - the DataFrame is indexed by the first time field, if any
- this can be set using the
index_col
parameter
- this can be set using the
read_tables
- reads multiple tables using
read_table
- useful for building a large dataframe from many similar data products
- set
add_filename=True
to add the product name to each row, to track which product the data came from
index_products(directory='.', pattern='*.xml')
- searches for PDS4 labels recursively in
directory
matchingpattern
- returns a pandas DataFrame with one row per product
- returned data include:
- LID + VID
- bundle, collection and product identifier
- start and stop time, if present
Database
- this class builds one or more DataFrames containing custom meta-data from a set of PDS4 products
- a YAML formatted configuration file is required to determine which attributes to read
- the Xpath to each attribute must be known
- see example.yml for more information
- if no config file is specified when instantiating the class, a default is looked for
pds_dbase.yml
in the user's home directory, or pointed to byAPPDATA
orXDG_CONFIG_HOME
- each entry in the configuration file produces one database table (one Pandas dataframe)
- to see which tables have been loaded, use
list_tables()
- to return a table, use
get_table(table)
- to save or restore a database using
save_dbase()
orload_dbase()
- to see which tables have been loaded, use
The Jupyter notebook included with this repository shows an example of pds4_utils in use. To view the notebook, click here.