Metis-Data

A set of utility functions and classes for running jobs and libraries on a Spark cluster, mostly on
the Databricks platform and targeted at AWS deployments.

TODO: This is a copy from the old jobworthy repo, and requires tons of updates to match the metis_data API.

Spark Job

Util Module

  • Spark Session

Repo Module

Repository Module

Schema Module

The Schema module provides functions for building a more abstract definition of a Hive table schema. It also provides abstractions for creating table, column, and cell data, which can be passed as the data argument when creating a DataFrame.
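
As an illustration of the idea only, the sketch below shows one way an abstract table definition could yield both a schema and row data. It is not the metis_data API: `Column`, `Table`, `schema_ddl`, and `add_row` are hypothetical names invented for this example, and the sketch uses only the Python standard library.

```python
from dataclasses import dataclass, field
from typing import Any, List, Tuple


@dataclass(frozen=True)
class Column:
    """Hypothetical column definition: a name, a Hive type name, and nullability."""
    name: str
    hive_type: str
    nullable: bool = True


@dataclass
class Table:
    """Hypothetical table definition holding columns and the rows built against them."""
    columns: List[Column]
    rows: List[Tuple[Any, ...]] = field(default_factory=list)

    def schema_ddl(self) -> str:
        # Render the columns as a DDL-style string such as "id bigint, name string".
        return ", ".join(f"{c.name} {c.hive_type}" for c in self.columns)

    def add_row(self, **cells: Any) -> "Table":
        # Reject cells that do not belong to any declared column.
        unknown = set(cells) - {c.name for c in self.columns}
        if unknown:
            raise ValueError(f"unknown columns: {sorted(unknown)}")
        # Reject missing or None values for non-nullable columns.
        for c in self.columns:
            if not c.nullable and cells.get(c.name) is None:
                raise ValueError(f"column {c.name!r} is not nullable")
        # Store cells in column order; omitted nullable cells become None.
        self.rows.append(tuple(cells.get(c.name) for c in self.columns))
        return self


table = Table(
    columns=[
        Column("id", "bigint", nullable=False),
        Column("name", "string"),
        Column("score", "double"),
    ]
)
table.add_row(id=1, name="alice", score=9.5).add_row(id=2, name="bob")
```

Assuming an active SparkSession bound to a variable named `spark`, the result could then be handed to PySpark as `spark.createDataFrame(table.rows, schema=table.schema_ddl())`, since `createDataFrame` accepts a list of tuples as data and a datatype string as the schema.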