AbsaOSS/pramen

Add varchar length metadata to the data returned by the built-in JDBC source


Background

Relational databases often store text in the varchar(n) data type. When reading from a JDBC source, Spark converts such columns to plain strings with no maximum length. Keeping the original maximum length of each field can be helpful, especially when tables are exposed via Hive and the consumer is a relational database that requires maximum lengths for string columns.

Feature

Add varchar length metadata to the data returned by the built-in JDBC source.

Proposed Solution

When a DataFrame is requested, the JDBC source can connect to the same database using java.sql and fetch the declared lengths of the varchar columns.
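A minimal sketch of such a lookup in plain Java, using only java.sql. It assumes the source has a JDBC URL, credentials and a SQL query; it wraps the query with `WHERE 1 = 0` so that only metadata comes back (an assumption, since some dialects may need a different wrapper); it treats `VARCHAR` and `NVARCHAR` as the varchar types; and it reads the declared length from `ResultSetMetaData.getPrecision`, which the JDBC API defines as the length in characters for character data. The names `VarcharLengths`, `varcharLengths` and `fetchVarcharLengths` are hypothetical, and the sketch uses `getColumnName` for simplicity, which is exactly where the caveat below applies.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.ResultSetMetaData;
import java.sql.SQLException;
import java.sql.Types;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of Pramen.
final class VarcharLengths {
    private VarcharLengths() {}

    // Collects the declared maximum length of every VARCHAR/NVARCHAR column
    // described by the metadata, keyed by column name, in column order.
    static Map<String, Integer> varcharLengths(ResultSetMetaData md) throws SQLException {
        Map<String, Integer> lengths = new LinkedHashMap<>();
        for (int i = 1; i <= md.getColumnCount(); i++) { // JDBC columns are 1-based
            int type = md.getColumnType(i);
            if (type == Types.VARCHAR || type == Types.NVARCHAR) {
                // For character data this is the length in characters; 0 means
                // "not applicable". Unbounded columns are driver-dependent.
                int precision = md.getPrecision(i);
                if (precision > 0) {
                    lengths.put(md.getColumnName(i), precision);
                }
            }
        }
        return lengths;
    }

    // Opens a separate connection, runs the query so that it returns no rows,
    // and reads only its metadata. The query is assumed to come from trusted
    // pipeline configuration, since it is concatenated into the SQL text.
    static Map<String, Integer> fetchVarcharLengths(String url, String user, String password,
                                                    String query) throws SQLException {
        String noRows = "SELECT * FROM (" + query + ") t WHERE 1 = 0";
        try (Connection conn = DriverManager.getConnection(url, user, password);
             PreparedStatement stmt = conn.prepareStatement(noRows);
             ResultSet rs = stmt.executeQuery()) {
            return varcharLengths(rs.getMetaData());
        }
    }
}
```

The resulting map (for example `NAME -> 50`) could then be attached to the matching DataFrame fields as Spark column metadata; the metadata key and format Pramen would use are not specified in this issue.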

The configuration key enable.schema.metadata is deliberately generic rather than varchar-specific, so that it can be reused for other kinds of metadata in the future.

Fetching the metadata requires an additional connection to the database, which adds a delay to the job, so the feature should be configurable:

{
    name = "source1_name"
    factory.class = "za.co.absa.pramen.core.source.JdbcSource"

    #...

    enable.schema.metadata = true
}

Caveat

How metadata is provided depends on the database vendor and its JDBC driver, and can differ between databases. For instance, some drivers return the column name via metadata.getColumnName(i), while others return it via metadata.getColumnLabel(i).
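One way to cope with this, shown here as an assumption and not necessarily what Pramen does, is to prefer the label and fall back to the name when the label is empty. The JDBC API describes getColumnLabel as the suggested title, usually taken from the SQL AS clause, and otherwise the same value as getColumnName, so the fallback only matters for drivers that leave the label empty. `ColumnNames` and `resolveColumnName` are hypothetical names.

```java
import java.sql.ResultSetMetaData;
import java.sql.SQLException;

// Hypothetical helper, not part of Pramen.
final class ColumnNames {
    private ColumnNames() {}

    // Prefers the column label (usually the alias in "SELECT id AS customer_id")
    // and falls back to the underlying column name when the driver reports
    // a null or blank label.
    static String resolveColumnName(ResultSetMetaData md, int column) throws SQLException {
        String label = md.getColumnLabel(column);
        if (label != null && !label.trim().isEmpty()) {
            return label;
        }
        return md.getColumnName(column);
    }
}
```

Whichever rule is chosen, the same rule should be used for every column so that the metadata keys line up with the DataFrame field names.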