awslabs/deequ

[FEATURE] Add spark table metric repository

charlieyou opened this issue · 4 comments

Is your feature request related to a problem? Please describe.
Given that the rest of Deequ relies on Spark, it seems incongruous that there is no support for loading metrics from a Spark table. Saving to a JSON file works fine for now, but as we scale up, we would like to take advantage of the data catalog and governance features that come with using Spark tables (specifically with Databricks in our case, but I can imagine it being generally useful outside of that).
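For context, the JSON-based setup mentioned above might look roughly like the sketch below. It is an illustration rather than the setup actually used here: the path `/tmp/deequ-metrics.json`, the table name `deequ_metrics`, the tag, and the sample data are all hypothetical. The last two lines copy the stored metrics into a Spark table, but that export is one-way; Deequ still cannot load metrics back from the table, which is the gap this request is about.

```scala
import com.amazon.deequ.VerificationSuite
import com.amazon.deequ.checks.{Check, CheckLevel}
import com.amazon.deequ.repository.ResultKey
import com.amazon.deequ.repository.fs.FileSystemMetricsRepository
import org.apache.spark.sql.SparkSession

object JsonRepositoryExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("deequ-json-repository")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical sample data, only here to make the sketch self-contained.
    val data = Seq((1, "a"), (2, "b")).toDF("id", "name")

    // Metrics are persisted as JSON at this (hypothetical) path.
    val repository = FileSystemMetricsRepository(spark, "/tmp/deequ-metrics.json")
    val resultKey = ResultKey(System.currentTimeMillis(), Map("dataset" -> "example"))

    // Run the checks and append the computed metrics to the JSON repository.
    VerificationSuite()
      .onData(data)
      .addCheck(
        Check(CheckLevel.Error, "basic checks")
          .isComplete("id")
          .hasSize(_ == 2))
      .useRepository(repository)
      .saveOrAppendResult(resultKey)
      .run()

    // One-way export: read the stored metrics as a DataFrame and append them
    // to a (hypothetical) Spark table. Deequ cannot read them back from there.
    val metrics = repository.load().getSuccessMetricsAsDataFrame(spark)
    metrics.write.mode("append").saveAsTable("deequ_metrics")

    spark.stop()
  }
}
```

Running this needs Deequ and Spark on the classpath. A repository backed directly by a table would remove both the intermediate JSON file and the separate export step.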

Describe the solution you'd like
An implementation of MetricsRepository using Spark tables as the data source.

Describe alternatives you've considered
This can be hacked together by dumping a Spark table to a JSON file and then reading that with the FileSystemMetricsRepository, but it's quite inelegant.

Additional context
Happy to take a crack at the implementation myself when I have more capacity in a few days.

I can take a stab at this. I just want to check whether PRs are accepted for this feature. @rdsharma26

@VenkataKarthikP we do take pull requests. Feel free to work on this, and thank you in advance.

@mentekid @charlieyou I worked on #518 to implement this request, please take a look. Thanks in advance.

@mentekid @rdsharma26 can we close this now that the PR has been merged? Also, can we get a release tag with the latest changes? Thanks.