Can this library be used with technologies other than Spark, such as Flink?
abeermohamed1 opened this issue · 2 comments
Hi,
Thanks for the question.
Deequ depends on Spark, so it cannot be used in an environment where Spark is not available.
If anyone is interested in something like Deequ for Flink, we built a prototype of a comparable streaming data quality library for Apache Flink: StreamDQ. However, it is only a prototype so far, a small side project from my Ph.D., and we currently have no resources to continue developing it. Still, if someone wants to build something like this, StreamDQ might be a very good starting point. All the basic functionality already works; it is just not very robust yet, and more checks, among other things, still need to be implemented.