awslabs/deequ

Getting Error name 'isComplete' is not defined while running deequ code in Azure Databricks

dilkushpatel opened this issue · 4 comments


I'm trying to implement basic checks on the columns of a table in Azure SQL DW.

Reading the data works fine, and I can also run ConstraintSuggestionRunner.

When I run VerificationSuite with a single isComplete check, it gives the error below.

Error:
name 'isComplete' is not defined

Code:
import sagemaker_pyspark
import pydeequ
from pyspark.sql import SparkSession
from pydeequ.analyzers import *
from pydeequ.checks import *
from pydeequ.verification import *
from pydeequ.anomaly_detection import *

classpath = ":".join(sagemaker_pyspark.classpath_jars())

spark = (SparkSession
.builder
.config("spark.driver.extraClassPath", classpath)
.config("spark.jars.packages", pydeequ.deequ_maven_coord)
.config("spark.jars.excludes", pydeequ.f2j_maven_coord)
.getOrCreate())

check = Check(spark, CheckLevel.Error, "Data QC")

checkResult = (VerificationSuite(spark)
    .onData(df)
    .addCheck(isComplete("month_id"))
    .run())

checkResult_df = VerificationResult.checkResultsAsDataFrame(spark, checkResult)
checkResult_df.show()

I tried Google but did not find anything relevant.

I get the same error with any other check as well.

Change

checkResult = (VerificationSuite(spark)
    .onData(df)
    .addCheck(
        isComplete("month_id")
    )
    .run())

to

checkResult = (VerificationSuite(spark)
    .onData(df)
    .addCheck(
        check.isComplete("month_id")
    )
    .run())

See full code example here: https://github.com/awslabs/python-deequ#constraint-verification
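For background: pydeequ doesn't expose isComplete as a free function, so from pydeequ.checks import * only brings in names like Check and CheckLevel, hence the NameError. isComplete is a method on a Check instance, and each constraint method returns the same Check, so constraints can be chained. A minimal sketch of that pattern (assuming the spark session from your snippet):

from pydeequ.checks import Check, CheckLevel

# Constraint methods live on the Check instance; each call returns
# the same Check, so several constraints can be chained before
# handing the check to addCheck().
check = (Check(spark, CheckLevel.Error, "Data QC")
    .isComplete("month_id")
    .isUnique("month_id"))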

Interesting!
I was actually trying that.

Still an error, though:

Error:
Check.isComplete() missing 1 required positional argument: 'column'

Code:
checkResult = (VerificationSuite(spark)
    .onData(df)
    .addCheck(Check.isComplete("month_id"))
    .run())

Ignore that...

I changed Check to check and that worked.

Thanks.

Thanks for confirming.
Since you have the line

check = Check(spark, CheckLevel.Error, "Data QC")

check.isComplete is correct, as opposed to Check.isComplete. Calling through the class passes "month_id" as self, which is why Python complains that the column argument is missing.
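For completeness, the corrected end-to-end snippet would look roughly like this (same spark session and df as above):

check = Check(spark, CheckLevel.Error, "Data QC")

checkResult = (VerificationSuite(spark)
    .onData(df)
    .addCheck(check.isComplete("month_id"))  # instance method, not Check.isComplete
    .run())

checkResult_df = VerificationResult.checkResultsAsDataFrame(spark, checkResult)
checkResult_df.show()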