awslabs/deequ

Getting Error name 'isComplete' is not defined while running deequ code in Azure Databricks

dilkushpatel opened this issue · 4 comments


I'm trying to implement basic checks on the columns of a table in Azure SQL DW.

Reading the data works fine, and I can also run ConstraintSuggestionRunner.

When I run VerificationSuite with a single isComplete check, it gives the error below.

Error:
name 'isComplete' is not defined

Code:
import sagemaker_pyspark
import pydeequ
from pyspark.sql import SparkSession
from pydeequ.analyzers import *
from pydeequ.checks import *
from pydeequ.verification import *
from pydeequ.anomaly_detection import *

classpath = ":".join(sagemaker_pyspark.classpath_jars())

spark = (SparkSession
.builder
.config("spark.driver.extraClassPath", classpath)
.config("spark.jars.packages", pydeequ.deequ_maven_coord)
.config("spark.jars.excludes", pydeequ.f2j_maven_coord)
.getOrCreate())

check = Check(spark, CheckLevel.Error, "Data QC")

checkResult = (VerificationSuite(spark)
    .onData(df)
    .addCheck(isComplete("month_id"))
    .run())

checkResult_df = VerificationResult.checkResultsAsDataFrame(spark, checkResult)
checkResult_df.show()

I tried Google but did not find anything relevant.

I get the same error with any other check as well.

Change

checkResult = (VerificationSuite(spark)
    .onData(df)
    .addCheck(
        isComplete("month_id")
    )
    .run())

to

checkResult = (VerificationSuite(spark)
    .onData(df)
    .addCheck(
        check.isComplete("month_id")
    )
    .run())

See full code example here: https://github.com/awslabs/python-deequ#constraint-verification
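For background: pydeequ doesn't expose isComplete as a free function, so from pydeequ.checks import * only brings in names like Check and CheckLevel, hence the NameError. isComplete is a method on a Check instance, and each constraint method returns the same Check, so constraints can be chained. A minimal sketch of that pattern (assuming the spark session from your snippet):

from pydeequ.checks import Check, CheckLevel

# Constraint methods live on the Check instance; each call returns
# the same Check, so several constraints can be chained before
# handing the check to addCheck().
check = (Check(spark, CheckLevel.Error, "Data QC")
    .isComplete("month_id")
    .isUnique("month_id"))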

Interesting!
I was actually trying that.

Still an error, though:

Error:
Check.isComplete() missing 1 required positional argument: 'column'

Code:
checkResult = (VerificationSuite(spark)
    .onData(df)
    .addCheck(Check.isComplete("month_id"))
    .run())

Ignore that...

I changed Check to check and that worked.

Thanks.

Thanks for confirming.
Since you have the line

check = Check(spark, CheckLevel.Error, "Data QC")

check.isComplete is correct, as opposed to Check.isComplete. Calling through the class passes "month_id" as self, which is why Python complains that the column argument is missing.
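For completeness, the corrected end-to-end snippet would look roughly like this (same spark session and df as above):

check = Check(spark, CheckLevel.Error, "Data QC")

checkResult = (VerificationSuite(spark)
    .onData(df)
    .addCheck(check.isComplete("month_id"))  # instance method, not Check.isComplete
    .run())

checkResult_df = VerificationResult.checkResultsAsDataFrame(spark, checkResult)
checkResult_df.show()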