DataQuality-with-SODA

Validate data quality with SODA quickly and effectively

Validate Data Quality in PostgreSQL Using SODA 🥤

prerequisite 🔑

  • Create and activate a Python virtual environment
    python3 -m venv venv_name
    source venv_name/bin/activate
  • Clone the repo and navigate into it

installation ⚙️

pip install -r requirements.txt
  • soda-core
    pip install soda-core
  • soda-postgres
    pip install -i https://pypi.cloud.soda.io soda-postgres

how-to-run 🚀

  • create an account in Soda Cloud and create an API key from the profile section
  • save the API key ID and secret
  • configuration
    • rename the sample_configuration.yml file to configuration.yml
    • configuration.yml
      • update the postgres config
      • update the soda_cloud section with your Soda Cloud API key
    • run the command to check config and DB connection
      soda test-connection -d my_postgres_source -c configuration.yml -V
  • checks.yml
    • update the dataset_name after "checks for" to match a table in your postgres schema
    • run the following command (rerun it every time you update the checks.yml file)
      soda scan -d my_postgres_source -c configuration.yml checks.yml 
  • Go to your Soda Cloud account and check the dashboard
  • DONE 🎯
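
For reference, the two files edited in the steps above might look roughly like the sketch below. This is an illustration, not the contents of the repo's sample_configuration.yml: the host, port, database, and schema values are placeholders, and the environment variable names (POSTGRES_USERNAME, POSTGRES_PASSWORD, SODA_API_KEY_ID, SODA_API_KEY_SECRET) are assumptions. Only the data source name my_postgres_source is taken from the commands above.

```yaml
# configuration.yml (sketch; every value below is a placeholder)
data_source my_postgres_source:   # must match the name passed to -d
  type: postgres
  host: localhost
  port: "5432"
  username: ${POSTGRES_USERNAME}  # assumed env var, read at scan time
  password: ${POSTGRES_PASSWORD}  # assumed env var
  database: postgres
  schema: public

soda_cloud:
  host: cloud.soda.io             # use cloud.us.soda.io for a US-region account
  api_key_id: ${SODA_API_KEY_ID}          # assumed env var
  api_key_secret: ${SODA_API_KEY_SECRET}  # assumed env var
```

A matching checks.yml could start with a few basic checks. Here dataset_name and column_name are hypothetical and should be replaced with a real table and column from your schema.

```yaml
# checks.yml (sketch; dataset_name and column_name are hypothetical)
checks for dataset_name:
  - row_count > 0                       # the table is not empty
  - missing_count(column_name) = 0      # no NULL values in the column
  - duplicate_count(column_name) = 0    # no repeated values in the column
```

Keeping credentials in environment variables rather than in configuration.yml avoids committing secrets to the repo.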

doc 📚