data-testing
There are 27 repositories under the data-testing topic.
sodadata/soda-core
⚡ Data quality testing for the modern data stack (SQL, Spark, and Pandas) https://www.soda.io
re-data/re-data
re_data - fix data issues before your users & CEO would discover them 😊
InfuseAI/piperider
Code review for data in dbt
LukaszLapaj/software-testing-resource-pack
Various files useful for manual testing and test automation etc.
astronomer/airflow-provider-great-expectations
Great Expectations Airflow operator
re-data/dbt-re-data
re_data - fix data issues before your users & CEO would discover them 😊
sodadata/soda-spark
Soda Spark is a PySpark library that helps you with testing your data in Spark Dataframes
data-catering/data-caterer
Test data management tool for any data source, batch or real-time
DataKitchen/dataops-testgen
DataOps Data Quality TestGen is part of DataKitchen's Open Source Data Observability. DataOps TestGen delivers simple, fast data quality test generation and execution by data profiling, new dataset hygiene review, AI generation of data quality validation tests, ongoing testing of data refreshes, & continuous anomaly monitoring
sodadata/soda-github-action
⚡ Prevent downstream data quality issues by integrating the Soda Library into your CI/CD pipeline.
serialbandicoot/great-assertions
This library is inspired by the Great Expectations library. It makes the various expectations found in Great Expectations available as assertions in Python's built-in unittest framework.
neonexus/fixted
Simple DB Fixtures for Sails.js v1 (fake data for testing).
andrjas/data_check
data and pipeline testing with and for SQL
ericmjl/software-testing-open-source-and-data-science
Software Testing in Open Source and Data Science: A talk delivered at the Data Umbrella speaker series
pflooky/data-caterer
Data generation and validation tool for any data source
data-catering/data-caterer-example
Example API implementation for Data Caterer
pflooky/data-caterer-example
Example API implementation for Data Caterer
pflooky/data-caterer-docs
Documentation for Data Caterer
manoj9788/spark-etl-tests
A sample repository showcasing an implementation of testing for an ETL pipeline developed with Apache Spark
shridhar1504/Sales-Forecasting-Datascience-Project
Develop a data science project using historical sales data to build a regression model that accurately predicts future sales. Preprocess the dataset, conduct exploratory analysis, select relevant features, and employ regression algorithms for model development. Evaluate model performance, optimize hyperparameters, and provide actionable insights.
blleshi/Credit_Risk_Classification
Credit Risk Classification
ojasphansekar/Data-Management-Co-op
National Grid (Python, SQL Server, SSIS, SSRS, Tableau, Power BI, SQL Server Import Export Wizard, Data Validations, Data Integrations, Data Conversions)
afairless/kalman_filter
Translating between two sets of notation for Kalman filters
Balajimohan18/Sales-Forecasting-Datascience-Project
Develop a data science project using historical sales data to build a regression model that accurately predicts future sales. Preprocess the dataset, conduct exploratory analysis, select relevant features, and employ regression algorithms for model development. Evaluate model performance, optimize hyperparameters, and provide actionable insights.
JayLohokare/pySpark-data-testing-framework
Dynamic data testing engine based on PySpark
neha-nayeem/machine-learning-challenge
This project creates machine learning models capable of classifying candidate exoplanets from the raw dataset from NASA Kepler Space Telescope
siawayforward/dbt_about_it
I'm learning how to use dbt with BigQuery so I can apply that knowledge wherever we end up working. It seems like a good DWH interface tool to know for data transformation and testing, and allows me to solidify concepts of testing in data ops.