data-versioning
There are 57 repositories under the data-versioning topic.
dolthub/dolt
Dolt – Git for Data
wandb/wandb
The AI developer platform. Use Weights & Biases to train and fine-tune models, and manage models from experimentation to production.
treeverse/lakeFS
lakeFS - Data version control for your data lake | Git for data
quiltdata/quilt
Quilt is a data mesh for connecting people with actionable data
iusztinpaul/energy-forecasting
🌀 The Full Stack 7-Steps MLOps Framework | Learn MLE & MLOps for free by designing, building and deploying an end-to-end ML batch system ~ source code + 2.5 hours of reading & video materials
Renumics/awesome-open-data-centric-ai
Curated list of open source tooling for data-centric AI on unstructured data.
koordinates/kart
Distributed version-control for geospatial and tabular data
BemiHQ/bemi-io
Automatic data change tracking for PostgreSQL
RecallGraph/RecallGraph
A versioning data store for time-variant graph data.
daefresh/awesome-data-temporality
A curated list to help you manage temporal data across many modalities 🚀.
GitDataAI/jzfs
Git-based version control file system for joint management of code, data, models and their relationships.
leeper/data-versioning
Collecting thoughts about data versioning
BemiHQ/bemi-prisma
Automatic data change tracking for Prisma
ropensci/gittargets
Data version control for reproducible analysis pipelines in R with {targets}.
layerai-archive/sdk
Metadata store for Production ML
jomariya23156/full-stack-on-prem-cv-mlops
"1 config, 1 command from Jupyter Notebook to serve Millions of users", Full-stack On-Premises MLOps system for Computer Vision from Data versioning to Model monitoring and drift detection.
wrgl/wrgl
Git-like data versioning.
BemiHQ/bemi-typeorm
Automatic data change tracking for TypeORM
aws/amazon-finspace-examples
This repo contains sample code and sample notebooks to illustrate how to work with Amazon FinSpace
martysai/artificial-text-detection
Python framework for artificial text detection: NLP approaches to compare natural text against text generated by neural networks.
pier4all/mongoose-versioned
Document versioning library for MongoDB using the mongoose package.
ksm26/LLMOps
This course navigates through the LLMOps pipeline, enabling you to preprocess training data for supervised fine-tuning and deploy custom Large Language Models (LLMs).
d-lowl/bunny-party
A demonstration of how DVC and MLflow can be used for data relabeling
data-as-code/dac
Python Data as Code core implementation
datopian/ckanext-versions
A CKAN extension for data versioning.
ropensci/butterfly
Verification of continually updating timeseries data where we expect new values, but want to ensure previous data remains unchanged. Maintained by @thomaszwagerman
BemiHQ/bemi-supabase-js
Automatic data change tracking for Supabase JS
datopian/ckanext-versioning
Deprecated. See https://github.com/datopian/ckanext-versions. ⏰ CKAN extension providing data versioning (metadata and files) based on git and github.
zensors/droplet
A JSON-based format for working with machine learning data, with a focus on data interoperability.
BemiHQ/bemi-django
Automatic data change tracking for Django
abeltavares/versioned-data-lakehouse
🌊 Git-like Version Control for Data with Nessie, Iceberg, and Spark
BemiHQ/bemi-sqlalchemy
Automatic data change tracking for SQLAlchemy
BemiHQ/bemi-mikro-orm
Automatic data change tracking for MikroORM
dolthub/kedro-dolt
Kedro-Dolt Hook Plugin
NewronAI/newron-sdk
Newron is a data-centric ML platform to easily build, manage, deploy and continuously improve models through data-driven development.
R01noq/Sales-Data-Analysis-and-Visualization-with-Missing-Data-Cleaning
This project demonstrates a complete workflow for analyzing sales data with missing values. It includes data cleaning, feature engineering, aggregation, and visualizations using Python libraries such as Pandas, NumPy, and Matplotlib.