data-versioning

There are 42 repositories under the data-versioning topic.

  • dolt

    dolthub/dolt

    Dolt – Git for Data

    Language:Go17.2k1112.3k480
  • wandb

    wandb/wandb

    🔥 A tool for visualizing and tracking your machine learning experiments. This repo contains the CLI and Python API.

    Language:Python8.4k543.1k618
  • lakeFS

    treeverse/lakeFS

    lakeFS - Data version control for your data lake | Git for data

    Language:Go4.1k403.2k331
  • quiltdata/quilt

    Quilt is a data mesh for connecting people with actionable data

    Language:Jupyter Notebook1.3k1911892
  • iusztinpaul/energy-forecasting

    🌀 The Full Stack 7-Steps MLOps Framework | Learn MLE & MLOps for free by designing, building and deploying an end-to-end ML batch system ~ source code + 2.5 hours of reading & video materials

    Language:Python8031423182
  • awesome-open-data-centric-ai

    Renumics/awesome-open-data-centric-ai

    Curated list of open source tooling for data-centric AI on unstructured data.

  • kart

    koordinates/kart

    Distributed version-control for geospatial and tabular data

    Language:Python5052827339
  • RecallGraph/RecallGraph

    A versioning data store for time-variant graph data.

    Language:JavaScript33211225
  • BemiHQ/bemi

    Automatic data change tracking for PostgreSQL

    Language:TypeScript177203
  • leeper/data-versioning

    Collecting thoughts about data versioning

  • daefresh/awesome-data-temporality

    A curated list to help you manage temporal data across many modalities 🚀.

  • sdk

    layerai-archive/sdk

    Metadata store for Production ML

    Language:Python898177
  • ropensci/gittargets

    Data version control for reproducible analysis pipelines in R with {targets}.

    Language:R814121
  • BemiHQ/bemi-prisma

    Automatic data change tracking for Prisma

    Language:TypeScript66311
  • GitDataAI/jiaozifs

    A Git-like version control file system for data lineage & data collaboration.

    Language:Go501734
  • wrgl/wrgl

    Git-like data versioning.

    Language:Go403100
  • jomariya23156/full-stack-on-prem-cv-mlops

    "1 config, 1 command from Jupyter Notebook to serve Millions of users", Full-stack On-Premises MLOps system for Computer Vision from Data versioning to Model monitoring and drift detection.

    Language:Jupyter Notebook38102
  • aws/amazon-finspace-examples

    This repo contains sample code and sample notebooks to illustrate how to work with Amazon FinSpace

    Language:Jupyter Notebook215123
  • BemiHQ/bemi-typeorm

    Automatic data change tracking for TypeORM

    Language:TypeScript21300
  • martysai/artificial-text-detection

    Python framework for artificial text detection: NLP approaches to compare natural text against text generated by neural networks.

    Language:Python14301
  • pier4all/mongoose-versioned

    Document versioning library for MongoDB using the mongoose package.

    Language:JavaScript14577
  • d-lowl/bunny-party

    A demonstration of how DVC and MLFlow can be used in the task of data relabeling

    Language:Python10100
  • datopian/ckanext-versions

    A CKAN extension for data versioning.

    Language:Python8786
  • datopian/ckanext-versioning

    Deprecated. See https://github.com/datopian/ckanext-versions. ⏰ CKAN extension providing data versioning (metadata and files) based on git and github.

    Language:Python77374
  • zensors/droplet

    A JSON-based format for working with machine learning data, with a focus on data interoperability.

  • data-as-code/dac

    Python Data as Code core implementation

    Language:Python6180
  • dolthub/kedro-dolt

    Kedro-Dolt Hook Plugin

    Language:Python4622
  • KalyanM45/Data-Version-Control-Demo

    The provided demo project demonstrates the practical implementation and advantages of using DVC. It showcases how DVC simplifies data versioning and model versioning while working in tandem with Git to create a cohesive version control system tailored for data science projects.

    Language:Python310
  • newron-sdk

    NewronAI/newron-sdk

    Newron is a data-centric ML platform to easily build, manage, deploy and continuously improve models through data driven development.

    Language:Python3404
  • VineetKT/ML_fastapi_on_Heroku_CI-CD

    Deploying a Machine Learning Model on Heroku with FastAPI using CI/CD tools such as GitHub Actions and Heroku Automatic Deployment.

    Language:Jupyter Notebook3302
  • OElesin/modeldb-aws

    Verta ai ModelDB on AWS Cloud with integration into Amazon SageMaker for ML training data versioning and experiment tracking

    Language:TypeScript110
  • pier4all/data-versioning

    Repository for evaluating the different approaches to data versioning

    Language:JavaScript1301
  • tahonick/MLOps-Data-versioning-with-ClearML

    Learning data and model versioning with ClearML while cleaning and modeling happiness by country with a Kaggle dataset

    Language:HTML1200
  • albagc/auto-data-version

    Obtain data versioning tag using ML models

    Language:Jupyter Notebook00
  • cs-uche/Car-Prices-Prediction

    Advanced Machine Learning Regression: Predicting Car Prices

    Language:Jupyter Notebook0100
  • ksm26/LLMOps

    This course navigates through the LLMOps pipeline, enabling you to preprocess training data for supervised fine-tuning and deploy custom Large Language Models (LLMs).

    Language:Jupyter Notebook102