data-generation

There are 213 repositories under the data-generation topic.

  • IDEA-Research/Grounded-Segment-Anything

    Grounded SAM: Marrying Grounding DINO with Segment Anything & Stable Diffusion & Recognize Anything - Automatically Detect, Segment and Generate Anything

    Language:Jupyter Notebook13.8k1143671.3k
  • benkeen/generatedata

    A powerful, feature-rich, random test data generator.

    Language:TypeScript2.2k121690613
  • sdv-dev/SDV

    Synthetic data generation for tabular data

    Language:Python2.2k411.2k288
  • AgaMiko/data-augmentation-review

    List of useful data augmentation resources. You will find here some less common techniques, libraries, links to GitHub repos, papers, and more.

  • neomatrix369/awesome-ai-ml-dl

    Awesome Artificial Intelligence, Machine Learning and Deep Learning as we learn it. Study notes and a curated list of awesome resources of such topics.

    Language:Jupyter Notebook1.4k8312344
  • shuttle-hq/synth

    The Declarative Data Generator

    Language:Rust1.3k26160101
  • sdv-dev/CTGAN

    Conditional GAN for generating synthetic tabular data.

    Language:Python1.2k21205275
  • whatyouhide/stream_data

    Data generation and property-based testing for Elixir. 🔮

    Language:Elixir848229966
  • Westlake-AI/openmixup

    CAIRI Supervised, Semi- and Self-Supervised Visual Representation Learning Toolbox and Benchmark

    Language:Python581165359
  • nomemory/mockneat

    MockNeat - the modern faker lib.

    Language:Java525255947
  • tom-lord/regexp-examples

    Generate strings that match a given regular expression

    Language:Ruby521151731
  • sdv-dev/Copulas

    A library to model multivariate data using copulas.

    Language:Python51622186104
  • MTG/DeepConvSep

    Deep Convolutional Neural Networks for Musical Source Separation

    Language:Python4653420109
  • tirthajyoti/pydbgen

    Random dataframe and database table generator

    Language:Python297111259
  • microsoft/genalog

    Genalog is an open source, cross-platform Python package allowing generation of synthetic document images with custom degradations and text alignment capabilities.

    Language:Jupyter Notebook296121629
  • databrickslabs/dbldatagen

    Generate relevant synthetic data quickly for your projects. The Databricks Labs synthetic data generator (aka `dbldatagen`) may be used to generate large simulated / synthetic data sets for testing, POCs, and other uses in Databricks environments, including Delta Live Tables pipelines.

    Language:Python272137053
  • trinker/wakefield

    Generate random data sets

    Language:R253162527
  • kathrinse/be_great

    A novel approach for synthesizing tabular data using pretrained large language models

    Language:Python24174139
  • cieslarmichal/faker-cxx

    C++ Faker library for generating fake (but realistic) data.

    Language:C++213726188
  • worldbank/REaLTabFormer

    A suite of auto-regressive and Seq2Seq (sequence-to-sequence) transformer models for tabular and relational synthetic data generation.

    Language:Jupyter Notebook18846122
  • finos/datahelix

    The DataHelix generator allows you to quickly create data, based on a JSON profile that defines fields and the relationships between them, for the purpose of testing and validation

    Language:Java1403174950
  • rapiddweller/rapiddweller-benerator-ce

    BENERATOR is a leading software solution to generate, obfuscate, pseudonymize and migrate data for development, testing, and training purposes with a model-driven approach.

    Language:Java128105524
  • gretelai/awesome-synthetic-data

    📖 A curated list of resources dedicated to synthetic data

  • louisYen/Gen4Gen

    🏞️ Official implementation of "Gen4Gen: Generative Data Pipeline for Generative Multi-Concept Composition"

    Language:Python91935
  • sdv-dev/DeepEcho

    Synthetic Data Generation for mixed-type, multivariate time series.

    Language:Python89113713
  • mjkvaak/ImageDataAugmentor

    Custom image data generator for TF Keras that supports the modern augmentation module albumentations

    Language:Python8631527
  • tinybirdco/mockingbird

    Mockingbird is a mock streaming data generator

    Language:TypeScript837288
  • ykang/gratis

    GRATIS: GeneRAting TIme Series with diverse and controllable characteristics

    Language:R76131629
  • kgoldfeld/simstudy

    simstudy: Illuminating research methods through data generation

    Language:R7451248
  • smartcat-labs/ranger

    Ranger is a contextual data generator used to make sensible data for integration tests or to experiment with in the database

    Language:Java59249811
  • tosiron/jazznet

    jazznet dataset of piano patterns for music audio machine learning research

    Language:Python59301
  • leezythu/FlexKBQA

    FlexKBQA: A Flexible LLM-Powered Framework for Few-Shot Knowledge Base Question Answering

    Language:Python56343
  • dmey/synthia

    📈 🐍 Multidimensional synthetic data generation with Copula and fPCA models in Python

    Language:Python523108
  • edyan/neuralyzer

    Neuralyzer is a library and a command line tool to anonymize databases (by updating existing data or populating a table with fake data)

    Language:PHP509913
  • Cambalab/fake-data-generator

    Just a small open-source script to create fake data given a simple JSON model.

    Language:JavaScript4952814
  • microsoft/CodeMixed-Text-Generator

    This tool helps automatically generate grammatically valid synthetic code-mixed data by utilizing linguistic theories such as Equivalence Constraint Theory and Matrix Language Theory.

    Language:Jupyter Notebook488512