synthetic-dataset-generation

There are 199 repositories under the synthetic-dataset-generation topic.

  • Eladlev/AutoPrompt

    A framework for prompt tuning using Intent-based Prompt Calibration

    Language:Python1.7k1018143
  • distilabel

    argilla-io/distilabel

    ⚗️ distilabel is a framework for synthetic data and AI feedback for AI engineers that require high-quality outputs, full data ownership, and overall efficiency.

    Language:Python1k1227165
  • Unity-Technologies/com.unity.perception

    Perception toolkit for sim2real training and validation in Unity

    Language:C#88137325172
  • DataDreamer

    datadreamer-dev/DataDreamer

    DataDreamer: Prompt. Generate Synthetic Data. Train & Align Models.   🤖💤

    Language:Python69682037
  • nicolas-hbt/pygraft

    Configurable Generation of Synthetic Schemas and Knowledge Graphs at Your Fingertips

    Language:Python64613144
  • paulbricman/thisrepositorydoesnotexist

    A curated list of awesome projects which use Machine Learning to generate synthetic content.

  • NVIDIA/Dataset_Synthesizer

    NVIDIA Deep learning Dataset Synthesizer (NDDS)

    Language:C++55938116125
  • BatsResearch/bonito

    A lightweight library for generating synthetic instruction tuning datasets for your data without GPT.

    Language:Python539131435
  • Unity-Technologies/SynthDet

    SynthDet - An end-to-end object detection pipeline using synthetic data

    Language:C#351181354
  • augraphy

    sparkfish/augraphy

    Augmentation pipeline for rendering synthetic paper printing, faxing, scanning and copy machine processes

    Language:Python3091113140
  • tirthajyoti/pydbgen

    Random dataframe and database table generator

    Language:Python297111259
  • PeopleSansPeople

    Unity-Technologies/PeopleSansPeople

    Unity's privacy-preserving human-centric synthetic data generator

    Language:C#296261733
  • fjxmlzn/DoppelGANger

    [IMC 2020 (Best Paper Finalist)] Using GANs for Sharing Networked Time Series Data: Challenges, Initial Promise, and Open Questions

    Language:Python27974372
  • firmai/datagene

    DataGene - Identify How Similar TS Datasets Are to One Another (by @firmai)

    Language:Jupyter Notebook1925122
  • worldbank/REaLTabFormer

    A suite of auto-regressive and Seq2Seq (sequence-to-sequence) transformer models for tabular and relational synthetic data generation.

    Language:Jupyter Notebook18846122
  • DeFMO

    rozumden/DeFMO

    [CVPR 2021] DeFMO: Deblurring and Shape Recovery of Fast Moving Objects

    Language:Python1694724
  • davanstrien/awesome-synthetic-datasets

    awesome synthetic (text) datasets

    Language:Jupyter Notebook1478
  • NVIDIA/Dataset_Utilities

    NVIDIA Dataset Utilities (NVDU)

    Language:Python1268022
  • ViLab-UCSD/OpenRooms

    This is the dataset and code release of the OpenRooms Dataset. For more information, please refer to our webpage below. Thanks a lot for your interest in our research!

  • PerceivingSystems/bedlam_render

    BEDLAM (CVPR 2023) render pipeline tools

    Language:Python1205246
  • SqueezeAILab/LLM2LLM

    [ACL 2024] LLM2LLM: Boosting LLMs with Novel Iterative Data Enhancement

    Language:Python115668
  • isarandi/synthetic-occlusion

    Synthetic Occlusion Augmentation

    Language:Python1148219
  • firmai/mtss-gan

    MTSS-GAN: Multivariate Time Series Simulation with Generative Adversarial Networks (by @firmai)

  • jtheiner/LegoBrickClassification

    Repository to identify Lego bricks automatically only using images

    Language:Python916522
  • remyxai/VQASynth

    Compose multimodal datasets 🎹

    Language:Python904
  • VinAIResearch/Dataset-Diffusion

    Dataset Diffusion: Diffusion-based Synthetic Data Generation for Pixel-Level Semantic Segmentation (NeurIPS2023)

    Language:Jupyter Notebook79253
  • netsharecmu/NetShare

    (SIGCOMM '22) Practical GAN-based Synthetic IP Header Trace Generation using NetShare

    Language:Python7181921
  • privateai/deid-examples

    Example scripts that showcase how to use Private AI Text to de-identify, redact, hash, tokenize, mask and synthesize PII in text.

    Language:Jupyter Notebook67311
  • discus-labs/discus

    A data-centric AI package for ML/AI. Get the best high-quality data for the best results. Discord: https://discord.gg/t6ADqBKrdZ

    Language:Python601186
  • nbsynthetic

    NextBrain-ai/nbsynthetic

    nbsynthetic is a simple and robust tabular synthetic data generation library for small and medium size datasets

    Language:Jupyter Notebook570410
  • 921kiyo/3d-dl

    Synthetic Dataset Generation for Object-to-model Deep Learning

    Language:Python537320
  • astorfi/cor-gan

    🔓 COR-GAN: Correlation-Capturing Convolutional Neural Networks for Generating Synthetic Healthcare Records

    Language:Python506112
  • Abdullah-Abuolaim/recurrent-defocus-deblurring-synth-dual-pixel

    Reference GitHub repository for the paper "Learning to Reduce Defocus Blur by Realistically Modeling Dual-Pixel Data". We propose a procedure to generate realistic DP data synthetically. Our synthesis approach mimics the optical image formation found on DP sensors and can be applied to virtual scenes rendered with standard computer software. Leveraging these realistic synthetic DP images, we introduce a new recurrent convolutional network (RCN) architecture that can improve defocus deblurring results and is suitable for use with single-frame and multi-frame data captured by DP sensors.

    Language:Python45579
  • wormpose

    iteal/wormpose

    WormPose: Image synthesis and convolutional networks for pose estimation in C. elegans

    Language:Python456715
  • replicAnt

    evo-biomech/replicAnt

    replicAnt - generating annotated images of animals in complex environments with Unreal Engine

    Language:Python43235
  • Abdullah-Abuolaim/multi-task-defocus-deblurring-dual-pixel-nimat

    Reference GitHub repository for the paper "Improving Single-Image Defocus Deblurring: How Dual-Pixel Images Help Through Multi-Task Learning". We propose a single-image deblurring network that incorporates the two sub-aperture views into a multitask framework. Specifically, we show that jointly learning to predict the two DP views from a single blurry input image improves the network’s ability to learn to deblur the image. Our experiments show this multi-task strategy achieves +1dB PSNR improvement over state-of-the-art defocus deblurring methods. In addition, our multi-task framework allows accurate DP-view synthesis (e.g., ~ 39dB PSNR) from the single input image. These high-quality DP views can be used for other DP-based applications, such as reflection removal. As part of this effort, we have captured a new dataset of 7,059 high-quality images to support our training for the DP-view synthesis task.

    Language:Python42652