data-transformation

There are 622 repositories under the data-transformation topic.

  • mahmoud/glom

    ☄️ Python's nested data operator (and CLI), for all your declarative restructuring needs. Got data? Glom it! ☄️

    Language: Python · 2.1k stars
  • hi-primus/optimus

    🚚 Agile Data Preparation Workflows made easy with Pandas, Dask, cuDF, Dask-cuDF, Vaex and PySpark

    Language: Python · 1.5k stars
  • 2ndQuadrant/pglogical

    Logical Replication extension for PostgreSQL 17, 16, 15, 14, 13, 12, 11, 10, 9.6, 9.5, 9.4 (Postgres), providing much faster replication than Slony, Bucardo or Londiste, as well as cross-version upgrades.

    Language: C · 1.1k stars
  • zinggAI/zingg

    Scalable identity resolution, entity resolution, data mastering and deduplication using ML

    Language: Java · 1.1k stars
  • bruin-data/bruin

    Build data pipelines with SQL and Python, ingest data from different sources, add quality checks, and build end-to-end flows.

    Language: Go
  • mattt/TransformerKit

    A block-based API for NSValueTransformer, with a growing collection of useful examples.

    Language: Objective-C
  • raystack/optimus

    Optimus is an easy-to-use, reliable, and performant workflow orchestrator for data transformation, data modeling, pipelines, and data quality management.

    Language: Go
  • SebKrantz/collapse

    Advanced and Fast Data Transformation in R

    Language: C
  • microsoft/prose

    Microsoft Program Synthesis using Examples SDK is a framework of technologies for the automatic generation of programs from input-output examples. This repo includes samples and sample data for the Microsoft Program Synthesis using Example SDK.

    Language: C#
  • ScriptFUSION/Porter

    💄 Durable and asynchronous data imports for consuming data at scale and publishing testable SDKs.

    Language: PHP
  • dbohdan/sqawk

    Like awk, but with SQL and table joins

    Language: Tcl
  • jupyter-naas/naas

    Low-code Python library to safely use notebooks in production: schedule workflows, generate assets, trigger webhooks, send notifications, build pipelines, manage secrets (Cloud-only)

    Language: Python
  • fastverse/fastverse

    An Extensible Suite of High-Performance and Low-Dependency Packages for Statistical Computing and Data Manipulation in R

    Language: R
  • feichao93/temme

    📄 Concise selector to extract JSON from HTML.

    Language: TypeScript
  • mahmoudparsian/data-algorithms-with-spark

    O'Reilly Book: [Data Algorithms with Spark] by Mahmoud Parsian

    Language: Python
  • markus-wa/cq

    Clojure Query: A Command-line Data Processor for JSON, YAML, EDN, XML and more

    Language: Clojure
  • SETL-Framework/setl

    A simple Spark-powered ETL framework that just works 🍺

    Language: Scala
  • simongray/clojure-dsl-resources

    A curated list of Clojure resources for dealing with domain-specific languages.

  • mahmoudparsian/big-data-mapreduce-course

    Big Data Modeling, MapReduce, Spark, PySpark @ Santa Clara University

    Language: HTML
  • strengejacke/sjmisc

    Data transformation and utility functions for R

    Language: R
  • jim-schwoebel/allie

    🤖 An automated machine learning framework for audio, text, image, video, or .CSV files (50+ featurizers and 15+ model trainers). Python 3.6 required.

    Language: Python
  • ToucanToco/weaverbird

    A visual data pipeline builder with various backends

    Language: TypeScript
  • data-integrations/wrangler

    Wrangler Transform: A DMD system for transforming Big Data

    Language: Java
  • galliaproject/gallia-core

    A schema-aware Scala library for data transformation

    Language: Scala
  • aws-samples/aws-dbs-refarch-datalake

    Reference Architectures for Datalakes on AWS

    Language: HTML
  • dry-rb/dry-transformer

    Data transformation toolkit

    Language: Ruby
  • devsgnr/breadroll

    breadroll 🥟 is a simple lightweight library for data processing operations written in Typescript and powered by Bun.

    Language: TypeScript
  • neurons-me/all.this

    All.This is a modular framework for managing and standardizing data structures, enabling seamless interaction across the neurons.me ecosystem. It transforms objects like images, text, and audio into structured formats optimized for machine learning and deep learning applications.

    Language: JavaScript
  • bhrnjica/daany

    Daany - .NET DAta ANalYtics .NET library with the implementation of DataFrame, Time series decompositions and Linear Algebra routines BLAS and LAPACK.

    Language: C#
  • developerforce/DataWeaveInApex

    Examples for working with DataWeave scripts from Apex.

    Language: Apex
  • assemblee-virtuelle/Semantic-Bus

    object flow treatment, data transformation

    Language: JavaScript
  • scopashq/typestream

    ⚡️ Next-generation data transformation framework for TypeScript that puts developer experience first

    Language: TypeScript
  • nilportugues/php-serializer

    Serialize PHP variables, including objects, in any format. Supports unserializing them too.

    Language: PHP
  • hopsoft/pipe_envy

    Elixir style pipe operator for Ruby

    Language: Ruby
  • cjdoris/Chevrons.jl

    Your friendly >> chevron >> based syntax for piping data through multiple transformations.

    Language: Julia
  • alexocode/babel

    Data transformations made easy

    Language: Elixir