data-transformation
There are 425 repositories under the data-transformation topic.
mahmoud/glom
☄️ Python's nested data operator (and CLI), for all your declarative restructuring needs. Got data? Glom it! ☄️
hi-primus/optimus
🚚 Agile Data Preparation Workflows made easy with Pandas, Dask, cuDF, Dask-cuDF, Vaex and PySpark
2ndQuadrant/pglogical
Logical Replication extension for PostgreSQL 15, 14, 13, 12, 11, 10, 9.6, 9.5, 9.4 (Postgres), providing much faster replication than Slony, Bucardo or Londiste, as well as cross-version upgrades.
zinggAI/zingg
Scalable identity resolution, entity resolution, data mastering and deduplication using ML
mattt/TransformerKit
A block-based API for NSValueTransformer, with a growing collection of useful examples.
raystack/optimus
Optimus is an easy-to-use, reliable, and performant workflow orchestrator for data transformation, data modeling, pipelines, and data quality management.
microsoft/prose
The Microsoft Program Synthesis using Examples SDK is a framework of technologies for the automatic generation of programs from input-output examples. This repo includes samples and sample data for the Microsoft Program Synthesis using Examples SDK.
ScriptFUSION/Porter
💄 Durable and asynchronous data imports for consuming data at scale and publishing testable SDKs.
SebKrantz/collapse
Advanced and Fast Data Transformation in R
dbohdan/sqawk
Like awk but with SQL and table joins
jupyter-naas/naas
Low-code Python library to safely use notebooks in production: schedule workflows, generate assets, trigger webhooks, send notifications, build pipelines, manage secrets (Cloud-only)
feichao93/temme
📄 Concise selector to extract JSON from HTML.
fastverse/fastverse
An Extensible Suite of High-Performance and Low-Dependency Packages for Statistical Computing and Data Manipulation in R
mahmoudparsian/data-algorithms-with-spark
O'Reilly Book: [Data Algorithms with Spark] by Mahmoud Parsian
SETL-Framework/setl
A simple Spark-powered ETL framework that just works 🍺
simongray/clojure-dsl-resources
A curated list of Clojure resources for dealing with domain-specific languages.
markus-wa/cq
Clojure Query: A Command-line Data Processor for JSON, YAML, EDN, XML and more
strengejacke/sjmisc
Data transformation and utility functions for R
mahmoudparsian/big-data-mapreduce-course
Big Data Modeling, MapReduce, Spark, PySpark @ Santa Clara University
jim-schwoebel/allie
🤖 An automated machine learning framework for audio, text, image, video, or .CSV files (50+ featurizers and 15+ model trainers). Python 3.6 required.
ToucanToco/weaverbird
A visual data pipeline builder with various backends
data-integrations/wrangler
Wrangler Transform: A DMD system for transforming Big Data
galliaproject/gallia-core
A schema-aware Scala library for data transformation
aws-samples/aws-dbs-refarch-datalake
Reference Architectures for Datalakes on AWS
dry-rb/dry-transformer
Data transformation toolkit
devsgnr/breadroll
breadroll 🥟 is a simple, lightweight library for data processing operations, written in TypeScript and powered by Bun.
developerforce/DataWeaveInApex
Examples for working with DataWeave scripts from Apex.
bhrnjica/daany
Daany - .NET DAta ANalYtics library with implementations of DataFrame, time series decomposition, and linear algebra routines (BLAS and LAPACK).
assemblee-virtuelle/Semantic-Bus
object flow treatment, data transformation
scopashq/typestream
⚡️ Next-generation data transformation framework for TypeScript that puts developer experience first
bruin-data/bruin
Bruin is a data pipeline tool designed to be easy to use. It allows building data pipelines using SQL and Python, and has built-in data quality checks.
nilportugues/php-serializer
Serialize PHP variables, including objects, in any format. Supports unserializing them too.
hopsoft/pipe_envy
Elixir style pipe operator for Ruby
bloomberg/pycsvw
A tool to read CSV files with CSVW metadata and transform them into other formats.
fiddlerwoaroof/data-lens
Functional utilities for Common Lisp
tsantos84/serializer
A PHP serialization component focused on performance