data-plumber is a Rust-based tool designed to facilitate manual synchronization between databases and other data sources. The project serves both as a learning exercise in Rust and as a practical way to compare and manage data across environments.
- Compare Result Sets: Allows comparison of data across different sources.
- Generate SQL Inserts: Produces .sql files for database updates.
- Output JSON Files: Generates JSON files based on the data queried from sources.
- Supervision: There is no direct writing to databases; this tool generates .sql files that users can review and execute manually to ensure data integrity.
- Memory Management: Intermediate data is held in memory; the tool is not designed for huge datasets unless sufficient RAM is available.
- Flexibility: Configuration of data pipelines is managed via simple JSON configuration files, making it easy to adapt to different data tasks.
- Restartability: Some operations can be both expensive and slow, like prompting an LLM for every record in a table. The application should be restartable without wasting the work already done.
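One way to meet the Restartability principle is to checkpoint each expensive result as soon as it is computed, and to skip already-checkpointed records on the next run. The sketch below is not data-plumber's implementation: the tab-separated checkpoint file, the `enrich_with_checkpoint` helper and the closure standing in for an LLM call are all hypothetical, and record keys are assumed to contain no tabs or newlines.

```rust
use std::collections::HashMap;
use std::fs::{File, OpenOptions};
use std::io::{self, BufRead, BufReader, Write};
use std::path::Path;

/// Load results saved by a previous (possibly interrupted) run.
/// Hypothetical format: one `key<TAB>value` pair per line.
fn load_checkpoint(path: &Path) -> io::Result<HashMap<String, String>> {
    let mut done = HashMap::new();
    let file = match File::open(path) {
        Ok(f) => f,
        // No checkpoint yet: this is the first run.
        Err(e) if e.kind() == io::ErrorKind::NotFound => return Ok(done),
        Err(e) => return Err(e),
    };
    for line in BufReader::new(file).lines() {
        let line = line?;
        if let Some((key, value)) = line.split_once('\t') {
            done.insert(key.to_string(), value.to_string());
        }
    }
    Ok(done)
}

/// Apply an expensive per-record operation (e.g. an LLM prompt), skipping
/// records already in the checkpoint and appending each new result at once,
/// so an interrupted run loses at most the record that was in flight.
/// Assumes keys contain no tab or newline characters.
pub fn enrich_with_checkpoint<F>(
    records: &[(String, String)],
    checkpoint: &Path,
    mut expensive: F,
) -> io::Result<HashMap<String, String>>
where
    F: FnMut(&str) -> String,
{
    let mut done = load_checkpoint(checkpoint)?;
    let mut out = OpenOptions::new().create(true).append(true).open(checkpoint)?;
    for (key, input) in records {
        if done.contains_key(key) {
            continue; // computed in an earlier run, do not pay for it again
        }
        // Tabs and newlines would break the line format, so replace them.
        let result = expensive(input.as_str()).replace(|c: char| c == '\t' || c == '\n', " ");
        writeln!(out, "{}\t{}", key, result)?;
        out.flush()?;
        done.insert(key.clone(), result);
    }
    Ok(done)
}
```

A persistent, append-only checkpoint trades a little disk I/O per record for the ability to kill and rerun the process at any point.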
Every configured 'Process' runs sequentially, in the order listed (for now; there is no automatic dependency resolution yet). Each input node creates an in-memory table with the same name as its process, so subsequent processes can use that table as input.
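The execution model described above (sequential, in listed order, each input stored as a table named after its process) can be sketched as follows. This is an illustration under assumptions, not the tool's code: the `Step` variants are hypothetical, functions stand in for real queries, and storing a `Transform` result under its own process name is an extra assumption.

```rust
use std::collections::HashMap;

/// Simplified in-memory table: each row is a list of string cells.
pub type Table = Vec<Vec<String>>;

/// Hypothetical step kinds, for illustration only.
pub enum Step {
    /// Input node: a plain function stands in for a database query.
    Input(fn() -> Table),
    /// Derived step that reads a table created by an earlier process.
    Transform { source: &'static str, f: fn(&Table) -> Table },
}

pub struct Process {
    pub name: &'static str,
    pub step: Step,
}

/// Run processes one after another in the listed order, storing each
/// result under the process name so later processes can read it.
pub fn run_pipeline(processes: &[Process]) -> Result<HashMap<String, Table>, String> {
    let mut tables: HashMap<String, Table> = HashMap::new();
    for p in processes {
        let table = match &p.step {
            Step::Input(query) => query(),
            Step::Transform { source, f } => {
                // No dependency resolution: the source table must already
                // exist, i.e. its process must appear earlier in the list.
                let input = tables.get(*source).ok_or_else(|| {
                    format!("process '{}' reads '{}', which has not run yet", p.name, source)
                })?;
                f(input)
            }
        };
        tables.insert(p.name.to_string(), table);
    }
    Ok(tables)
}
```

Listing a process before the one it reads from makes the sketch fail with an error instead of silently reordering, mirroring the "no automatic dependency resolution" rule.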
My basic needs are:
- copy data between many databases, for example from dev to demo, or from Neo4j queries to MySQL tables
- compare tables in different databases
- back up a table's data before making edits
- enrich a table with responses from an LLM
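Copying data between databases and backing up a table both come down to the Generate SQL Inserts feature: rows are rendered as INSERT statements and written to a .sql file that you review and run by hand. A minimal sketch, assuming a hypothetical `Value` type and `to_inserts` helper; it only doubles single quotes, does not quote identifiers, and ignores MySQL's default backslash escaping (relevant unless `NO_BACKSLASH_ESCAPES` is set).

```rust
use std::fmt::Write as _;

/// Hypothetical cell value; the real tool's types may differ.
#[derive(Debug, Clone, PartialEq)]
pub enum Value {
    Null,
    Int(i64),
    Text(String),
}

/// Render a value as a SQL literal. Only single quotes are escaped here;
/// MySQL also treats backslashes as escapes by default.
fn sql_literal(v: &Value) -> String {
    match v {
        Value::Null => "NULL".to_string(),
        Value::Int(i) => i.to_string(),
        Value::Text(s) => format!("'{}'", s.replace('\'', "''")),
    }
}

/// Build one INSERT statement per row. The text is meant to be saved to a
/// .sql file and executed manually, never sent to the database directly.
pub fn to_inserts(table: &str, columns: &[&str], rows: &[Vec<Value>]) -> String {
    let cols = columns.join(", ");
    let mut out = String::new();
    for row in rows {
        let values: Vec<String> = row.iter().map(sql_literal).collect();
        // Writing into a String cannot fail, so unwrap is safe.
        writeln!(out, "INSERT INTO {} ({}) VALUES ({});", table, cols, values.join(", ")).unwrap();
    }
    out
}
```

Saving the result with `std::fs::write("backup.sql", &sql)` (the file name is only an example) gives a file you can inspect before running it.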
- JSON input/output
- MySQL input/output
- Compare tables
- Configuration templates
- Neo4j input/output
- Data validation
- Data processing with jq-like syntax
- Merging/composing tables (inner join, left join)
- REST API input/output
- CSV input/output
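Comparing tables (the Compare tables item and the Compare Result Sets feature) can be done by keying each result set on its primary key and classifying keys as present on only one side, or present on both with different values. A minimal sketch; the `Diff` struct and `compare` function are hypothetical, and rows are simplified to string cells.

```rust
use std::collections::BTreeMap;

/// Differences between two result sets keyed by primary key.
#[derive(Debug, Default, PartialEq)]
pub struct Diff {
    pub only_left: Vec<String>,
    pub only_right: Vec<String>,
    pub changed: Vec<String>,
}

/// Compare rows from two sources (e.g. dev and demo). Each map goes from
/// primary key to the row's remaining columns; BTreeMap iterates in key
/// order, so the report is deterministic.
pub fn compare(
    left: &BTreeMap<String, Vec<String>>,
    right: &BTreeMap<String, Vec<String>>,
) -> Diff {
    let mut diff = Diff::default();
    for (key, lrow) in left {
        match right.get(key) {
            None => diff.only_left.push(key.clone()),
            Some(rrow) if rrow != lrow => diff.changed.push(key.clone()),
            Some(_) => {} // identical on both sides
        }
    }
    for key in right.keys() {
        if !left.contains_key(key) {
            diff.only_right.push(key.clone());
        }
    }
    diff
}
```

Keys in `only_left` and `changed` are natural candidates for generated INSERT or UPDATE statements when syncing one side to the other.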
- Rust
- Cargo (Rust's package manager)
Clone this repository and build the project using Cargo:
git clone https://github.com/yourusername/data-plumber.git
cd data-plumber
cargo build --release
To configure the tool, create or edit a config.json file (optionally in a new directory) containing your pipeline configuration.
Usage
Execute the tool using:
<$PATH>/data-plumber config.json
Contributions are welcome! Please feel free to fork the repository, make your changes, and submit a pull request.
This project is licensed under the MIT License - see the LICENSE.md file for details.
This project started as an exercise for learning Rust. Thanks to WHP for sponsoring part of this software. Thanks to the Rust community for providing extensive documentation and resources.