SQL Query Execution in Rust

Primary language: Rust. License: Apache License 2.0 (Apache-2.0).

DataFusion: SQL Query Execution in Rust

DataFusion is a SQL parser, planner, and query execution library for Rust. A DataFrame API is also provided.

The following features are currently supported:

  • SQL Parser, Planner and Optimizer
  • DataFrame API
  • Columnar processing using Apache Arrow
  • Support for local CSV and Apache Parquet files
  • Single-threaded execution of SQL queries, supporting:
    • Projection
    • Selection
    • Scalar Functions
    • Aggregates (Min, Max, Count)
    • Grouping
  • User-defined Scalar Functions (UDFs)
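
To make the query features listed above concrete, here is a minimal sketch in plain Rust of what selection, grouping, and the Min/Max/Count aggregates amount to over column-oriented data. It uses only the standard library and is not DataFusion's API: the `Table` and `Agg` types, the `min_max_count_by_city` function, and the `city`/`temp` columns are all hypothetical names invented for illustration.

```rust
use std::collections::BTreeMap;

/// Hypothetical columnar table: each field is one column,
/// and row i is made of city[i] and temp[i].
struct Table {
    city: Vec<String>,
    temp: Vec<f64>,
}

/// Aggregate state for one group: MIN, MAX and COUNT.
#[derive(Debug, Clone, PartialEq)]
struct Agg {
    min: f64,
    max: f64,
    count: usize,
}

/// Roughly equivalent to:
///   SELECT city, MIN(temp), MAX(temp), COUNT(temp)
///   FROM t WHERE temp > threshold GROUP BY city
fn min_max_count_by_city(t: &Table, threshold: f64) -> BTreeMap<String, Agg> {
    let mut groups: BTreeMap<String, Agg> = BTreeMap::new();
    // Projection: only the two columns the query needs are read.
    for (city, &temp) in t.city.iter().zip(t.temp.iter()) {
        // Selection: skip rows that fail the WHERE predicate.
        if temp <= threshold {
            continue;
        }
        // Grouping and aggregation: update the accumulator for this key.
        groups
            .entry(city.clone())
            .and_modify(|a| {
                a.min = a.min.min(temp);
                a.max = a.max.max(temp);
                a.count += 1;
            })
            .or_insert(Agg { min: temp, max: temp, count: 1 });
    }
    groups
}
```

Calling `min_max_count_by_city(&table, 0.0)` returns one `Agg` per distinct city among the rows whose `temp` is above zero. A real engine would operate on Apache Arrow arrays rather than `Vec`s, but the per-row logic has the same shape.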

DataFusion can be used as a crate dependency in your project to add SQL support for custom data sources.

A Docker image is also available if you just want to run SQL queries against your CSV and Parquet files.

I have plans to make DataFusion a fully distributed compute platform with features similar to Apache Spark, but I need help from contributors to get there.

Project Home Page

The project home page is now at https://datafusion.rs and contains the roadmap as well as documentation for using this crate. I am using GitHub issues to track development tasks and feedback.

Prerequisites

  • Rust nightly (required by the parquet-rs crate)

Building DataFusion

See BUILDING.md.

Gitter

There is a Gitter channel where you can ask questions about the project or suggest features.

Contributing

Contributors are welcome! Please see CONTRIBUTING.md for details.