Query almost anything in Julia

Query

Project Status: Active. The project has reached a stable, usable state and is being actively developed.

Overview

Query is a package for querying Julia data sources. It can filter, project, join and group data from any iterable data source, including all the sources supported in IterableTables.jl. For example, one can query any of the following data sources: any array, DataFrames, DataStreams (including CSV, Feather, SQLite, ODBC), DataTables, IndexedTables, TimeSeries, Temporal, TypedTables and DifferentialEquations (any DESolution).
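As a rough illustration of filtering and projecting, here is a minimal sketch that queries a plain array. It assumes a recent Query.jl release on Julia 1.x, where the `{}` syntax in `@select` builds a named tuple; the `people` data is invented for the example.

```julia
using Query

# Illustrative data (not from the package): an array of named tuples.
people = [
    (name = "John",  age = 23.0, children = 3),
    (name = "Sally", age = 42.0, children = 5),
    (name = "Kirk",  age = 59.0, children = 2),
]

# Filter with @where, project with @select, and materialize the
# result into an array with a bare @collect.
result = @from p in people begin
    @where p.age > 30
    @select {p.name, p.children}
    @collect
end

# Each element of `result` is a named tuple with fields `name` and `children`;
# only the rows for Sally and Kirk pass the filter.
for r in result
    println(r.name, " has ", r.children, " children")
end
```

Passing a sink type to `@collect` (for example `@collect DataFrame`, with DataFrames.jl loaded) should materialize the same query into that structure instead of an array.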

The package currently provides working implementations for in-memory data sources, but will eventually be able to translate queries into e.g. SQL. There is a prototype implementation of such a "query provider" for SQLite in the package, but it is experimental at this point and only works for a very small subset of queries.

Query is heavily inspired by LINQ. In fact, right now the package is largely an implementation of the LINQ part of the C# specification. Future versions of Query will most likely add features that are not found in the original LINQ design.

Alternatives

Query.jl is not the only Julia initiative for querying data; there are many other packages with similar goals. Take a look at DataFramesMeta.jl, StructuredQueries.jl, and LazyQuery.jl. If I missed other initiatives, please let me know and I'll add them to this list!

Installation

You can add the package with:

using Pkg  # needed on Julia 0.7 and later, where Pkg is a standard library
Pkg.add("Query")

Getting started

To get started, take a look at the documentation.

Getting help

Please ask any usage questions in the Data Domain on the Julia Discourse forum. If you find a bug or have a suggestion for improving this package, please open an issue in this GitHub repository.

Highlights

  • Query is an almost complete implementation of the query expression section of the C# specification, with some additional Julia-specific features added in.
  • The package supports a large number of data sources: DataFrames, DataStreams (including CSV, Feather, SQLite, ODBC), DataTables, IndexedTables, TimeSeries, Temporal, TypedTables, DifferentialEquations (any DESolution), arrays, and any type that can be iterated.
  • The results of a query can be materialized into a range of different data structures: iterators, DataFrames, DataTables, IndexedTables, TimeSeries, Temporal, TypedTables, arrays, dictionaries or any DataStream sink (this includes CSV and Feather files).
  • One can mix and match almost all sources and sinks within one query. For example, one can easily perform a join of a DataFrame with a CSV file and write the results into a Feather file, all within one query.
  • The type instability problems that one can run into with DataFrames do not affect Query, i.e. queries against DataFrames are completely type stable.
  • There are three different APIs that package authors can use to make their data sources queryable with this package. The simplest API only requires a data source to provide an iterator. Another API provides a data source with a complete graph representation of the query, which the data source can then, for example, rewrite as a SQL statement to execute the query. The final API allows a data source to provide its own data structures that can represent a query graph.
  • The package is completely documented.
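
To give a feel for the join and group operations mentioned in the highlights, here is a small sketch that joins two in-memory arrays and then groups the joined rows. It assumes a recent Query.jl release on Julia 1.x. The `people` and `cities` data and their field names are made up for illustration, and plain arrays stand in for the DataFrame, CSV and Feather sources and sinks described above.

```julia
using Query

# Illustrative data (not from the package).
people = [
    (name = "John",  city_id = 1),
    (name = "Sally", city_id = 2),
    (name = "Kirk",  city_id = 1),
]
cities = [
    (id = 1, city = "Berkeley"),
    (id = 2, city = "Oslo"),
]

# Inner join of the two sources on the city identifier.
joined = @from p in people begin
    @join c in cities on p.city_id equals c.id
    @select {p.name, c.city}
    @collect
end

# Group the joined rows by city and count the members of each group.
counts = @from r in joined begin
    @group r.name by r.city into g
    @select {city = key(g), n = length(g)}
    @collect
end
```

Here `joined` should hold one named tuple per person with fields `name` and `city`, and `counts` one named tuple per city with fields `city` and `n`.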