/sql_to_ibis

A Python package that parses sql and converts it to ibis expressions

Primary LanguagePythonBSD 3-Clause "New" or "Revised" LicenseBSD-3-Clause

sql_to_ibis

https://github.com/zbrookle/sql_to_ibis/workflows/CI/badge.svg?branch=master https://pepy.tech/badge/sql-to-ibis https://codecov.io/gh/zbrookle/sql_to_ibis/branch/master/graph/badge.svg?

sql_to_ibis is a Python package that translates SQL syntax into ibis expressions. This provides the capability of using only one SQL dialect to target many different backends

Installation

pip install sql_to_ibis

Usage

Registering and removing temp tables

To use an ibis table in sql_to_ibis you must register it. Note that for joins or queries that involve more than one table you must use the same ibis client when creating both ibis tables. Once the table is registered you can query it using SQL with the query function. In the example below, we create and query a pandas DataFrame

import ibis.pandas
import pandas
import sql_to_ibis

df = pandas.DataFrame({"column1": [1, 2, 3], "column2": ["4", "5", "6"]})
ibis_table = ibis.pandas.Backend().from_dataframe(
    df, name="my_table", client=ibis.pandas.connect({})
)
sql_to_ibis.register_temp_table(ibis_table, "my_table")
sql_to_ibis.query(
    "select column1, cast(column2 as integer) + 1 as my_col2 from my_table"
).execute()

This would output a DataFrame that looks like:

column1 my_col2
1 5
2 6
3 7

SQL Syntax

The sql syntax for sql_to_ibis is as follows (Note that all syntax is case insensitive):

Select statement

SELECT [{ ALL | DISTINCT }]
    { [ <expression> ] | <expression> [ [ AS ] <alias> ] } [, ...]
[ FROM <from_item>  [, ...] ]
[ WHERE <bool_expression> ]
[ GROUP BY { <expression> [, ...] } ]
[ HAVING <bool_expression> ]

Example:

SELECT
    column4,
    Sum(column1)
FROM
    my_table
WHERE
    column3 = 'yes'
    AND column2 = 'no'
GROUP BY
    column4

Note that columns with spaces in them can be expressed using double quotes. For example:

SELECT
    "my column 1",
    column2 as "the second column"
FROM
    my_table

Set operations

<select_statement1>
{UNION [DISTINCT] | UNION ALL | INTERSECT [DISTINCT] | EXCEPT [DISTINCT] | EXCEPT ALL}
<select_statment2>

Example:

SELECT
    *
FROM
    table1
UNION
SELECT
    *
FROM
    table2

Joins

INNER, CROSS, FULL OUTER, LEFT OUTER, RIGHT OUTER, FULL, LEFT, RIGHT
SELECT
   *
FROM
   table1
   CROSS JOIN
      table2
SELECT
    *
FROM
    table1
JOIN
    table2
        ON table1.column1 = table2.column1

Order by and limit

<set>
[ORDER BY <expression>]
[LIMIT <number>]

Example:

SELECT
   *
FROM
   table1
ORDER BY
   column1
LIMIT 5

Windowed aggregation

<aggregate>() OVER(
        [PARTITION BY (<expresssion> [, <expression>...)]
        [ORDER_BY (<expresssion> [, <expression>...)]
        [ ( ROWS | RANGE ) ( <preceding> | BETWEEN <preceding> AND <following> ) ]
       )

<preceding>: UNBOUNDED PRECEDING | <unsigned_integer> PRECEDING | CURRENT ROW
<following>: UNBOUNDED FOLLOWING | <unsigned_integer> FOLLOWING | CURRENT ROW

Supported expressions and functions

+, -, *, /
CASE WHEN <condition> THEN <result> [WHEN ...] ELSE <result> END
SUM, AVG, MIN, MAX
{RANK | DENSE_RANK} OVER([PARTITION BY (<expresssion> [, <expression>...])])
CAST (<expression> AS <data_type>)
<expression> is null
<expression> is not null
COALESCE(<expresssion> [, <expression>...])
  • Anything in <> is meant to be some string
  • Anything in [] is optional
  • Anything in {} is grouped together

Supported Data Types for cast expressions include:

  • VARCHAR, STRING
  • INT16, SMALLINT
  • INT32, INT, INTEGER
  • INT64, BIGINT
  • FLOAT16
  • FLOAT32
  • FLOAT, FLOAT64
  • BOOL
  • DATETIME64, TIMESTAMP
  • CATEGORY
  • OBJECT