/sql_to_ibis

Parses sql and converts it to ibis expressions

Primary LanguagePythonBSD 3-Clause "New" or "Revised" LicenseBSD-3-Clause

sql_to_ibis

CI Downloads PyPI license PyPI status PyPI version shields.io codecov Code style: black

sql_to_ibis is a Python package that translates SQL syntax into ibis expressions. This provides the capability of using only one SQL dialect to target many different backends

Installation

pip install sql_to_ibis

Usage

Registering and removing temp tables

To use an ibis table in sql_to_ibis you must register it. Note that for joins or queries that involve more than one table you must use the same ibis client when creating both ibis tables. Once the table is registered you can query it using SQL with the query function. In the example below, we create and query a pandas DataFrame

import ibis.pandas.api
import pandas
import sql_to_ibis

df = pandas.DataFrame({"column1": [1, 2, 3], "column2": ["4", "5", "6"]})
ibis_table = ibis.pandas.api.from_dataframe(
    df, name="my_table", client=ibis.pandas.api.PandasClient({})
)
sql_to_ibis.register_temp_table(ibis_table, "my_table")
sql_to_ibis.query(
    "select column1, cast(column2 as integer) + 1 as my_col2 from my_table"
).execute()

This would output a dataframe that looks like:

| column1 | my_col2 |
|---------|---------|
| 1       | 5       |
| 2       | 6       |
| 3       | 7       |

SQL Syntax

The sql syntax for sql_to_ibis is as follows (Note that all syntax is case insensitive):

Select statement:

SELECT [{ ALL | DISTINCT }]
    { [ <expression> ] | <expression> [ [ AS ] <alias> ] } [, ...]
[ FROM <from_item>  [, ...] ]
[ WHERE <bool_expression> ]
[ GROUP BY { <expression> [, ...] } ]
[ HAVING <bool_expression> ]

Example:

SELECT
    column4,
    Sum(column1)
FROM
    my_table
WHERE
    column3 = 'yes'
    AND column2 = 'no'
GROUP BY
    column4

Set operations:

<select_statement1>
{UNION [DISTINCT] | UNION ALL | INTERSECT [DISTINCT] | EXCEPT [DISTINCT] | EXCEPT ALL}
<select_statment2>

Example

SELECT
    *
FROM
    table1
UNION
SELECT
    *
FROM
    table2

Joins:

INNER, CROSS, FULL OUTER, LEFT OUTER, RIGHT OUTER, FULL, LEFT, RIGHT

Example:

SELECT
   *
FROM
   table1
   CROSS JOIN
      table2
SELECT
    *
FROM
    table1
JOIN
    table2
        ON table1.column1 = table2.column1

Order by and limit:

<set>
[ORDER BY <expression>]
[LIMIT <number>]

Example:

SELECT
   *
FROM
   table1
ORDER BY
   column1
LIMIT 5

Windowed aggregation:

<aggregate>() OVER(
        [PARTITION BY (<expresssion> [, <expression>...)] 
        [ORDER_BY (<expresssion> [, <expression>...)]
        [ ( ROWS | RANGE ) ( <preceding> | BETWEEN <preceding> AND <following> ) ]
       )

<preceding>: UNBOUNDED PRECEDING | <unsigned_integer> PRECEDING | CURRENT ROW
<following>: UNBOUNDED FOLLOWING | <unsigned_integer> FOLLOWING | CURRENT ROW

Supported expressions and functions:

+, -, *, /
CASE WHEN <condition> THEN <result> [WHEN ...] ELSE <result> END
SUM, AVG, MIN, MAX
{RANK | DENSE_RANK} OVER([PARTITION BY (<expresssion> [, <expression>...)])
CAST (<expression> AS <data_type>)

*Anything in <> is meant to be some string
*Anything in [] is optional
*Anything in {} is grouped together

Supported Data Types for cast expressions include:

  • VARCHAR, STRING
  • INT16, SMALLINT
  • INT32, INT
  • INT64, BIGINT
  • FLOAT16
  • FLOAT32
  • FLOAT, FLOAT64
  • BOOL
  • DATETIME64, TIMESTAMP
  • CATEGORY
  • OBJECT